[v10,09/24] xsplice: Implement support for applying/reverting/replacing patches.

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Implement support for the apply, revert and replace actions.

To perform an action on a payload, the hypercall sets up a data
structure to schedule the work.  A hook is added in reset_stack_and_jump
to check for work and execute it if needed (specifically, we check a
per-CPU flag to make this as quick as possible).
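The fast-path check can be pictured with the single-threaded sketch below. It is not the patch's code: NR_CPUS, struct work, schedule_work(), check_for_work() and slow_path_hits are hypothetical names, and real per-CPU flags shared between CPUs need proper memory ordering, which this sketch ignores. It shows why a per-CPU flag is cheap: the common case costs one load and a branch.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4                  /* hypothetical CPU count */

enum action { ACTION_APPLY, ACTION_REVERT, ACTION_REPLACE };

struct work {                      /* hypothetical descriptor set up by the hypercall */
    enum action action;
};

static struct work pending_work;
static bool work_to_do[NR_CPUS];   /* one flag per CPU, so the check reads only local data */
static unsigned int slow_path_hits[NR_CPUS];

/* Hypercall side: record the action, then flag every CPU. */
static void schedule_work(enum action action)
{
    pending_work.action = action;
    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
        work_to_do[cpu] = true;
}

/* Hook on the reset_stack_and_jump path: the common case is one load and a branch. */
static void check_for_work(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    if (!work_to_do[cpu])
        return;                    /* fast path: nothing scheduled */
    work_to_do[cpu] = false;
    slow_path_hits[cpu]++;         /* slow path: this is where the CPU would join the rendezvous */
}
```

Each CPU takes the slow path exactly once per scheduled action; every later check returns immediately.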

In this way, patches can be applied with all CPUs idle and without
stacks.  The first CPU to run check_for_xsplice_work() becomes the
master and raises a reschedule softirq so that all the other CPUs
enter check_for_xsplice_work() with no stack.  Once all CPUs
have rendezvoused, they all disable their IRQs and NMIs are ignored.
The system is then quiescent and the master performs the action.
After this, all CPUs re-enable IRQs and NMIs are re-enabled.
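The master election and rendezvous can be approximated in user space as below, using POSIX threads as stand-in CPUs and C11 atomics. This is a hedged sketch, not the hypervisor implementation: master_cpu, arrived, released, do_patch_action() and run_rendezvous() are hypothetical names, every "CPU" starts on its own instead of being kicked by a reschedule softirq, there are no IRQs or NMIs to mask, and there is no timeout.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4                   /* hypothetical CPU count for the simulation */

static atomic_int master_cpu = -1;  /* CPU that won the election, -1 if none */
static atomic_uint arrived;         /* non-master CPUs parked at the rendezvous */
static atomic_bool released;        /* set by the master once the action is done */
static atomic_uint actions_run;     /* how many times the patch action executed */

static void do_patch_action(void)
{
    /* Placeholder for APPLY/REVERT/REPLACE; runs only while all other CPUs are parked. */
    atomic_fetch_add(&actions_run, 1);
}

static void *cpu_check_for_work(void *arg)
{
    int cpu = (int)(intptr_t)arg;
    int none = -1;

    if (atomic_compare_exchange_strong(&master_cpu, &none, cpu)) {
        /* First CPU in becomes the master and waits for everyone else. */
        while (atomic_load(&arrived) != NR_CPUS - 1)
            sched_yield();          /* stands in for cpu_relax() */
        /* Real code would now have IRQs disabled and NMIs ignored on all CPUs. */
        do_patch_action();
        atomic_store(&released, true);
    } else {
        /* Every other CPU checks in and spins until the master is done. */
        atomic_fetch_add(&arrived, 1);
        while (!atomic_load(&released))
            sched_yield();
    }
    return NULL;
}

/* Run one rendezvous with NR_CPUS threads; returns the number of actions run. */
static unsigned int run_rendezvous(void)
{
    pthread_t cpus[NR_CPUS];

    atomic_store(&master_cpu, -1);
    atomic_store(&arrived, 0);
    atomic_store(&released, false);
    atomic_store(&actions_run, 0);

    for (int i = 0; i < NR_CPUS; i++) {
        int rc = pthread_create(&cpus[i], NULL, cpu_check_for_work, (void *)(intptr_t)i);
        assert(rc == 0);            /* a missing CPU would leave the master spinning */
        (void)rc;
    }
    for (int i = 0; i < NR_CPUS; i++)
        pthread_join(cpus[i], NULL);

    return atomic_load(&actions_run);
}
```

The compare-and-swap guarantees exactly one master, so the action runs once no matter which thread arrives first.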

Note that it is unsafe to patch do_nmi and the xSplice internal functions.
Patching functions on the NMI/MCE path is liable to end in disaster on x86.
This is not addressed in this patch and is mentioned in the
design doc as a further TODO.

The action to perform is one of:
- APPLY: For each function in the module, store the first arch-specific
  number of bytes of the old function and replace them with a jump to the
  new function (on x86 it is 5 bytes; on ARM it will likely be 4 bytes).
- REVERT: Copy the previously stored bytes back into the first arch-specific
  number of bytes of the old function (again, 5 bytes on x86).
- REPLACE: Revert each applied module and then apply the new module.
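For x86, the 5 bytes are a `jmp rel32` (opcode 0xE9 followed by a little-endian 32-bit displacement measured from the end of the instruction). The sketch below performs APPLY and REVERT on an ordinary buffer rather than live hypervisor text, so it ignores write protection (the changelog mentions CR0.WP handling), instruction-cache and cross-CPU concerns. struct patch_func, apply_jmp() and revert_jmp() are hypothetical simplifications, not the patch's xsplice_patch_func layout.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define PATCH_INSN_SIZE 5           /* x86: E9 opcode + 32-bit displacement */

struct patch_func {                 /* hypothetical, simplified layout */
    uint64_t old_addr;              /* address of the function being patched */
    uint64_t new_addr;              /* address of the replacement function */
    uint8_t *text;                  /* writable view of the bytes at old_addr */
    uint8_t undo[PATCH_INSN_SIZE];  /* saved original bytes for REVERT */
};

/* APPLY: save the first bytes, then overwrite them with "jmp new_addr". */
static int apply_jmp(struct patch_func *f)
{
    int64_t rel;
    uint32_t disp;

    assert(f != NULL && f->text != NULL);

    /* rel32 is relative to the end of the 5-byte jmp instruction. */
    rel = (int64_t)f->new_addr - (int64_t)(f->old_addr + PATCH_INSN_SIZE);
    if (rel < INT32_MIN || rel > INT32_MAX)
        return -ERANGE;             /* target not reachable with a rel32 jmp; nothing modified */

    memcpy(f->undo, f->text, PATCH_INSN_SIZE);
    disp = (uint32_t)(int32_t)rel;
    f->text[0] = 0xe9;              /* jmp rel32 */
    for (int i = 0; i < 4; i++)     /* displacement is stored little-endian */
        f->text[1 + i] = (uint8_t)(disp >> (8 * i));
    return 0;
}

/* REVERT: put the saved bytes back. */
static void revert_jmp(struct patch_func *f)
{
    assert(f != NULL && f->text != NULL);
    memcpy(f->text, f->undo, PATCH_INSN_SIZE);
}
```

For example, with old_addr 0x1000 and new_addr 0x2000 the displacement is 0x2000 - 0x1005 = 0xffb, so the first five bytes become `e9 fb 0f 00 00`. REPLACE, in these terms, would be revert_jmp() on every applied function followed by apply_jmp() for the new module.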

To prevent a deadlock with any other barrier in the system, the master
will wait for up to 30ms before timing out.
Measurements found that applying a patch takes about 100 µs on a
72-CPU system, whether idle or fully loaded.
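A bounded wait of the kind described can be sketched as a spin loop with a monotonic deadline. This is an illustration under assumptions: wait_for_cpus(), now_ns() and RENDEZVOUS_TIMEOUT_NS are hypothetical names, it uses POSIX clock_gettime() rather than any hypervisor time source, and what the master does after a timeout is left to the caller.

```c
#define _POSIX_C_SOURCE 199309L     /* for clock_gettime() */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define RENDEZVOUS_TIMEOUT_NS (30ULL * 1000 * 1000)   /* 30 ms, as in the description */

static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Spin until *count reaches target; give up once timeout_ns has elapsed. */
static bool wait_for_cpus(atomic_uint *count, unsigned int target, uint64_t timeout_ns)
{
    uint64_t deadline;

    assert(count != NULL);
    deadline = now_ns() + timeout_ns;
    while (atomic_load(count) != target) {
        if (now_ns() >= deadline)
            return false;           /* timed out; the caller decides how to back out */
    }
    return true;
}
```

Bounding the spin means that a CPU stuck in some other barrier costs at most 30ms instead of hanging the machine, which is generous compared with the roughly 100 µs a successful application takes.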

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

--
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>

v2: - Pluck the 'struct xsplice_patch_func' in this patch.
    - Modify code per review comments.
    - Add more data in the keyboard handler.
    - Redo the patching code, split it in functions.
v3: - Add return_ macro for debug builds.
    - Move s/payload_list_lock/payload_list/ to earlier patch
    - Remove const and use ELF types for xsplice_patch_func
    - Add check routine to do simple sanity checks for various
      sections.
    - s/%p/PRIx64/ as ARM builds complain.
    - Move code around. Add more dprintk. Add XSPLICE in front of all
      printks/dprintk.
      Put the NMIs back if we fail patching.
      Add per-cpu to lessen contention for global structure.
      Extract the patching code from xsplice_do_single into xsplice_do_action
      Squash xsplice_do_single and check_for_xsplice_work together to
      have all rendezvous in one place.
      Made XSPLICE_ACTION_REPLACE work again (wrong list iterator)
      s/find_special_sections/prepare_payload/
      Use list_del_init and INIT_LIST_HEAD for applied_list
v4:
   - Add comment, adjust spacing for "Timed out on CPU semaphore"
   - Added CR0.WP manipulations when altering the .text of hypervisor.
   - Added fix from Andrew for CR0.WP manipulation.
v5: - Made xsplice_patch_func use uintXX_t instead of ELF_ types to ease
      making it work under ARM (32bit). Add more BUILD_BUG_ON checks.
    - Add more BUILD_BUG_ON checks. Sprinkle newlines.
v6: Rebase on "arm/x86: Alter nmi_callback_t typedef"
   - Drop the recursive spinlock usage.
   - Move NMI callbacks in arch specific.
   - Fold the 'check_for_xsplice_work' in reset_stack_and_jump
   - Add arch specific check for .xsplice.funcs.
   - Separate external and internal structure of .xsplice.funcs.
   - Changed per Jan's review
   - Modified the .xsplice.funcs checks
v7:
   - Modified old_ptr to void* instead of uint8_t*
   - Modified the xsplice_patch_func_internal for ARM32 to have padding.
   - Used #if BITS_PER_LONG == 64 for the xsplice_patch_func_internal along
     with ifndef CONFIG_ARM for the undo (which may be different size on ARM64)
v8:
  - Add "is empty" if special sections are in fact empty.
  - Added Andrew's Reviewed-by:
  - Rebase on v7.2 of x86/mm: Introduce modify_xen_mappings()
  - Change some of printk to dprintk and some of the dprintk to printk.
  - Make the xsplice_patch_func (and the internal) structure have uint32_t
    (instead of uint64_t) if BITS_PER_LONG==32. This makes the size and
    offset different so note that in the design and common code.
  - Add #undef ACTION
  - Guard struct xsplice_patch_func in sysctl.h with __XEN__ as toolstacks
    will fail to compile. We do have BITS_PER_LONG defined in xc_bitops.h but
    that will go away (and also that macro uses sizeof and the pre-processor
    will choke on that).
  - Dropped Julien's Acked as I replaced BITS_PER_LONG/CONFIG_ARM_32.
    (Stefano is OK with it, but would prefer BITS_PER_LONG, Jan does not want
    BITS_PER_LONG).
v9: Expose the struct xsplice_patch_func old_addr and new_addr as void
    instead of uint32_t or uint64_t.
  - Added Julien's Ack back.
  - Rename pad to opaque.
  - Added comment in aidle_loop.
  - Squash internal and public of 'xsplice_patch_func'
  - Fixed remaining sizeof use.
  - Removed reference to MCE
  - Fixed comment styles.
  - Use bool_t in check_special_sections
  - Add a #define for .xsplice.funcs.
  - Remove full stops from printk
  - Fix xsplice_do_action per Jan's punchlist
  - Use spin_lock_try in keyhandler
  - Remove leading underscores from __CHECK_FOR_XSPLICE_WORK
  - Don't fail compilation on GCC5 - we MUST have rc set.
  - Don't bail out if finding !sh_type as those are for .rela or .debug
    and while we don't need to allocate it (as we had already done
    the relocation), do continue.
  - Make applied_list be an RCU type to guard against infinite loops
    when searching the applied_list.
  - Dropped the irq_semaphore and re-use the semaphore atomic once
    CPUs have rendezvoused and are ready to enter the IRQ-disable phase.
v10: Drop Reviewed-by
  - Use a bitmap in check_special_sections to check for sections
    appearing twice.
  - Add comment about us abusing the list RCU for our safety reasons.
  - And remove MUST comment about opaque having to be zero filled.
v10 - patch inline in response to v9 patchset.
  - Make the bitmap in check_special_sections be called found and use
    __test_and_set_bit.
  - Add Jan's Acked-by
---
 xen/arch/arm/xsplice.c        |  33 +++
 xen/arch/x86/domain.c         |   6 +
 xen/arch/x86/xsplice.c        |  76 +++++++
 xen/common/xsplice.c          | 478 +++++++++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/current.h |  10 +-
 xen/include/public/sysctl.h   |  20 ++
 xen/include/xen/xsplice.h     |  21 ++
 7 files changed, 632 insertions(+), 12 deletions(-)
