From patchwork Thu Feb 4 07:00:07 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Huaitong Han
X-Patchwork-Id: 8214131
From: Huaitong Han
To: jbeulich@suse.com, andrew.cooper3@citrix.com, george.dunlap@eu.citrix.com, keir@xen.org
Cc: Huaitong Han, xen-devel@lists.xen.org
Date: Thu, 4 Feb 2016 15:00:07 +0800
Message-Id: <1454569207-25134-1-git-send-email-huaitong.han@intel.com>
X-Mailer: git-send-email 2.4.3
Subject: [Xen-devel] [PATCH V10 2/5] x86/hvm: pkeys, add pkeys support for guest_walk_tables
List-Id: Xen developer discussion

Protection keys define a new 4-bit protection key field (PKEY) in bits 62:59
of leaf entries of the page tables.

The PKRU register defines 32 bits: there are 16 domains and 2 attribute bits
per domain in PKRU. For each i (0 <= i <= 15), PKRU[2i] is the access-disable
bit for protection key i (ADi); PKRU[2i+1] is the write-disable bit for
protection key i (WDi). PKEY is an index to a defined domain.

A fault is considered as a PKU violation if all of the following conditions
are true:
1.CR4_PKE=1.
2.EFER_LMA=1.
3.Page is present with no reserved bit violations.
4.The access is not an instruction fetch.
5.The access is to a user page.
6.PKRU.AD=1 or
    the access is a data write and PKRU.WD=1 and
    either CR0.WP=1 or it is a user access.

Signed-off-by: Huaitong Han
Reviewed-by: Jan Beulich
Reviewed-by: Kevin Tian
---
Changes in v10:
*Move PFEC_page_present check.
Changes in v9:
*Rename _write_cr4 to raw_write_cr4.
Changes in v8:
*Abstract out _write_cr4.
Changes in v7:
*Add static for pkey_fault.
*Add a comment for page present check and adjust indentation.
*Init pkru_ad and pkru_wd.
*Delete l3e_get_pkey the outer parentheses.
*The first parameter of read_pkru_* use uint32_t type.

 xen/arch/x86/mm/guest_walk.c      | 53 +++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/mm/hap/guest_walk.c  |  3 +++
 xen/include/asm-x86/guest_pt.h    | 12 +++++++++
 xen/include/asm-x86/hvm/hvm.h     |  2 ++
 xen/include/asm-x86/page.h        |  5 ++++
 xen/include/asm-x86/processor.h   | 47 +++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/x86_64/page.h | 12 +++++++++
 7 files changed, 133 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 18d1acf..4a6d292 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -90,6 +90,53 @@ static uint32_t set_ad_bits(void *guest_p, void *walk_p, int set_dirty)
     return 0;
 }

+#if GUEST_PAGING_LEVELS >= 4
+static bool_t pkey_fault(struct vcpu *vcpu, uint32_t pfec,
+        uint32_t pte_flags, uint32_t pte_pkey)
+{
+    uint32_t pkru = 0;
+    bool_t pkru_ad = 0, pkru_wd = 0;
+
+    if ( is_pv_vcpu(vcpu) )
+        return 0;
+
+    /*
+     * PKU: additional mechanism by which the paging controls
+     * access to user-mode addresses based on the value in the
+     * PKRU register. A fault is considered as a PKU violation if all
+     * of the following conditions are true:
+     * 1.CR4_PKE=1.
+     * 2.EFER_LMA=1.
+     * 3.Page is present with no reserved bit violations.
+     * 4.The access is not an instruction fetch.
+     * 5.The access is to a user page.
+     * 6.PKRU.AD=1 or
+     *      the access is a data write and PKRU.WD=1 and
+     *          either CR0.WP=1 or it is a user access.
+     */
+    if ( !hvm_pku_enabled(vcpu) ||
+         !hvm_long_mode_enabled(vcpu) ||
+         !(pfec & PFEC_page_present) ||
+         (pfec & PFEC_reserved_bit) ||
+         (pfec & PFEC_insn_fetch) ||
+         !(pte_flags & _PAGE_USER) )
+        return 0;
+
+    pkru = read_pkru();
+    if ( unlikely(pkru) )
+    {
+        pkru_ad = read_pkru_ad(pkru, pte_pkey);
+        pkru_wd = read_pkru_wd(pkru, pte_pkey);
+        /* Condition 6 */
+        if ( pkru_ad || (pkru_wd && (pfec & PFEC_write_access) &&
+                         (hvm_wp_enabled(vcpu) || (pfec & PFEC_user_mode))))
+            return 1;
+    }
+
+    return 0;
+}
+#endif
+
 /* Walk the guest pagetables, after the manner of a hardware walker. */
 /* Because the walk is essentially random, it can cause a deadlock
  * warning in the p2m locking code. Highly unlikely this is an actual
@@ -107,6 +154,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
     guest_l3e_t *l3p = NULL;
     guest_l4e_t *l4p;
 #endif
+    unsigned int pkey;
     uint32_t gflags, mflags, iflags, rc = 0;
     bool_t smep = 0, smap = 0;
     bool_t pse1G = 0, pse2M = 0;
@@ -190,6 +238,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
         goto out;
     /* Get the l3e and check its flags*/
     gw->l3e = l3p[guest_l3_table_offset(va)];
+    pkey = guest_l3e_get_pkey(gw->l3e);
     gflags = guest_l3e_get_flags(gw->l3e) ^ iflags;
     if ( !(gflags & _PAGE_PRESENT) ) {
         rc |= _PAGE_PRESENT;
@@ -261,6 +310,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,

 #endif /* All levels... */

+    pkey = guest_l2e_get_pkey(gw->l2e);
     gflags = guest_l2e_get_flags(gw->l2e) ^ iflags;
     if ( !(gflags & _PAGE_PRESENT) ) {
         rc |= _PAGE_PRESENT;
@@ -324,6 +374,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
         if(l1p == NULL)
             goto out;
         gw->l1e = l1p[guest_l1_table_offset(va)];
+        pkey = guest_l1e_get_pkey(gw->l1e);
         gflags = guest_l1e_get_flags(gw->l1e) ^ iflags;
         if ( !(gflags & _PAGE_PRESENT) ) {
             rc |= _PAGE_PRESENT;
@@ -334,6 +385,8 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,

 #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
 set_ad:
+    if ( pkey_fault(v, pfec, gflags, pkey) )
+        rc |= _PAGE_PKEY_BITS;
 #endif
     /* Now re-invert the user-mode requirement for SMEP and SMAP */
     if ( smep || smap )
diff --git a/xen/arch/x86/mm/hap/guest_walk.c b/xen/arch/x86/mm/hap/guest_walk.c
index 11c1b35..49d0328 100644
--- a/xen/arch/x86/mm/hap/guest_walk.c
+++ b/xen/arch/x86/mm/hap/guest_walk.c
@@ -130,6 +130,9 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
     if ( missing & _PAGE_INVALID_BITS )
         pfec[0] |= PFEC_reserved_bit;

+    if ( missing & _PAGE_PKEY_BITS )
+        pfec[0] |= PFEC_prot_key;
+
     if ( missing & _PAGE_PAGED )
         pfec[0] = PFEC_page_paged;

diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h
index 3447973..eb29e62 100644
--- a/xen/include/asm-x86/guest_pt.h
+++ b/xen/include/asm-x86/guest_pt.h
@@ -81,6 +81,11 @@ static inline u32 guest_l1e_get_flags(guest_l1e_t gl1e)
 static inline u32 guest_l2e_get_flags(guest_l2e_t gl2e)
 { return gl2e.l2 & 0xfff; }

+static inline u32 guest_l1e_get_pkey(guest_l1e_t gl1e)
+{ return 0; }
+static inline u32 guest_l2e_get_pkey(guest_l2e_t gl2e)
+{ return 0; }
+
 static inline guest_l1e_t guest_l1e_from_gfn(gfn_t gfn, u32 flags)
 { return (guest_l1e_t) { (gfn_x(gfn) << PAGE_SHIFT) | flags }; }
 static inline guest_l2e_t guest_l2e_from_gfn(gfn_t gfn, u32 flags)
@@ -154,6 +159,13 @@ static inline u32 guest_l4e_get_flags(guest_l4e_t gl4e)
 { return l4e_get_flags(gl4e); }
 #endif

+static inline u32 guest_l1e_get_pkey(guest_l1e_t gl1e)
+{ return l1e_get_pkey(gl1e); }
+static inline u32 guest_l2e_get_pkey(guest_l2e_t gl2e)
+{ return l2e_get_pkey(gl2e); }
+static inline u32 guest_l3e_get_pkey(guest_l3e_t gl3e)
+{ return l3e_get_pkey(gl3e); }
+
 static inline guest_l1e_t guest_l1e_from_gfn(gfn_t gfn, u32 flags)
 { return l1e_from_pfn(gfn_x(gfn), flags); }
 static inline guest_l2e_t guest_l2e_from_gfn(gfn_t gfn, u32 flags)
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index a87224b..731dd44 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -277,6 +277,8 @@ int hvm_girq_dest_2_vcpu_id(struct domain *d, uint8_t dest, uint8_t dest_mode);
     (hvm_paging_enabled(v) && ((v)->arch.hvm_vcpu.guest_cr[4] & X86_CR4_SMAP))
 #define hvm_nx_enabled(v) \
     (!!((v)->arch.hvm_vcpu.guest_efer & EFER_NX))
+#define hvm_pku_enabled(v) \
+    (hvm_paging_enabled(v) && ((v)->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PKE))

 /* Can we use superpages in the HAP p2m table? */
 #define hap_has_1gb (!!(hvm_funcs.hap_capabilities & HVM_HAP_SUPERPAGE_1GB))
diff --git a/xen/include/asm-x86/page.h b/xen/include/asm-x86/page.h
index a095a93..9202f3d 100644
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -93,6 +93,11 @@
 #define l3e_get_flags(x)           (get_pte_flags((x).l3))
 #define l4e_get_flags(x)           (get_pte_flags((x).l4))

+/* Get pte pkeys (unsigned int). */
+#define l1e_get_pkey(x)           get_pte_pkey((x).l1)
+#define l2e_get_pkey(x)           get_pte_pkey((x).l2)
+#define l3e_get_pkey(x)           get_pte_pkey((x).l3)
+
 /* Construct an empty pte. */
 #define l1e_empty()                ((l1_pgentry_t) { 0 })
 #define l2e_empty()                ((l2_pgentry_t) { 0 })
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 26ba141..617a4db 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -332,6 +332,11 @@ static inline unsigned long read_cr2(void)

 DECLARE_PER_CPU(unsigned long, cr4);

+static inline void raw_write_cr4(unsigned long val)
+{
+    asm volatile ( "mov %0,%%cr4" : : "r" (val) );
+}
+
 static inline unsigned long read_cr4(void)
 {
     return this_cpu(cr4);
@@ -340,7 +345,7 @@ static inline unsigned long read_cr4(void)
 static inline void write_cr4(unsigned long val)
 {
     this_cpu(cr4) = val;
-    asm volatile ( "mov %0,%%cr4" : : "r" (val) );
+    raw_write_cr4(val);
 }

 /* Clear and set 'TS' bit respectively */
@@ -374,6 +379,46 @@ static always_inline void clear_in_cr4 (unsigned long mask)
     write_cr4(read_cr4() & ~mask);
 }

+static inline unsigned int read_pkru(void)
+{
+    unsigned int pkru;
+    unsigned long cr4 = read_cr4();
+
+    /*
+     * _PAGE_PKEY_BITS have a conflict with _PAGE_GNTTAB used by PV guests,
+     * so that X86_CR4_PKE is disabled on hypervisor. To use RDPKRU, CR4.PKE
+     * gets temporarily enabled.
+     */
+    raw_write_cr4(cr4 | X86_CR4_PKE);
+    asm volatile (".byte 0x0f,0x01,0xee"
+        : "=a" (pkru) : "c" (0) : "dx");
+    raw_write_cr4(cr4);
+
+    return pkru;
+}
+
+/* Macros for PKRU domain */
+#define PKRU_READ  (0)
+#define PKRU_WRITE (1)
+#define PKRU_ATTRS (2)
+
+/*
+ * PKRU defines 32 bits, there are 16 domains and 2 attribute bits per
+ * domain in pkru, pkeys is index to a defined domain, so the value of
+ * pte_pkeys * PKRU_ATTRS + R/W is offset of a defined domain attribute.
+ */
+static inline bool_t read_pkru_ad(uint32_t pkru, unsigned int pkey)
+{
+    ASSERT(pkey < 16);
+    return (pkru >> (pkey * PKRU_ATTRS + PKRU_READ)) & 1;
+}
+
+static inline bool_t read_pkru_wd(uint32_t pkru, unsigned int pkey)
+{
+    ASSERT(pkey < 16);
+    return (pkru >> (pkey * PKRU_ATTRS + PKRU_WRITE)) & 1;
+}
+
 /*
  * NSC/Cyrix CPU configuration register indexes
  */
diff --git a/xen/include/asm-x86/x86_64/page.h b/xen/include/asm-x86/x86_64/page.h
index 19ab4d0..86abb94 100644
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -134,6 +134,18 @@ typedef l4_pgentry_t root_pgentry_t;
 #define get_pte_flags(x) (((int)((x) >> 40) & ~0xFFF) | ((int)(x) & 0xFFF))
 #define put_pte_flags(x) (((intpte_t)((x) & ~0xFFF) << 40) | ((x) & 0xFFF))

+/*
+ * Protection keys define a new 4-bit protection key field
+ * (PKEY) in bits 62:59 of leaf entries of the page tables.
+ * This corresponds to bit 22:19 of a 24-bit flags.
+ *
+ * Notice: Bit 22 is used by _PAGE_GNTTAB which is visible to PV guests,
+ * so Protection keys must be disabled on PV guests.
+ */
+#define _PAGE_PKEY_BITS  (0x780000)  /* Protection Keys, 22:19 */
+
+#define get_pte_pkey(x) (MASK_EXTR(get_pte_flags(x), _PAGE_PKEY_BITS))
+
 /* Bit 23 of a 24-bit flag mask. This corresponds to bit 63 of a pte.*/
 #define _PAGE_NX_BIT (1U<<23)
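
For readers who want to try the bit arithmetic outside the hypervisor, the following is a minimal stand-alone sketch of the lookup this patch performs. It is not Xen code: pte_pkey(), pkru_ad(), pkru_wd() and pkey_denies() are hypothetical names chosen for illustration, the PTE is treated as a raw 64-bit value rather than Xen's 24-bit flags form, and only condition 6 of the commit message is modelled (conditions 1 to 5 are assumed to have been checked by the caller).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants: PKEY occupies bits 62:59 of a leaf entry. */
#define PTE_PKEY_SHIFT 59u
#define PTE_PKEY_MASK  0xfull

/* Extract the 4-bit protection key from a raw 64-bit leaf entry. */
static inline unsigned int pte_pkey(uint64_t pte)
{
    return (unsigned int)((pte >> PTE_PKEY_SHIFT) & PTE_PKEY_MASK);
}

/* PKRU[2i] is the access-disable bit for protection key i. */
static inline bool pkru_ad(uint32_t pkru, unsigned int pkey)
{
    assert(pkey < 16);
    return (pkru >> (pkey * 2u)) & 1u;
}

/* PKRU[2i+1] is the write-disable bit for protection key i. */
static inline bool pkru_wd(uint32_t pkru, unsigned int pkey)
{
    assert(pkey < 16);
    return (pkru >> (pkey * 2u + 1u)) & 1u;
}

/*
 * Condition 6 only: AD is set, or the access is a data write with WD set
 * and either CR0.WP=1 or the access comes from user mode.
 */
static inline bool pkey_denies(uint32_t pkru, unsigned int pkey,
                               bool is_write, bool cr0_wp, bool is_user)
{
    if ( pkru_ad(pkru, pkey) )
        return true;

    return pkru_wd(pkru, pkey) && is_write && (cr0_wp || is_user);
}
```

With a leaf entry tagged with key 5 and a PKRU value in which only WD5 (bit 11) is set, a user-mode write is denied, while a read, or a supervisor write with CR0.WP=0, is allowed. Setting AD5 (bit 10) instead denies every data access to that page.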