From patchwork Wed Sep 19 08:47:59 2018
X-Patchwork-Submitter: Janosch Frank
X-Patchwork-Id: 10605519
From: Janosch Frank
To: kvm@vger.kernel.org
Cc: linux-s390@vger.kernel.org, david@redhat.com, borntraeger@de.ibm.com,
    schwidefsky@de.ibm.com
Subject: [RFC 11/14] s390/mm: Add gmap shadowing for large pmds
Date: Wed, 19 Sep 2018 10:47:59 +0200
Message-Id: <20180919084802.183381-12-frankja@linux.ibm.com>
In-Reply-To: <20180919084802.183381-1-frankja@linux.ibm.com>
References: <20180919084802.183381-1-frankja@linux.ibm.com>
X-Mailer: git-send-email 2.14.3
Up to now we could only shadow large pmds when the parent's mapping was
done with normal-sized pmds. That is achieved by introducing fake page
tables, effectively running the level 3 guest with a standard memory
backing instead of the large one.

With this patch we add shadowing when the host itself is backed by
large pages. This allows us to run normal and large backed VMs inside
a large backed host.

Signed-off-by: Janosch Frank
---
 arch/s390/include/asm/gmap.h |   9 +-
 arch/s390/kvm/gaccess.c      |  52 +++++--
 arch/s390/mm/gmap.c          | 327 +++++++++++++++++++++++++++++++++++--------
 3 files changed, 316 insertions(+), 72 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index c667bd0181d4..3df7a004e6e5 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -16,11 +16,12 @@
 /* Status bits only for huge segment entries */
 #define _SEGMENT_ENTRY_GMAP_IN		0x8000	/* invalidation notify bit */
 #define _SEGMENT_ENTRY_GMAP_UC		0x4000	/* dirty (migration) */
+#define _SEGMENT_ENTRY_GMAP_VSIE	0x2000	/* vsie bit */
 /* Status bits in the gmap segment entry. */
 #define _SEGMENT_ENTRY_GMAP_SPLIT	0x0001	/* split huge pmd */
 
 #define GMAP_SEGMENT_STATUS_BITS (_SEGMENT_ENTRY_GMAP_UC | _SEGMENT_ENTRY_GMAP_SPLIT)
-#define GMAP_SEGMENT_NOTIFY_BITS _SEGMENT_ENTRY_GMAP_IN
+#define GMAP_SEGMENT_NOTIFY_BITS (_SEGMENT_ENTRY_GMAP_IN | _SEGMENT_ENTRY_GMAP_VSIE)
 
 /**
  * struct gmap_struct - guest address space
@@ -146,9 +147,11 @@ int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sgt,
 		    int fake);
 int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
 		    int fake);
-int gmap_shadow_pgt_lookup(struct gmap *sg, unsigned long saddr,
-			   unsigned long *pgt, int *dat_protection, int *fake);
+int gmap_shadow_sgt_lookup(struct gmap *sg, unsigned long saddr,
+			   unsigned long *pgt, int *dat_protection,
+			   int *fake, int *lvl);
 int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte);
+int gmap_shadow_segment(struct gmap *sg, unsigned long saddr, pmd_t pmd);
 
 void gmap_register_pte_notifier(struct gmap_notifier *);
 void gmap_unregister_pte_notifier(struct gmap_notifier *);
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 07d30ffcfa41..0b4cde3e431e 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -981,7 +981,7 @@ int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra)
  */
 static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
 				  unsigned long *pgt, int *dat_protection,
-				  int *fake)
+				  int *fake, int *lvl)
 {
 	struct gmap *parent;
 	union asce asce;
@@ -1130,14 +1130,25 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
 		if (ste.cs && asce.p)
 			return PGM_TRANSLATION_SPEC;
 		*dat_protection |= ste.fc0.p;
+
+		/* Guest is huge page mapped */
 		if (ste.fc && sg->edat_level >= 1) {
-			*fake = 1;
-			ptr = ste.fc1.sfaa * _SEGMENT_SIZE;
-			ste.val = ptr;
-			goto shadow_pgt;
+			/* 4k to 1m, we absolutely need fake shadow tables. */
+			if (!parent->mm->context.allow_gmap_hpage_1m) {
+				*fake = 1;
+				ptr = ste.fc1.sfaa * _SEGMENT_SIZE;
+				ste.val = ptr;
+				goto shadow_pgt;
+			} else {
+				*lvl = 1;
+				*pgt = ptr;
+				return 0;
+			}
 		}
 		ptr = ste.fc0.pto * (PAGE_SIZE / 2);
 shadow_pgt:
+		*lvl = 0;
 		ste.fc0.p |= *dat_protection;
 		rc = gmap_shadow_pgt(sg, saddr, ste.val, *fake);
 		if (rc)
@@ -1166,8 +1177,9 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
 {
 	union vaddress vaddr;
 	union page_table_entry pte;
+	union segment_table_entry ste;
 	unsigned long pgt;
-	int dat_protection, fake;
+	int dat_protection, fake, lvl = 0;
 	int rc;
 
 	down_read(&sg->mm->mmap_sem);
@@ -1178,12 +1190,35 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
 	 */
 	ipte_lock(vcpu);
 
-	rc = gmap_shadow_pgt_lookup(sg, saddr, &pgt, &dat_protection, &fake);
+	rc = gmap_shadow_sgt_lookup(sg, saddr, &pgt, &dat_protection, &fake, &lvl);
 	if (rc)
 		rc = kvm_s390_shadow_tables(sg, saddr, &pgt, &dat_protection,
-					    &fake);
+					    &fake, &lvl);
 
 	vaddr.addr = saddr;
+
+	/* Shadow stopped at segment level, we map pmd to pmd */
+	if (!rc && lvl) {
+		rc = gmap_read_table(sg->parent, pgt + vaddr.sx * 8, &ste.val);
+		if (!rc && ste.i)
+			rc = PGM_PAGE_TRANSLATION;
+		ste.fc1.p |= dat_protection;
+		if (!rc)
+			rc = gmap_shadow_segment(sg, saddr, __pmd(ste.val));
+		if (rc == -EISDIR) {
+			/* Hit a split pmd, we need to set up a fake page table */
+			fake = 1;
+			pgt = ste.fc1.sfaa * _SEGMENT_SIZE;
+			ste.val = pgt;
+			rc = gmap_shadow_pgt(sg, saddr, ste.val, fake);
+			if (rc)
+				goto out;
+		} else {
+			/* We're done */
+			goto out;
+		}
+	}
+
 	if (fake) {
 		pte.val = pgt + vaddr.px * PAGE_SIZE;
 		goto shadow_page;
@@ -1198,6 +1233,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
 		pte.p |= dat_protection;
 	if (!rc)
 		rc = gmap_shadow_page(sg, saddr, __pte(pte.val));
+out:
 	ipte_unlock(vcpu);
 	up_read(&sg->mm->mmap_sem);
 	return rc;
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index c64f9a48f5f8..f697c73afba3 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -879,28 +879,6 @@ static inline unsigned long *gmap_table_walk(struct gmap *gmap,
 	return table;
 }
 
-/**
- * gmap_pte_op_walk - walk the gmap page table, get the page table lock
- *		      and return the pte pointer
- * @gmap: pointer to guest mapping meta data structure
- * @gaddr: virtual address in the guest address space
- * @ptl: pointer to the spinlock pointer
- *
- * Returns a pointer to the locked pte for a guest address, or NULL
- */
-static pte_t *gmap_pte_op_walk(struct gmap *gmap, unsigned long gaddr,
-			       spinlock_t **ptl)
-{
-	unsigned long *table;
-
-	BUG_ON(gmap_is_shadow(gmap));
-	/* Walk the gmap page table, lock and get pte pointer */
-	table = gmap_table_walk(gmap, gaddr, 1); /* get segment pointer */
-	if (!table || *table & _SEGMENT_ENTRY_INVALID)
-		return NULL;
-	return pte_alloc_map_lock(gmap->mm, (pmd_t *) table, gaddr, ptl);
-}
-
 /**
  * gmap_fixup - force memory in and connect the gmap table entry
  * @gmap: pointer to guest mapping meta data structure
@@ -1454,6 +1432,7 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
 }
 
 #define _SHADOW_RMAP_MASK	0x7
+#define _SHADOW_RMAP_SEGMENT_LP	0x6
 #define _SHADOW_RMAP_REGION1	0x5
 #define _SHADOW_RMAP_REGION2	0x4
 #define _SHADOW_RMAP_REGION3	0x3
@@ -1559,15 +1538,18 @@ static void __gmap_unshadow_sgt(struct gmap *sg, unsigned long raddr,
 
 	BUG_ON(!gmap_is_shadow(sg));
 	for (i = 0; i < _CRST_ENTRIES; i++, raddr += _SEGMENT_SIZE) {
-		if (!(sgt[i] & _SEGMENT_ENTRY_ORIGIN))
+		if (sgt[i] == _SEGMENT_ENTRY_EMPTY)
 			continue;
-		pgt = (unsigned long *)(sgt[i] & _REGION_ENTRY_ORIGIN);
+
+		if (!(sgt[i] & _SEGMENT_ENTRY_LARGE)) {
+			pgt = (unsigned long *)(sgt[i] & _SEGMENT_ENTRY_ORIGIN);
+			__gmap_unshadow_pgt(sg, raddr, pgt);
+			/* Free page table */
+			page = pfn_to_page(__pa(pgt) >> PAGE_SHIFT);
+			list_del(&page->lru);
+			page_table_free_pgste(page);
+		}
 		sgt[i] = _SEGMENT_ENTRY_EMPTY;
-		__gmap_unshadow_pgt(sg, raddr, pgt);
-		/* Free page table */
-		page = pfn_to_page(__pa(pgt) >> PAGE_SHIFT);
-		list_del(&page->lru);
-		page_table_free_pgste(page);
 	}
 }
 
@@ -2173,7 +2155,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_sgt);
 /**
  * gmap_shadow_lookup_pgtable - find a shadow page table
  * @sg: pointer to the shadow guest address space structure
- * @saddr: the address in the shadow aguest address space
+ * @saddr: the address in the shadow guest address space
 * @pgt: parent gmap address of the page table to get shadowed
 * @dat_protection: if the pgtable is marked as protected by dat
 * @fake: pgt references contiguous guest memory block, not a pgtable
@@ -2183,32 +2165,64 @@ EXPORT_SYMBOL_GPL(gmap_shadow_sgt);
 *
 * Called with sg->mm->mmap_sem in read.
 */
-int gmap_shadow_pgt_lookup(struct gmap *sg, unsigned long saddr,
-			   unsigned long *pgt, int *dat_protection,
-			   int *fake)
+void gmap_shadow_pgt_lookup(struct gmap *sg, unsigned long *sge,
+			    unsigned long saddr, unsigned long *pgt,
+			    int *dat_protection, int *fake)
 {
-	unsigned long *table;
 	struct page *page;
-	int rc;
+
+	/* Shadow page tables are full pages (pte+pgste) */
+	page = pfn_to_page(*sge >> PAGE_SHIFT);
+	*pgt = page->index & ~GMAP_SHADOW_FAKE_TABLE;
+	*dat_protection = !!(*sge & _SEGMENT_ENTRY_PROTECT);
+	*fake = !!(page->index & GMAP_SHADOW_FAKE_TABLE);
+}
+EXPORT_SYMBOL_GPL(gmap_shadow_pgt_lookup);
+
+int gmap_shadow_sgt_lookup(struct gmap *sg, unsigned long saddr,
+			   unsigned long *pgt, int *dat_protection,
+			   int *fake, int *lvl)
+{
+	unsigned long *sge, *r3e = NULL;
+	struct page *page;
+	int rc = -EAGAIN;
 
 	BUG_ON(!gmap_is_shadow(sg));
 	spin_lock(&sg->guest_table_lock);
-	table = gmap_table_walk(sg, saddr, 1); /* get segment pointer */
-	if (table && !(*table & _SEGMENT_ENTRY_INVALID)) {
-		/* Shadow page tables are full pages (pte+pgste) */
-		page = pfn_to_page(*table >> PAGE_SHIFT);
-		*pgt = page->index & ~GMAP_SHADOW_FAKE_TABLE;
-		*dat_protection = !!(*table & _SEGMENT_ENTRY_PROTECT);
-		*fake = !!(page->index & GMAP_SHADOW_FAKE_TABLE);
-		rc = 0;
-	} else {
-		rc = -EAGAIN;
+	if (sg->asce & _ASCE_TYPE_MASK) {
+		/* >2 GB guest */
+		r3e = (unsigned long *) gmap_table_walk(sg, saddr, 2);
+		if (!r3e || (*r3e & _REGION_ENTRY_INVALID))
+			goto out;
+		sge = (unsigned long *)(*r3e & _REGION_ENTRY_ORIGIN)
+			+ ((saddr & _SEGMENT_INDEX) >> _SEGMENT_SHIFT);
+	} else {
+		sge = (unsigned long *)(sg->asce & PAGE_MASK)
+			+ ((saddr & _SEGMENT_INDEX) >> _SEGMENT_SHIFT);
 	}
+	if (*sge & _SEGMENT_ENTRY_INVALID)
+		goto out;
+	rc = 0;
+	if (*sge & _SEGMENT_ENTRY_LARGE) {
+		if (r3e) {
+			page = pfn_to_page(*r3e >> PAGE_SHIFT);
+			*pgt = page->index & ~GMAP_SHADOW_FAKE_TABLE;
+			*dat_protection = !!(*r3e & _SEGMENT_ENTRY_PROTECT);
+			*fake = !!(page->index & GMAP_SHADOW_FAKE_TABLE);
+		} else {
+			*pgt = sg->orig_asce & PAGE_MASK;
+			*dat_protection = 0;
+			*fake = 0;
+		}
+		*lvl = 1;
+	} else {
+		gmap_shadow_pgt_lookup(sg, sge, saddr, pgt,
+				       dat_protection, fake);
+		*lvl = 0;
+	}
+out:
 	spin_unlock(&sg->guest_table_lock);
 	return rc;
-
 }
-EXPORT_SYMBOL_GPL(gmap_shadow_pgt_lookup);
+EXPORT_SYMBOL_GPL(gmap_shadow_sgt_lookup);
 
 /**
  * gmap_shadow_pgt - instantiate a shadow page table
@@ -2290,6 +2304,94 @@ int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
 }
 EXPORT_SYMBOL_GPL(gmap_shadow_pgt);
 
+int gmap_shadow_segment(struct gmap *sg, unsigned long saddr, pmd_t pmd)
+{
+	struct gmap *parent;
+	struct gmap_rmap *rmap;
+	unsigned long vmaddr, paddr;
+	spinlock_t *ptl = NULL;
+	pmd_t spmd, tpmd, *spmdp = NULL, *tpmdp;
+	int prot;
+	int rc;
+
+	BUG_ON(!gmap_is_shadow(sg));
+	parent = sg->parent;
+
+	prot = (pmd_val(pmd) & _SEGMENT_ENTRY_PROTECT) ? PROT_READ : PROT_WRITE;
+	rmap = kzalloc(sizeof(*rmap), GFP_KERNEL);
+	if (!rmap)
+		return -ENOMEM;
+	rmap->raddr = (saddr & HPAGE_MASK) | _SHADOW_RMAP_SEGMENT_LP;
+
+	while (1) {
+		paddr = pmd_val(pmd) & HPAGE_MASK;
+		vmaddr = __gmap_translate(parent, paddr);
+		if (IS_ERR_VALUE(vmaddr)) {
+			rc = vmaddr;
+			break;
+		}
+		rc = radix_tree_preload(GFP_KERNEL);
+		if (rc)
+			break;
+		rc = -EAGAIN;
+
+		/* Let's look up the parent's mapping */
+		spmdp = gmap_pmd_op_walk(parent, paddr, vmaddr, &ptl);
+		if (spmdp) {
+			if (gmap_pmd_is_split(spmdp)) {
+				gmap_pmd_op_end(ptl);
+				radix_tree_preload_end();
+				rc = -EISDIR;
+				break;
+			}
+			spin_lock(&sg->guest_table_lock);
+			/* Get shadow segment table pointer */
+			tpmdp = (pmd_t *) gmap_table_walk(sg, saddr, 1);
+			if (!tpmdp) {
+				spin_unlock(&sg->guest_table_lock);
+				gmap_pmd_op_end(ptl);
+				radix_tree_preload_end();
+				break;
+			}
+			/* Shadowing magic happens here. */
+			if (!(pmd_val(*tpmdp) & _SEGMENT_ENTRY_INVALID)) {
+				rc = 0;	/* already shadowed */
+				spin_unlock(&sg->guest_table_lock);
+				gmap_pmd_op_end(ptl);
+				radix_tree_preload_end();
+				break;
+			}
+			spmd = *spmdp;
+			if (!(pmd_val(spmd) & _SEGMENT_ENTRY_INVALID) &&
+			    !((pmd_val(spmd) & _SEGMENT_ENTRY_PROTECT) &&
+			      !(pmd_val(pmd) & _SEGMENT_ENTRY_PROTECT))) {
+
+				pmd_val(*spmdp) |= _SEGMENT_ENTRY_GMAP_VSIE;
+
+				/* Insert shadow ste */
+				pmd_val(tpmd) = ((pmd_val(spmd) &
+						  _SEGMENT_ENTRY_HARDWARE_BITS_LARGE) |
+						 (pmd_val(pmd) & _SEGMENT_ENTRY_PROTECT));
+				*tpmdp = tpmd;
+				gmap_insert_rmap(sg, vmaddr, rmap);
+				rc = 0;
+			}
+			spin_unlock(&sg->guest_table_lock);
+			gmap_pmd_op_end(ptl);
+		}
+		radix_tree_preload_end();
+		if (!rc)
+			break;
+		rc = gmap_fixup(parent, paddr, vmaddr, prot);
+		if (rc)
+			break;
+	}
+	if (rc)
+		kfree(rmap);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(gmap_shadow_segment);
+
 /**
  * gmap_shadow_page - create a shadow page mapping
  * @sg: pointer to the shadow guest address space structure
@@ -2307,7 +2409,8 @@ int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte)
 	struct gmap *parent;
 	struct gmap_rmap *rmap;
 	unsigned long vmaddr, paddr;
-	spinlock_t *ptl;
+	spinlock_t *ptl_pmd = NULL, *ptl_pte = NULL;
+	pmd_t *spmdp;
 	pte_t *sptep, *tptep;
 	int prot;
 	int rc;
@@ -2332,26 +2435,46 @@ int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte)
 		if (rc)
 			break;
 		rc = -EAGAIN;
-		sptep = gmap_pte_op_walk(parent, paddr, &ptl);
-		if (sptep) {
-			spin_lock(&sg->guest_table_lock);
+		spmdp = gmap_pmd_op_walk(parent, paddr, vmaddr, &ptl_pmd);
+		if (spmdp && !(pmd_val(*spmdp) & _SEGMENT_ENTRY_INVALID)) {
 			/* Get page table pointer */
 			tptep = (pte_t *) gmap_table_walk(sg, saddr, 0);
 			if (!tptep) {
-				spin_unlock(&sg->guest_table_lock);
-				gmap_pte_op_end(ptl);
 				radix_tree_preload_end();
+				gmap_pmd_op_end(ptl_pmd);
 				break;
 			}
-			rc = ptep_shadow_pte(sg->mm, saddr, sptep, tptep, pte);
-			if (rc > 0) {
-				/* Success and a new mapping */
-				gmap_insert_rmap(sg, vmaddr, rmap);
-				rmap = NULL;
-				rc = 0;
+
+			if (pmd_large(*spmdp)) {
+				pte_t spte;
+				if (!(pmd_val(*spmdp) & _SEGMENT_ENTRY_PROTECT)) {
+					spin_lock(&sg->guest_table_lock);
+					spte = __pte((pmd_val(*spmdp) &
+						      _SEGMENT_ENTRY_ORIGIN_LARGE) +
+						     (pte_index(paddr) << 12));
+					ptep_shadow_set(spte, tptep, pte);
+					pmd_val(*spmdp) |= _SEGMENT_ENTRY_GMAP_VSIE;
+					gmap_insert_rmap(sg, vmaddr, rmap);
+					rmap = NULL;
+					rc = 0;
+					spin_unlock(&sg->guest_table_lock);
+				}
+			} else {
+				sptep = gmap_pte_from_pmd(parent, spmdp, paddr, &ptl_pte);
+				spin_lock(&sg->guest_table_lock);
+				if (sptep) {
+					rc = ptep_shadow_pte(sg->mm, saddr, sptep, tptep, pte);
+					if (rc > 0) {
+						/* Success and a new mapping */
+						gmap_insert_rmap(sg, vmaddr, rmap);
+						rmap = NULL;
+						rc = 0;
+					}
+					spin_unlock(&sg->guest_table_lock);
+					gmap_pte_op_end(ptl_pte);
+				}
 			}
-			gmap_pte_op_end(ptl);
-			spin_unlock(&sg->guest_table_lock);
+			gmap_pmd_op_end(ptl_pmd);
 		}
 		radix_tree_preload_end();
 		if (!rc)
@@ -2365,6 +2488,75 @@ int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte)
 }
 EXPORT_SYMBOL_GPL(gmap_shadow_page);
 
+/**
+ * gmap_unshadow_segment - remove a huge segment from a shadow segment table
+ * @sg: pointer to the shadow guest address space structure
+ * @raddr: rmap address in the shadow guest address space
+ *
+ * Called with the sg->guest_table_lock
+ */
+static void gmap_unshadow_segment(struct gmap *sg, unsigned long raddr)
+{
+	unsigned long *table;
+
+	BUG_ON(!gmap_is_shadow(sg));
+	/* We already have the lock */
+	table = gmap_table_walk(sg, raddr, 1); /* get segment table pointer */
+	if (!table || *table & _SEGMENT_ENTRY_INVALID ||
+	    !(*table & _SEGMENT_ENTRY_LARGE))
+		return;
+	gmap_call_notifier(sg, raddr, raddr + HPAGE_SIZE - 1);
+	gmap_idte_global(sg->asce, (pmd_t *)table, raddr);
+	*table = _SEGMENT_ENTRY_EMPTY;
+}
+
+static void gmap_shadow_notify_pmd(struct gmap *sg, unsigned long vmaddr,
+				   unsigned long gaddr)
+{
+	struct gmap_rmap *rmap, *rnext, *head;
+	unsigned long start, end, bits, raddr;
+
+	BUG_ON(!gmap_is_shadow(sg));
+
+	spin_lock(&sg->guest_table_lock);
+	if (sg->removed) {
+		spin_unlock(&sg->guest_table_lock);
+		return;
+	}
+	/* Check for top level table */
+	start = sg->orig_asce & _ASCE_ORIGIN;
+	end = start + ((sg->orig_asce & _ASCE_TABLE_LENGTH) + 1) * PAGE_SIZE;
+	if (!(sg->orig_asce & _ASCE_REAL_SPACE) && gaddr >= start &&
+	    gaddr < ((end & HPAGE_MASK) + HPAGE_SIZE - 1)) {
+		/* The complete shadow table has to go */
+		gmap_unshadow(sg);
+		spin_unlock(&sg->guest_table_lock);
+		list_del(&sg->list);
+		gmap_put(sg);
+		return;
+	}
+	/* Remove the page table tree from one specific entry */
+	head = radix_tree_delete(&sg->host_to_rmap, (vmaddr & HPAGE_MASK) >> PAGE_SHIFT);
+	gmap_for_each_rmap_safe(rmap, rnext, head) {
+		bits = rmap->raddr & _SHADOW_RMAP_MASK;
+		raddr = rmap->raddr ^ bits;
+		switch (bits) {
+		case _SHADOW_RMAP_SEGMENT_LP:
+			gmap_unshadow_segment(sg, raddr);
+			break;
+		case _SHADOW_RMAP_PGTABLE:
+			gmap_unshadow_page(sg, raddr);
+			break;
+		default:
+			BUG();
+		}
+		kfree(rmap);
+	}
+	spin_unlock(&sg->guest_table_lock);
+}
+
 /**
  * gmap_shadow_notify - handle notifications for shadow gmap
  *
@@ -2416,6 +2608,8 @@ static void gmap_shadow_notify(struct gmap *sg, unsigned long vmaddr,
 		case _SHADOW_RMAP_PGTABLE:
 			gmap_unshadow_page(sg, raddr);
 			break;
+		default:
+			BUG();
 		}
 		kfree(rmap);
 	}
@@ -2499,10 +2693,21 @@ static inline void pmdp_notify_split(struct gmap *gmap, pmd_t *pmdp,
 static void pmdp_notify_gmap(struct gmap *gmap, pmd_t *pmdp,
 			     unsigned long gaddr, unsigned long vmaddr)
 {
+	struct gmap *sg, *next;
+
 	BUG_ON((gaddr & ~HPAGE_MASK) || (vmaddr & ~HPAGE_MASK));
 	if (gmap_pmd_is_split(pmdp))
 		return pmdp_notify_split(gmap, pmdp, gaddr, vmaddr);
+
+	if (!list_empty(&gmap->children) &&
+	    (pmd_val(*pmdp) & _SEGMENT_ENTRY_GMAP_VSIE)) {
+		spin_lock(&gmap->shadow_lock);
+		list_for_each_entry_safe(sg, next, &gmap->children, list)
+			gmap_shadow_notify_pmd(sg, vmaddr, gaddr);
+		spin_unlock(&gmap->shadow_lock);
+	}
+	pmd_val(*pmdp) &= ~_SEGMENT_ENTRY_GMAP_VSIE;
+
 	if (!(pmd_val(*pmdp) & _SEGMENT_ENTRY_GMAP_IN))
 		return;
 	pmd_val(*pmdp) &= ~_SEGMENT_ENTRY_GMAP_IN;