From patchwork Mon Feb 24 11:40:32 2020
X-Patchwork-Submitter: Christian Borntraeger
X-Patchwork-Id: 11400283
From: Christian Borntraeger
To: Christian Borntraeger, Janosch Frank, Andrew Morton
Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth, Ulrich Weigand,
 Claudio Imbrenda, linux-s390, Michael Mueller, Vasily Gorbik,
 Andrea Arcangeli, linux-mm@kvack.org, Will Deacon
Subject: [PATCH v4 01/36] mm/gup/writeback: add callbacks for inaccessible pages
Date: Mon, 24 Feb 2020 06:40:32 -0500
Message-Id: <20200224114107.4646-2-borntraeger@de.ibm.com>
In-Reply-To: <20200224114107.4646-1-borntraeger@de.ibm.com>
References: <20200224114107.4646-1-borntraeger@de.ibm.com>

From: Claudio Imbrenda

With the introduction of protected KVM guests on s390 there is now a
concept of inaccessible pages. These pages need to be made accessible
before the host can access them.

While CPU accesses will trigger a fault that can be resolved, I/O
accesses will just fail. We need to add a callback into architecture
code for places that will do I/O, namely when writeback is started or
when a page reference is taken.

This is not only needed to enable paging, file backing etc., it is also
necessary to protect the host against a malicious user space. For
example a bad QEMU could simply start direct I/O on such protected
memory. We do not want userspace to be able to trigger I/O errors and
thus the logic is: "whenever somebody accesses that page (gup) or
starts I/O, make sure that this page can be accessed". When the guest
tries to access that page we will wait in the page fault handler for
writeback to have finished and for the page_ref to be the expected
value.

On s390x the function is not supposed to fail, so it is ok to use a
WARN_ON on failure. If we ever need some more fine-grained handling
we can tackle this when we know the details.
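As an illustration of the resulting contract (a hypothetical sketch,
not part of this patch; prepare_page_for_io is an invented helper):
any code path that is about to start I/O on a page calls the hook
first and must not start the I/O if it fails. On architectures that do
not define HAVE_ARCH_MAKE_PAGE_ACCESSIBLE the call is a no-op that
returns 0.

#include <linux/gfp.h>	/* arch_make_page_accessible() stub */
#include <linux/mm.h>

/* Hypothetical caller, for illustration only. */
static int prepare_page_for_io(struct page *page)
{
	int rc;

	rc = arch_make_page_accessible(page);
	if (rc)
		return rc;	/* page stays inaccessible, abort the I/O */
	/* ... safe to start I/O on the page here ... */
	return 0;
}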
Signed-off-by: Claudio Imbrenda
Acked-by: Will Deacon
Reviewed-by: David Hildenbrand
Reviewed-by: Christian Borntraeger
Signed-off-by: Christian Borntraeger
---
 include/linux/gfp.h |  6 ++++++
 mm/gup.c            | 15 ++++++++++++---
 mm/page-writeback.c |  5 +++++
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index e5b817cb86e7..be2754841369 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -485,6 +485,12 @@ static inline void arch_free_page(struct page *page, int order) { }
 #ifndef HAVE_ARCH_ALLOC_PAGE
 static inline void arch_alloc_page(struct page *page, int order) { }
 #endif
+#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
+static inline int arch_make_page_accessible(struct page *page)
+{
+	return 0;
+}
+#endif
 
 struct page *
 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
diff --git a/mm/gup.c b/mm/gup.c
index 1b521e0ac1de..354bcfbd844b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -193,6 +193,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	struct page *page;
 	spinlock_t *ptl;
 	pte_t *ptep, pte;
+	int ret;
 
 	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
 	if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
@@ -250,8 +251,6 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		if (is_zero_pfn(pte_pfn(pte))) {
 			page = pte_page(pte);
 		} else {
-			int ret;
-
 			ret = follow_pfn_pte(vma, address, ptep, flags);
 			page = ERR_PTR(ret);
 			goto out;
@@ -259,7 +258,6 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	}
 
 	if (flags & FOLL_SPLIT && PageTransCompound(page)) {
-		int ret;
 		get_page(page);
 		pte_unmap_unlock(ptep, ptl);
 		lock_page(page);
@@ -276,6 +274,12 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 			page = ERR_PTR(-ENOMEM);
 			goto out;
 		}
+		ret = arch_make_page_accessible(page);
+		if (ret) {
+			put_page(page);
+			page = ERR_PTR(ret);
+			goto out;
+		}
 	}
 	if (flags & FOLL_TOUCH) {
 		if ((flags & FOLL_WRITE) &&
@@ -1919,6 +1923,11 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 
 		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 
+		ret = arch_make_page_accessible(page);
+		if (ret) {
+			put_page(head);
+			goto pte_unmap;
+		}
 		SetPageReferenced(page);
 		pages[*nr] = page;
 		(*nr)++;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 2caf780a42e7..558d7063c117 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2807,6 +2807,11 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 	}
 	unlock_page_memcg(page);
+	/*
+	 * If writeback has been triggered on a page that cannot be made
+	 * accessible, it is too late.
+	 */
+	WARN_ON(arch_make_page_accessible(page));
 	return ret;
 
 }

From patchwork Mon Feb 24 11:40:36 2020
X-Patchwork-Submitter: Christian Borntraeger
X-Patchwork-Id: 11400223
From: Christian Borntraeger
To: Christian Borntraeger, Janosch Frank, Andrew Morton
Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth, Ulrich Weigand,
 Claudio Imbrenda, linux-s390, Michael Mueller, Vasily Gorbik,
 Andrea Arcangeli, linux-mm@kvack.org
Subject: [PATCH v4 05/36] s390/mm: provide memory management functions for protected KVM guests
Date: Mon, 24 Feb 2020 06:40:36 -0500
Message-Id: <20200224114107.4646-6-borntraeger@de.ibm.com>
In-Reply-To: <20200224114107.4646-1-borntraeger@de.ibm.com>
References: <20200224114107.4646-1-borntraeger@de.ibm.com>

From: Claudio Imbrenda

This provides the basic ultravisor calls and page table handling to
cope with secure guests:
- provide arch_make_page_accessible
- make pages accessible after unmapping of secure guests
- provide the ultravisor commands convert to/from secure
- provide the ultravisor commands pin/unpin shared
- provide callbacks to make pages secure (inaccessible)
- we check for the expected pin count to only make pages secure if the
  host is not accessing them
- we fence hugetlbfs for secure pages
- add missing radix-tree include into gmap.h

The basic idea is that a page can have 3 states: secure, normal or
shared. The hypervisor can call into a firmware function called
ultravisor that allows changing the state of a page: convert from/to
secure. The convert from secure will encrypt the page and make it
available to the host and host I/O. The convert to secure will remove
the host's capability to access this page.

The design is that on convert to secure we will wait until writeback
and page refs indicate no host usage. At the same time the convert
from secure (export to host) will be called in common code when the
refcount or the writeback bit is already set. This avoids races
between convert from and convert to secure.

Then there is also the concept of shared pages. Those are a kind of
secure page that the host can still access. We need to be notified
when the guest "unshares" such a page, basically doing a convert to
secure by then. There is a call "pin shared page" that we use instead
of convert from secure when possible.

We use PG_arch_1 as an optimization to minimize the convert from
secure/pin shared calls. Several comments have been added in the code
to explain the logic in the relevant places.
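To illustrate the state model described above (a sketch only; the enum
names are invented here for illustration, while the UVC_CMD_* names
are the real commands added by this patch):

/* Hypothetical summary of the page states and the UV calls that
 * move between them. */
enum uv_page_state {
	UV_PAGE_NORMAL,	/* accessible to the host, host I/O works */
	UV_PAGE_SECURE,	/* guest only: host access faults, I/O fails */
	UV_PAGE_SHARED,	/* secure, but still accessible to the host */
};

/*
 * UVC_CMD_CONV_TO_SEC_STOR:   normal -> secure (import into guest)
 * UVC_CMD_CONV_FROM_SEC_STOR: secure -> normal (encrypt and export)
 * UVC_CMD_PIN_PAGE_SHARED:    keep a shared page shared; an unshare
 *                             by the guest then intercepts to the host
 */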
Co-developed-by: Ulrich Weigand
Signed-off-by: Ulrich Weigand
Signed-off-by: Claudio Imbrenda
Acked-by: David Hildenbrand
Reviewed-by: Christian Borntraeger
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger
Acked-by: Cornelia Huck
---
 arch/s390/include/asm/gmap.h        |   4 +
 arch/s390/include/asm/mmu.h         |   2 +
 arch/s390/include/asm/mmu_context.h |   1 +
 arch/s390/include/asm/page.h        |   5 +
 arch/s390/include/asm/pgtable.h     |  35 ++++-
 arch/s390/include/asm/uv.h          |  31 ++++
 arch/s390/kernel/uv.c               | 227 ++++++++++++++++++++++++++++
 7 files changed, 300 insertions(+), 5 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index 37f96b6f0e61..3c4926aa78f4 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -9,6 +9,7 @@
 #ifndef _ASM_S390_GMAP_H
 #define _ASM_S390_GMAP_H
 
+#include <linux/radix-tree.h>
 #include <linux/refcount.h>
 
 /* Generic bits for GMAP notification on DAT table entry changes. */
@@ -31,6 +32,7 @@
  * @table: pointer to the page directory
  * @asce: address space control element for gmap page table
  * @pfault_enabled: defines if pfaults are applicable for the guest
+ * @guest_handle: protected virtual machine handle for the ultravisor
  * @host_to_rmap: radix tree with gmap_rmap lists
  * @children: list of shadow gmap structures
  * @pt_list: list of all page tables used in the shadow guest address space
@@ -54,6 +56,8 @@ struct gmap {
 	unsigned long asce_end;
 	void *private;
 	bool pfault_enabled;
+	/* only set for protected virtual machines */
+	unsigned long guest_handle;
 	/* Additional data for shadow guest address spaces */
 	struct radix_tree_root host_to_rmap;
 	struct list_head children;
diff --git a/arch/s390/include/asm/mmu.h b/arch/s390/include/asm/mmu.h
index bcfb6371086f..e21b618ad432 100644
--- a/arch/s390/include/asm/mmu.h
+++ b/arch/s390/include/asm/mmu.h
@@ -16,6 +16,8 @@ typedef struct {
 	unsigned long asce;
 	unsigned long asce_limit;
 	unsigned long vdso_base;
+	/* The mmu context belongs to a secure guest. */
+	atomic_t is_protected;
 	/*
 	 * The following bitfields need a down_write on the mm
 	 * semaphore when they are written to. As they are only
diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
index 8d04e6f3f796..afa836014076 100644
--- a/arch/s390/include/asm/mmu_context.h
+++ b/arch/s390/include/asm/mmu_context.h
@@ -23,6 +23,7 @@ static inline int init_new_context(struct task_struct *tsk,
 	INIT_LIST_HEAD(&mm->context.gmap_list);
 	cpumask_clear(&mm->context.cpu_attach_mask);
 	atomic_set(&mm->context.flush_count, 0);
+	atomic_set(&mm->context.is_protected, 0);
 	mm->context.gmap_asce = 0;
 	mm->context.flush_mm = 0;
 	mm->context.compat_mm = test_thread_flag(TIF_31BIT);
diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
index 85e944f04c70..4ebcf891ff3c 100644
--- a/arch/s390/include/asm/page.h
+++ b/arch/s390/include/asm/page.h
@@ -153,6 +153,11 @@ static inline int devmem_is_allowed(unsigned long pfn)
 #define HAVE_ARCH_FREE_PAGE
 #define HAVE_ARCH_ALLOC_PAGE
 
+#if IS_ENABLED(CONFIG_PGSTE)
+int arch_make_page_accessible(struct page *page);
+#define HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
+#endif
+
 #endif /* !__ASSEMBLY__ */
 
 #define __PAGE_OFFSET		0x0UL
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 137a3920ca36..cc7a1adacb94 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -19,6 +19,7 @@
 #include <linux/atomic.h>
 #include <asm/bug.h>
 #include <asm/page.h>
+#include <asm/uv.h>
 
 extern pgd_t swapper_pg_dir[];
 extern void paging_init(void);
@@ -520,6 +521,15 @@ static inline int mm_has_pgste(struct mm_struct *mm)
 	return 0;
 }
 
+static inline int mm_is_protected(struct mm_struct *mm)
+{
+#ifdef CONFIG_PGSTE
+	if (unlikely(atomic_read(&mm->context.is_protected)))
+		return 1;
+#endif
+	return 0;
+}
+
 static inline int mm_alloc_pgste(struct mm_struct *mm)
 {
 #ifdef CONFIG_PGSTE
@@ -1061,7 +1071,12 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 				       unsigned long addr, pte_t *ptep)
 {
-	return ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
+	pte_t res;
+
+	res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
+	if (mm_is_protected(mm) && pte_present(res))
+		uv_convert_from_secure(pte_val(res) & PAGE_MASK);
+	return res;
 }
 
 #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
@@ -1073,7 +1088,12 @@ void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long,
 static inline pte_t ptep_clear_flush(struct vm_area_struct *vma,
 				     unsigned long addr, pte_t *ptep)
 {
-	return ptep_xchg_direct(vma->vm_mm, addr, ptep, __pte(_PAGE_INVALID));
+	pte_t res;
+
+	res = ptep_xchg_direct(vma->vm_mm, addr, ptep, __pte(_PAGE_INVALID));
+	if (mm_is_protected(vma->vm_mm) && pte_present(res))
+		uv_convert_from_secure(pte_val(res) & PAGE_MASK);
+	return res;
 }
 
 /*
@@ -1088,12 +1108,17 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 					    unsigned long addr,
 					    pte_t *ptep, int full)
 {
+	pte_t res;
+
 	if (full) {
-		pte_t pte = *ptep;
+		res = *ptep;
 		*ptep = __pte(_PAGE_INVALID);
-		return pte;
+	} else {
+		res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
 	}
-	return ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
+	if (mm_is_protected(mm) && pte_present(res))
+		uv_convert_from_secure(pte_val(res) & PAGE_MASK);
+	return res;
 }
 
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index cad643b05d19..7956868340c1 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -15,6 +15,7 @@
 #include <linux/types.h>
 #include <linux/errno.h>
 #include <linux/bug.h>
+#include <asm/gmap.h>
 
 #define UVC_RC_EXECUTED		0x0001
 #define UVC_RC_INV_CMD		0x0002
@@ -24,6 +25,10 @@
 #define UVC_CMD_QUI			0x0001
 #define UVC_CMD_INIT_UV			0x000f
+#define UVC_CMD_CONV_TO_SEC_STOR	0x0200
+#define UVC_CMD_CONV_FROM_SEC_STOR	0x0201
+#define UVC_CMD_PIN_PAGE_SHARED		0x0341
+#define UVC_CMD_UNPIN_PAGE_SHARED	0x0342
 #define UVC_CMD_SET_SHARED_ACCESS	0x1000
 #define UVC_CMD_REMOVE_SHARED_ACCESS	0x1001
 
@@ -31,8 +36,12 @@
 enum uv_cmds_inst {
 	BIT_UVC_CMD_QUI = 0,
 	BIT_UVC_CMD_INIT_UV = 1,
+	BIT_UVC_CMD_CONV_TO_SEC_STOR = 6,
+	BIT_UVC_CMD_CONV_FROM_SEC_STOR = 7,
 	BIT_UVC_CMD_SET_SHARED_ACCESS = 8,
 	BIT_UVC_CMD_REMOVE_SHARED_ACCESS = 9,
+	BIT_UVC_CMD_PIN_PAGE_SHARED = 21,
+	BIT_UVC_CMD_UNPIN_PAGE_SHARED = 22,
 };
 
 struct uv_cb_header {
@@ -69,6 +78,19 @@ struct uv_cb_init {
 	u64 reserved28[4];
 } __packed __aligned(8);
 
+struct uv_cb_cts {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 guest_handle;
+	u64 gaddr;
+} __packed __aligned(8);
+
+struct uv_cb_cfs {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 paddr;
+} __packed __aligned(8);
+
 struct uv_cb_share {
 	struct uv_cb_header header;
 	u64 reserved08[3];
@@ -171,12 +193,21 @@ static inline int is_prot_virt_host(void)
 	return prot_virt_host;
 }
 
+int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb);
+int uv_convert_from_secure(unsigned long paddr);
+int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr);
+
 void setup_uv(void);
 void adjust_to_uv_max(unsigned long *vmax);
 #else
 #define is_prot_virt_host() 0
 static inline void setup_uv(void) {}
 static inline void adjust_to_uv_max(unsigned long *vmax) {}
+
+static inline int uv_convert_from_secure(unsigned long paddr)
+{
+	return 0;
+}
 #endif
 
 #if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) || \
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index 1ddc42154ef6..4539003dac9d 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -12,6 +12,8 @@
 #include <linux/sizes.h>
 #include <linux/bitmap.h>
 #include <linux/memblock.h>
+#include <linux/pagemap.h>
+#include <linux/swap.h>
 #include <asm/facility.h>
 #include <asm/sections.h>
 #include <asm/uv.h>
@@ -97,4 +99,229 @@ void adjust_to_uv_max(unsigned long *vmax)
 {
 	*vmax = min_t(unsigned long, *vmax, uv_info.max_sec_stor_addr);
 }
+
+/*
+ * Requests the Ultravisor to pin the page in the shared state. This will
+ * cause an intercept when the guest attempts to unshare the pinned page.
+ */
+static int uv_pin_shared(unsigned long paddr)
+{
+	struct uv_cb_cfs uvcb = {
+		.header.cmd = UVC_CMD_PIN_PAGE_SHARED,
+		.header.len = sizeof(uvcb),
+		.paddr = paddr,
+	};
+
+	if (uv_call(0, (u64)&uvcb))
+		return -EINVAL;
+	return 0;
+}
+
+/*
+ * Requests the Ultravisor to encrypt a guest page and make it
+ * accessible to the host for paging (export).
+ *
+ * @paddr: Absolute host address of page to be exported
+ */
+int uv_convert_from_secure(unsigned long paddr)
+{
+	struct uv_cb_cfs uvcb = {
+		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
+		.header.len = sizeof(uvcb),
+		.paddr = paddr
+	};
+
+	if (uv_call(0, (u64)&uvcb))
+		return -EINVAL;
+	return 0;
+}
+
+/*
+ * Calculate the expected ref_count for a page that would otherwise have no
+ * further pins. This was cribbed from similar functions in other places in
+ * the kernel, but with some slight modifications. We know that a secure
+ * page can not be a huge page for example.
+ */
+static int expected_page_refs(struct page *page)
+{
+	int res;
+
+	res = page_mapcount(page);
+	if (PageSwapCache(page)) {
+		res++;
+	} else if (page_mapping(page)) {
+		res++;
+		if (page_has_private(page))
+			res++;
+	}
+	return res;
+}
+
+static int make_secure_pte(pte_t *ptep, unsigned long addr,
+			   struct page *exp_page, struct uv_cb_header *uvcb)
+{
+	pte_t entry = READ_ONCE(*ptep);
+	struct page *page;
+	int expected, rc = 0;
+
+	if (!pte_present(entry))
+		return -ENXIO;
+	if (pte_val(entry) & _PAGE_INVALID)
+		return -ENXIO;
+
+	page = pte_page(entry);
+	if (page != exp_page)
+		return -ENXIO;
+	if (PageWriteback(page))
+		return -EAGAIN;
+	expected = expected_page_refs(page);
+	if (!page_ref_freeze(page, expected))
+		return -EBUSY;
+	set_bit(PG_arch_1, &page->flags);
+	rc = uv_call(0, (u64)uvcb);
+	page_ref_unfreeze(page, expected);
+	/* Return -ENXIO if the page was not mapped, -EINVAL otherwise */
+	if (rc)
+		rc = uvcb->rc == 0x10a ? -ENXIO : -EINVAL;
+	return rc;
+}
+
+/*
+ * Requests the Ultravisor to make a page accessible to a guest.
+ * If it's brought in the first time, it will be cleared. If
+ * it has been exported before, it will be decrypted and integrity
+ * checked.
+ */
+int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
+{
+	struct vm_area_struct *vma;
+	bool local_drain = false;
+	spinlock_t *ptelock;
+	unsigned long uaddr;
+	struct page *page;
+	pte_t *ptep;
+	int rc;
+
+again:
+	rc = -EFAULT;
+	down_read(&gmap->mm->mmap_sem);
+
+	uaddr = __gmap_translate(gmap, gaddr);
+	if (IS_ERR_VALUE(uaddr))
+		goto out;
+	vma = find_vma(gmap->mm, uaddr);
+	if (!vma)
+		goto out;
+	/*
+	 * Secure pages cannot be huge and userspace should not combine both.
+	 * In case userspace does it anyway this will result in an -EFAULT for
+	 * the unpack. The guest is thus never reaching secure mode. If
+	 * userspace is playing dirty tricks with mapping huge pages later
+	 * on this will result in a segmentation fault.
+	 */
+	if (is_vm_hugetlb_page(vma))
+		goto out;
+
+	rc = -ENXIO;
+	page = follow_page(vma, uaddr, FOLL_WRITE);
+	if (IS_ERR_OR_NULL(page))
+		goto out;
+
+	lock_page(page);
+	ptep = get_locked_pte(gmap->mm, uaddr, &ptelock);
+	rc = make_secure_pte(ptep, uaddr, page, uvcb);
+	pte_unmap_unlock(ptep, ptelock);
+	unlock_page(page);
+out:
+	up_read(&gmap->mm->mmap_sem);
+
+	if (rc == -EAGAIN) {
+		wait_on_page_writeback(page);
+	} else if (rc == -EBUSY) {
+		/*
+		 * If we have tried a local drain and the page refcount
+		 * still does not match our expected safe value, try with a
+		 * system wide drain. This is needed if the pagevecs holding
+		 * the page are on a different CPU.
+		 */
+		if (local_drain) {
+			lru_add_drain_all();
+			/* We give up here, and let the caller try again */
+			return -EAGAIN;
+		}
+		/*
+		 * We are here if the page refcount does not match the
+		 * expected safe value. The main culprits are usually
+		 * pagevecs. With lru_add_drain() we drain the pagevecs
+		 * on the local CPU so that hopefully the refcount will
+		 * reach the expected safe value.
+		 */
+		lru_add_drain();
+		local_drain = true;
+		/* And now we try again immediately after draining */
+		goto again;
+	} else if (rc == -ENXIO) {
+		if (gmap_fault(gmap, gaddr, FAULT_FLAG_WRITE))
+			return -EFAULT;
+		return -EAGAIN;
+	}
+	return rc;
+}
+EXPORT_SYMBOL_GPL(gmap_make_secure);
+
+int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
+{
+	struct uv_cb_cts uvcb = {
+		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
+		.header.len = sizeof(uvcb),
+		.guest_handle = gmap->guest_handle,
+		.gaddr = gaddr,
+	};
+
+	return gmap_make_secure(gmap, gaddr, &uvcb);
+}
+EXPORT_SYMBOL_GPL(gmap_convert_to_secure);
+
+/*
+ * To be called with the page locked or with an extra reference! This will
+ * prevent gmap_make_secure from touching the page concurrently. Having 2
+ * parallel make_page_accessible is fine, as the UV calls will become a
+ * no-op if the page is already exported.
+ */
+int arch_make_page_accessible(struct page *page)
+{
+	int rc = 0;
+
+	/* Hugepage cannot be protected, so nothing to do */
+	if (PageHuge(page))
+		return 0;
+
+	/*
+	 * PG_arch_1 is used in 3 places:
+	 * 1. for kernel page tables during early boot
+	 * 2. for storage keys of huge pages and KVM
+	 * 3. As an indication that this page might be secure. This can
+	 *    overindicate, e.g. we set the bit before calling
+	 *    convert_to_secure.
+	 * As secure pages are never huge, all 3 variants can co-exist.
+	 */
+	if (!test_bit(PG_arch_1, &page->flags))
+		return 0;
+
+	rc = uv_pin_shared(page_to_phys(page));
+	if (!rc) {
+		clear_bit(PG_arch_1, &page->flags);
+		return 0;
+	}
+
+	rc = uv_convert_from_secure(page_to_phys(page));
+	if (!rc) {
+		clear_bit(PG_arch_1, &page->flags);
+		return 0;
+	}
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(arch_make_page_accessible);
+
 #endif
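As a usage sketch (not part of the patch; the helper name is invented):
per the comment on arch_make_page_accessible(), a caller outside of
gup/writeback would hold the page lock or an extra reference across
the call, so that gmap_make_secure() cannot race with it:

#include <linux/mm.h>	/* get_page(), put_page() */
#include <linux/gfp.h>	/* arch_make_page_accessible() */

/* Hypothetical caller, for illustration only. */
static int export_page_for_host_access(struct page *page)
{
	int rc;

	/* The extra reference makes the page_ref_freeze() in
	 * make_secure_pte() fail, so a concurrent convert to secure
	 * backs off until we are done. */
	get_page(page);
	rc = arch_make_page_accessible(page);
	put_page(page);
	return rc;
}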