From: Fares Mehanna
Subject: [RFC PATCH 1/7] mseal: expose interface to seal / unseal user memory ranges
Date: Wed, 11 Sep 2024 14:34:00 +0000
Message-ID: <20240911143421.85612-2-faresx@amazon.de>

To make sure the user cannot touch the kernel's mm-local mapping, we seal the VMA before changing its protection for kernel use. This guarantees that userspace cannot unmap or alter the VMA while the kernel is using it. Once the kernel is done with the secret memory, it unseals the VMA so that the VMA can be unmapped and freed.
The unseal operation is not exposed to userspace.

Signed-off-by: Fares Mehanna
Signed-off-by: Roman Kagan
---
 mm/internal.h |  7 +++++
 mm/mseal.c    | 81 ++++++++++++++++++++++++++++++++-------------------
 2 files changed, 58 insertions(+), 30 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index b4d86436565b..cf7280d101e9 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1501,6 +1501,8 @@ bool can_modify_mm(struct mm_struct *mm, unsigned long start,
 		unsigned long end);
 bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
 		unsigned long end, int behavior);
+/* mm's mmap write lock must be taken before seal/unseal operation */
+int do_mseal(unsigned long start, unsigned long end, bool seal);
 #else
 static inline int can_do_mseal(unsigned long flags)
 {
@@ -1518,6 +1520,11 @@ static inline bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
 {
 	return true;
 }
+
+static inline int do_mseal(unsigned long start, unsigned long end, bool seal)
+{
+	return -EINVAL;
+}
 #endif
 
 #ifdef CONFIG_SHRINKER_DEBUG
diff --git a/mm/mseal.c b/mm/mseal.c
index 15bba28acc00..aac9399ffd5d 100644
--- a/mm/mseal.c
+++ b/mm/mseal.c
@@ -26,6 +26,11 @@ static inline void set_vma_sealed(struct vm_area_struct *vma)
 	vm_flags_set(vma, VM_SEALED);
 }
 
+static inline void clear_vma_sealed(struct vm_area_struct *vma)
+{
+	vm_flags_clear(vma, VM_SEALED);
+}
+
 /*
  * check if a vma is sealed for modification.
  * return true, if modification is allowed.
@@ -117,7 +122,7 @@ bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start, unsigned long
 
 static int mseal_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		struct vm_area_struct **prev, unsigned long start,
-		unsigned long end, vm_flags_t newflags)
+		unsigned long end, vm_flags_t newflags, bool seal)
 {
 	int ret = 0;
 	vm_flags_t oldflags = vma->vm_flags;
@@ -131,7 +136,10 @@ static int mseal_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		goto out;
 	}
 
-	set_vma_sealed(vma);
+	if (seal)
+		set_vma_sealed(vma);
+	else
+		clear_vma_sealed(vma);
 out:
 	*prev = vma;
 	return ret;
@@ -167,9 +175,9 @@ static int check_mm_seal(unsigned long start, unsigned long end)
 }
 
 /*
- * Apply sealing.
+ * Apply sealing / unsealing.
  */
-static int apply_mm_seal(unsigned long start, unsigned long end)
+static int apply_mm_seal(unsigned long start, unsigned long end, bool seal)
 {
 	unsigned long nstart;
 	struct vm_area_struct *vma, *prev;
@@ -191,11 +199,14 @@ static int apply_mm_seal(unsigned long start, unsigned long end)
 		unsigned long tmp;
 		vm_flags_t newflags;
 
-		newflags = vma->vm_flags | VM_SEALED;
+		if (seal)
+			newflags = vma->vm_flags | VM_SEALED;
+		else
+			newflags = vma->vm_flags & ~(VM_SEALED);
 		tmp = vma->vm_end;
 		if (tmp > end)
 			tmp = end;
-		error = mseal_fixup(&vmi, vma, &prev, nstart, tmp, newflags);
+		error = mseal_fixup(&vmi, vma, &prev, nstart, tmp, newflags, seal);
 		if (error)
 			return error;
 		nstart = vma_iter_end(&vmi);
@@ -204,6 +215,37 @@ static int apply_mm_seal(unsigned long start, unsigned long end)
 	return 0;
 }
 
+int do_mseal(unsigned long start, unsigned long end, bool seal)
+{
+	int ret;
+
+	if (end < start)
+		return -EINVAL;
+
+	if (end == start)
+		return 0;
+
+	/*
+	 * First pass, this helps to avoid
+	 * partial sealing in case of error in input address range,
+	 * e.g. ENOMEM error.
+	 */
+	ret = check_mm_seal(start, end);
+	if (ret)
+		goto out;
+
+	/*
+	 * Second pass, this should succeed, unless there are errors
+	 * from vma_modify_flags, e.g. merge/split error, or process
+	 * reaching the max supported VMAs, however, those cases shall
+	 * be rare.
+	 */
+	ret = apply_mm_seal(start, end, seal);
+
+out:
+	return ret;
+}
+
 /*
  * mseal(2) seals the VM's meta data from
  * selected syscalls.
@@ -256,7 +298,7 @@ static int apply_mm_seal(unsigned long start, unsigned long end)
  *
  * unseal() is not supported.
  */
-static int do_mseal(unsigned long start, size_t len_in, unsigned long flags)
+static int __do_mseal(unsigned long start, size_t len_in, unsigned long flags)
 {
 	size_t len;
 	int ret = 0;
@@ -277,33 +319,12 @@ static int do_mseal(unsigned long start, size_t len_in, unsigned long flags)
 		return -EINVAL;
 
 	end = start + len;
-	if (end < start)
-		return -EINVAL;
-
-	if (end == start)
-		return 0;
 
 	if (mmap_write_lock_killable(mm))
 		return -EINTR;
 
-	/*
-	 * First pass, this helps to avoid
-	 * partial sealing in case of error in input address range,
-	 * e.g. ENOMEM error.
-	 */
-	ret = check_mm_seal(start, end);
-	if (ret)
-		goto out;
-
-	/*
-	 * Second pass, this should success, unless there are errors
-	 * from vma_modify_flags, e.g. merge/split error, or process
-	 * reaching the max supported VMAs, however, those cases shall
-	 * be rare.
-	 */
-	ret = apply_mm_seal(start, end);
+	ret = do_mseal(start, end, true);
 
-out:
 	mmap_write_unlock(current->mm);
 	return ret;
 }
@@ -311,5 +332,5 @@ static int do_mseal(unsigned long start, size_t len_in, unsigned long flags)
 SYSCALL_DEFINE3(mseal, unsigned long, start, size_t, len, unsigned long,
 		flags)
 {
-	return do_mseal(start, len, flags);
+	return __do_mseal(start, len, flags);
 }
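
The two-pass structure that do_mseal() now shares between the mseal(2) syscall and in-kernel callers can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: the types model_vma and model_mm and every model_* function are hypothetical; the address space is a plain sorted array rather than the kernel's VMA tree; a boolean stands in for the mmap write lock; and VMAs are never split or merged, so, unlike the real apply_mm_seal(), the second pass here cannot fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a VMA: the range [start, end) and its sealed bit. */
struct model_vma {
	unsigned long start;
	unsigned long end;
	bool sealed;
};

/* Hypothetical address space: VMAs sorted by address, non-overlapping. */
struct model_mm {
	struct model_vma *vmas;
	int nr;
	bool write_locked;	/* stands in for the mmap write lock */
};

/* Pass 1 (cf. check_mm_seal): fail with -ENOMEM if [start, end) has a hole. */
static int model_check(const struct model_mm *mm, unsigned long start,
		       unsigned long end)
{
	unsigned long next = start;

	for (int i = 0; i < mm->nr && next < end; i++) {
		const struct model_vma *v = &mm->vmas[i];

		if (v->end <= next)
			continue;		/* entirely below the cursor */
		if (v->start > next)
			return -ENOMEM;		/* unmapped gap in the range */
		next = v->end;
	}
	return next >= end ? 0 : -ENOMEM;
}

/*
 * Pass 2 (cf. apply_mm_seal): set or clear the flag on each overlapping VMA.
 * Simplification: the kernel splits VMAs at the range edges via
 * vma_modify_flags(); this model flips the flag on whole VMAs.
 */
static void model_apply(struct model_mm *mm, unsigned long start,
			unsigned long end, bool seal)
{
	for (int i = 0; i < mm->nr; i++) {
		struct model_vma *v = &mm->vmas[i];

		if (v->end > start && v->start < end)
			v->sealed = seal;
	}
}

/* Same shape as do_mseal(start, end, seal) in the patch. */
static int model_do_mseal(struct model_mm *mm, unsigned long start,
			  unsigned long end, bool seal)
{
	int ret;

	assert(mm->write_locked);	/* caller must hold the write lock */

	if (end < start)
		return -EINVAL;
	if (end == start)
		return 0;

	ret = model_check(mm, start, end);
	if (ret)
		return ret;		/* nothing was modified: no partial sealing */

	model_apply(mm, start, end, seal);
	return 0;
}

/* Rough analogue of can_modify_mm(): any sealed VMA in range blocks changes. */
static bool model_can_modify(const struct model_mm *mm, unsigned long start,
			     unsigned long end)
{
	for (int i = 0; i < mm->nr; i++) {
		const struct model_vma *v = &mm->vmas[i];

		if (v->end > start && v->start < end && v->sealed)
			return false;
	}
	return true;
}
```

With VMAs at [0x1000, 0x3000), [0x3000, 0x5000) and [0x6000, 0x8000), a call covering 0x1000 to 0x8000 returns -ENOMEM from the first pass and leaves every VMA unsealed. Sealing 0x1000 to 0x5000 makes model_can_modify() return false for that range, and calling model_do_mseal() again with seal set to false, which in the patch only the kernel can do, makes the range modifiable again.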