From patchwork Tue Oct 17 09:08:07 2023
X-Patchwork-Submitter: Jeff Xu
X-Patchwork-Id: 13424811
From: jeffxu@chromium.org
To: akpm@linux-foundation.org, keescook@chromium.org, jannh@google.com,
    sroettger@google.com, willy@infradead.org, gregkh@linuxfoundation.org,
    torvalds@linux-foundation.org
Cc: jeffxu@google.com, jorgelo@chromium.org, groeck@chromium.org,
    linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
    linux-mm@kvack.org, surenb@google.com, alex.sierra@amd.com,
    apopple@nvidia.com, aneesh.kumar@linux.ibm.com,
    axelrasmussen@google.com, ben@decadent.org.uk, catalin.marinas@arm.com,
    david@redhat.com, dwmw@amazon.co.uk, ying.huang@intel.com,
    hughd@google.com, joey.gouly@arm.com, corbet@lwn.net,
    wangkefeng.wang@huawei.com, Liam.Howlett@oracle.com, lstoakes@gmail.com,
    mawupeng1@huawei.com, linmiaohe@huawei.com, namit@vmware.com,
    peterx@redhat.com, peterz@infradead.org, ryan.roberts@arm.com,
    shr@devkernel.io, vbabka@suse.cz, xiujianfeng@huawei.com,
    yu.ma@intel.com, zhangpeng362@huawei.com, dave.hansen@intel.com,
    luto@kernel.org, linux-hardening@vger.kernel.org
Subject: [RFC PATCH v2 0/8] Introduce mseal() syscall
Date: Tue, 17 Oct 2023 09:08:07 +0000
Message-ID: <20231017090815.1067790-1-jeffxu@chromium.org>

From: Jeff Xu

This patchset proposes a new mseal() syscall for the Linux kernel.

Modern CPUs support memory permissions such as RW and NX bits. Linux has
supported NX since the release of kernel version 2.6.8 in August 2004 [1].
Memory permissions improve the security posture against memory corruption
bugs: an attacker cannot simply write to arbitrary memory and redirect
execution to it, because the memory must be marked with the X bit or an
exception is raised.

These protections are set through mmap(2), mprotect(2) and mremap(2).

Memory sealing additionally protects the mapping itself against
modification. This is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management syscall. For example,
such an attacker primitive can break control-flow integrity guarantees,
since read-only memory that is supposed to be trusted can become writable,
or .text pages can get remapped. Memory sealing can be applied
automatically by the runtime loader to seal .text and .rodata pages, and
applications can additionally seal security-critical data at runtime.
A similar feature already exists in the XNU kernel with the
VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the mimmutable syscall [4].
Also, Chrome wants to adopt this feature for their CFI work [2], and this
patchset has been designed to be compatible with the Chrome use case.

The new mseal() is an architecture-independent syscall with the following
signature:

mseal(void *addr, size_t len, unsigned long types, unsigned long flags)

addr/len: memory range. Must be contiguous, allocated memory, or else
mseal() will fail and no VMA is updated. For details on acceptable
arguments, please refer to the comments in mseal.c. Those cases are also
fully covered by the selftest.

types: bit mask to specify which syscall to seal.
Five seal types can be applied, as specified by bitmasks:

MM_SEAL_MPROTECT: Deny mprotect(2)/pkey_mprotect(2).
MM_SEAL_MUNMAP: Deny munmap(2).
MM_SEAL_MMAP: Deny mmap(2).
MM_SEAL_MREMAP: Deny mremap(2).
MM_SEAL_MSEAL: Deny adding a new seal type.

Each bit represents sealing for one specific syscall type, e.g.
MM_SEAL_MPROTECT will deny the mprotect syscall. A bitmask is used so that
the API is extensible: when needed, sealing can be extended to madvise,
mlock, etc. Backward compatibility is also easy to maintain.

The kernel remembers which seal types are applied, so the application
doesn’t need to repeat all existing seal types in the next mseal() call.
Once a seal type is applied, it can’t be unsealed. Calling mseal() with a
seal type that is already applied is a no-op, not a failure.

MM_SEAL_MSEAL will deny mseal() calls that try to add a new seal type.

Internally, vm_area_struct gains a new field, vm_seals, to store the bit
mask.

For the affected syscalls, such as mprotect, a sealing check
(can_modify_mm) is added. This check usually happens early in the syscall,
before any update is made to the VMAs. The effect is that if any of the
VMAs in the given address range fails the sealing check, none of the VMAs
will be updated.

The idea that inspired this patch comes from Stephen Röttger’s work in
V8 CFI [5]. The Chrome browser in ChromeOS will be the first user of this
API.

[1] https://kernelnewbies.org/Linux_2_6_8
[2] https://v8.dev/blog/control-flow-integrity
[3] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b9f69dbd3c8c3fd30a/osfmk/mach/vm_statistics.h#L274
[4] https://man.openbsd.org/mimmutable.2
[5] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXgeaRHo/edit#heading=h.bvaojj9fu6hc

PATCH history:

v1:
Use _BITUL to define MM_SEAL_XX type.
Use unsigned long for seal type in sys_mseal() and other functions.
Remove internal VM_SEAL_XX type and convert_user_seal_type().
Remove MM_ACTION_XX type.
Remove caller_origin(ON_BEHALF_OF_XX) and replace with sealing bitmask.
Add more comments in code.
Add detailed commit message.

v0:
https://lore.kernel.org/lkml/20231016143828.647848-1-jeffxu@chromium.org/

Jeff Xu (8):
  mseal: Add mseal(2) syscall.
  mseal: Wire up mseal syscall
  mseal: add can_modify_mm and can_modify_vma
  mseal: Check seal flag for mprotect(2)
  mseal: Check seal flag for munmap(2)
  mseal: Check seal flag for mremap(2)
  mseal:Check seal flag for mmap(2)
  selftest mm/mseal mprotect/munmap/mremap/mmap

 arch/alpha/kernel/syscalls/syscall.tbl      |    1 +
 arch/arm/tools/syscall.tbl                  |    1 +
 arch/arm64/include/asm/unistd.h             |    2 +-
 arch/arm64/include/asm/unistd32.h           |    2 +
 arch/ia64/kernel/syscalls/syscall.tbl       |    1 +
 arch/m68k/kernel/syscalls/syscall.tbl       |    1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |    1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |    1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1 +
 arch/s390/kernel/syscalls/syscall.tbl       |    1 +
 arch/sh/kernel/syscalls/syscall.tbl         |    1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |    1 +
 arch/x86/entry/syscalls/syscall_32.tbl      |    1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |    1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1 +
 fs/aio.c                                    |    5 +-
 include/linux/mm.h                          |   44 +-
 include/linux/mm_types.h                    |    7 +
 include/linux/syscalls.h                    |    2 +
 include/uapi/asm-generic/unistd.h           |    5 +-
 include/uapi/linux/mman.h                   |    6 +
 ipc/shm.c                                   |    3 +-
 kernel/sys_ni.c                             |    1 +
 mm/Kconfig                                  |    8 +
 mm/Makefile                                 |    1 +
 mm/internal.h                               |    4 +-
 mm/mmap.c                                   |   57 +-
 mm/mprotect.c                               |   15 +
 mm/mremap.c                                 |   30 +-
 mm/mseal.c                                  |  268 ++++
 mm/nommu.c                                  |    6 +-
 mm/util.c                                   |    8 +-
 tools/testing/selftests/mm/Makefile         |    1 +
 tools/testing/selftests/mm/mseal_test.c     | 1428 +++++++++++++++++++
 37 files changed, 1891 insertions(+), 28 deletions(-)
 create mode 100644 mm/mseal.c
 create mode 100644 tools/testing/selftests/mm/mseal_test.c
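
As an illustration of the sealing rules described in this cover letter
(seal types accumulate, re-applying a seal is a no-op, MM_SEAL_MSEAL blocks
new seal types, and a sealing check refuses the whole range if any VMA
fails), here is a small userspace model in C. It is only a sketch and not
code from the patchset: struct toy_vma, toy_overlaps(), toy_can_modify(),
toy_mseal() and toy_demo() are hypothetical names, and the MM_SEAL_* bit
values below are assumed for illustration. The model also ignores VMA
splitting and merging, the contiguity requirement on addr/len, and the
flags argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values, for illustration only; the patchset defines
 * the real ones in include/uapi/linux/mman.h and they may differ. */
#define MM_SEAL_MSEAL    (1UL << 0)
#define MM_SEAL_MPROTECT (1UL << 1)
#define MM_SEAL_MUNMAP   (1UL << 2)
#define MM_SEAL_MMAP     (1UL << 3)
#define MM_SEAL_MREMAP   (1UL << 4)

/* Toy stand-in for vm_area_struct: a range [start, end) and its seals. */
struct toy_vma {
	unsigned long start;
	unsigned long end;
	unsigned long vm_seals;
};

/* True if the toy VMA intersects [addr, addr + len). */
static int toy_overlaps(const struct toy_vma *v, unsigned long addr,
			unsigned long len)
{
	return v->start < addr + len && addr < v->end;
}

/*
 * Loosely models the can_modify_mm() idea: return 1 only if no toy VMA
 * overlapping [addr, addr + len) carries any seal bit in `checks`.
 */
static int toy_can_modify(const struct toy_vma *vmas, size_t n,
			  unsigned long addr, unsigned long len,
			  unsigned long checks)
{
	for (size_t i = 0; i < n; i++)
		if (toy_overlaps(&vmas[i], addr, len) &&
		    (vmas[i].vm_seals & checks))
			return 0;
	return 1;
}

/*
 * Add seal bits to every overlapping toy VMA. Pass 1 rejects the whole
 * call if any VMA has MM_SEAL_MSEAL and would gain a new bit; pass 2
 * applies the bits. Returns 0 on success, -1 if refused.
 */
static int toy_mseal(struct toy_vma *vmas, size_t n, unsigned long addr,
		     unsigned long len, unsigned long types)
{
	for (size_t i = 0; i < n; i++) {
		if (!toy_overlaps(&vmas[i], addr, len))
			continue;
		if ((vmas[i].vm_seals & MM_SEAL_MSEAL) &&
		    (types & ~vmas[i].vm_seals))
			return -1;
	}
	for (size_t i = 0; i < n; i++)
		if (toy_overlaps(&vmas[i], addr, len))
			vmas[i].vm_seals |= types;
	return 0;
}

/* Walk through the rules from the cover letter; returns 0 when all hold. */
static int toy_demo(void)
{
	struct toy_vma vmas[] = {
		{ 0x1000, 0x2000, 0 },
		{ 0x2000, 0x3000, 0 },
	};

	assert(toy_mseal(vmas, 2, 0x1000, 0x1000, MM_SEAL_MPROTECT) == 0);
	/* An mprotect-like change spanning both VMAs is refused as a whole. */
	assert(!toy_can_modify(vmas, 2, 0x1000, 0x2000, MM_SEAL_MPROTECT));
	/* munmap was not sealed, so that check still passes. */
	assert(toy_can_modify(vmas, 2, 0x1000, 0x1000, MM_SEAL_MUNMAP));
	/* Re-applying an existing seal is a no-op, not a failure. */
	assert(toy_mseal(vmas, 2, 0x1000, 0x1000, MM_SEAL_MPROTECT) == 0);
	/* After MM_SEAL_MSEAL, adding a new seal type is denied. */
	assert(toy_mseal(vmas, 2, 0x1000, 0x1000, MM_SEAL_MSEAL) == 0);
	assert(toy_mseal(vmas, 2, 0x1000, 0x1000, MM_SEAL_MUNMAP) == -1);
	return 0;
}
```

Calling toy_demo() from a main() runs the walkthrough. The two-pass shape
of toy_mseal() is one simple way to get the "no VMA is updated on failure"
behaviour; the real checks live in mm/mseal.c and may be structured
differently.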