From patchwork Tue Sep 15 21:15:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrey Konovalov X-Patchwork-Id: 11777727 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 193446CA for ; Tue, 15 Sep 2020 21:16:45 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 9D4B820770 for ; Tue, 15 Sep 2020 21:16:44 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=google.com header.i=@google.com header.b="IRThtfw2" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9D4B820770 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id C532790007A; Tue, 15 Sep 2020 17:16:41 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id C0398900012; Tue, 15 Sep 2020 17:16:41 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9D89490007A; Tue, 15 Sep 2020 17:16:41 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0173.hostedemail.com [216.40.44.173]) by kanga.kvack.org (Postfix) with ESMTP id 7E31B900012 for ; Tue, 15 Sep 2020 17:16:41 -0400 (EDT) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 43482362E for ; Tue, 15 Sep 2020 21:16:41 +0000 (UTC) X-FDA: 77266555002.16.clock23_090092127114 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin16.hostedemail.com (Postfix) with ESMTP id 156C2100E6912 for ; Tue, 15 Sep 2020 21:16:41 +0000 (UTC) X-Spam-Summary: 1,0,0,c2360a81002847f2,d41d8cd98f00b204,3ny9hxwokcbw2f5j6qcfnd8gg8d6.4gedafmp-eecn24c.gj8@flex--andreyknvl.bounces.google.com,,RULES_HIT:69:152:327:355:379:541:960:966:967:968:973:988:989:1042:1260:1277:1313:1314:1345:1359:1431:1437:1516:1518:1593:1594:1605:1730:1747:1777:1792:1801:1981:2194:2196:2198:2199:2200:2201:2393:2525:2538:2559:2563:2682:2685:2693:2731:2859:2903:2933:2937:2939:2942:2945:2947:2951:2954:3022:3138:3139:3140:3141:3142:3152:3865:3866:3867:3868:3870:3871:3872:3873:3874:3934:3936:3938:3941:3944:3947:3950:3953:3956:3959:4250:4321:4385:4605:5007:6261:6653:6691:6742:7514:7875:7903:7904:8603:8660:9025:9036:9121:9592:9969:10004:11026:11232:11233:11657:11854:11914:12043:12048:12291:12296:12297:12438:12555:12683:12895:12986:13148:13161:13229:13230:14096:14097:14394:14659:21080:21324:21365:21444:21451:21627:21740:21772:21796:21939:21987:21990:30003:30012:30036:30054:30055:30067,0,RBL:209.85.160.202:@flex--andreyknvl.bounces.google.com:.lbl8.mailshell .net-62. X-HE-Tag: clock23_090092127114 X-Filterd-Recvd-Size: 37966 Received: from mail-qt1-f202.google.com (mail-qt1-f202.google.com [209.85.160.202]) by imf34.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Sep 2020 21:16:40 +0000 (UTC) Received: by mail-qt1-f202.google.com with SMTP id j35so3995743qtk.14 for ; Tue, 15 Sep 2020 14:16:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=m6D7dHkvwjdaFH4BtYThp4DoX0Pg+nOZ+Zx/xU3JNr8=; b=IRThtfw2OiVC/m3IJTgy76Tzhg5Mym+zib7RMc7Bs0Hkd8E82Tm/LGu1Kvz8+2qMX3 5B9BGWWr78IfkImvG5IOUU2TjzMH5to0dZWi0bJ5Q+JTBfCh4lMfPlAGSqi647lkVoRL LngBQwbLWdt6lIKo6CHcAWcCwddFh/VqWgI/4zNYBLo49HwbqvREsNDAQ0UhkRLVbO2K JdbK8iEAX1Oh7E4Jes8QBeEaYhmU5hCMWq6+U5DZNHyMPL+J6kmIm0la+pM5/TXbu/t9 e7HX3W15sDklcF0LBPFX1RPndBMuu45LO391X8XKdBb9mZG1f95PUWaihH7fjueOK1sY MpDQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=m6D7dHkvwjdaFH4BtYThp4DoX0Pg+nOZ+Zx/xU3JNr8=; b=kH5EwnjjN8v4ZM/pJOP9RyOszvefDNckH8XHH3q2eyx8Ko9kduWqQ/Ytw0A6HAYrtX VjNVvlsCL2Ma4tUXw0Tn5fhe0En7nQd+kgviRZM4DOwNGy4rQI2mc5UeMEy4k8KqlQ/X AiN4kKgK+2izH5FkXZGs/qLZ3ZLztxPO6wtCQaG7OU/DC1lYmYp7EYZvROLKJgFSRmHU sdIAYt0qs4XEdhfDl035NI8iy723dmMKktkgTCHlkL9c3MT4SW2l2HPgZ3mVUgEgQwzx Zc5JrEWi8KhobctZmNWEkvKgH3KRGvZkUizLqlwso+LKCump35ygnmho2bVONIntaCnM WO6g== X-Gm-Message-State: AOAM531Vzs32MbZzb7Ht5p6P5KjVnJbB2QkX64SbiUtsf8IXGcnZorZq 64vIpIVV/Nuh5vXh0+xqct8GDBw/eMiXnJUp X-Google-Smtp-Source: ABdhPJwC+Bac8VFkog5L58LOv2hGQgomm6AuwXpqTDU77SgWeP1xZ0tIBdWT4dAbTqMtUMbz8YPno45uXQXdXA6f X-Received: from andreyknvl3.muc.corp.google.com ([2a00:79e0:15:13:7220:84ff:fe09:7e9d]) (user=andreyknvl job=sendgmr) by 2002:a0c:f6c4:: with SMTP id d4mr3785883qvo.41.1600204599606; Tue, 15 Sep 2020 14:16:39 -0700 (PDT) Date: Tue, 15 Sep 2020 23:15:49 +0200 In-Reply-To: Message-Id: <88c275dc4eef13c8bcbe74ecec661733dcbc67b8.1600204505.git.andreyknvl@google.com> Mime-Version: 1.0 References: X-Mailer: git-send-email 2.28.0.618.gf4bc123cb7-goog Subject: [PATCH v2 07/37] kasan: split out shadow.c from common.c From: Andrey Konovalov To: Dmitry Vyukov , Vincenzo Frascino , Catalin Marinas , kasan-dev@googlegroups.com Cc: Andrey Ryabinin , Alexander Potapenko , Marco Elver , Evgenii Stepanov , Elena Petrova , Branislav Rankov , Kevin Brodsky , Will Deacon , Andrew Morton , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrey Konovalov X-Rspamd-Queue-Id: 156C2100E6912 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam02 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This is a preparatory commit for the upcoming addition of a new hardware tag-based (MTE-based) KASAN mode. The new mode won't be using shadow memory. Move all shadow-related code to shadow.c, which is only enabled for software KASAN modes that use shadow memory. No functional changes for software modes. Signed-off-by: Andrey Konovalov Signed-off-by: Vincenzo Frascino --- Change-Id: Ic1c32ce72d4649848e9e6a1f2c8dd269c77673f2 --- mm/kasan/Makefile | 6 +- mm/kasan/common.c | 486 +------------------------------------------ mm/kasan/shadow.c | 509 ++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 514 insertions(+), 487 deletions(-) create mode 100644 mm/kasan/shadow.c diff --git a/mm/kasan/Makefile b/mm/kasan/Makefile index 7cf685bb51bd..7cc1031e1ef8 100644 --- a/mm/kasan/Makefile +++ b/mm/kasan/Makefile @@ -10,6 +10,7 @@ CFLAGS_REMOVE_generic_report.o = $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_init.o = $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_quarantine.o = $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_report.o = $(CC_FLAGS_FTRACE) +CFLAGS_REMOVE_shadow.o = $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_tags.o = $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_tags_report.o = $(CC_FLAGS_FTRACE) @@ -26,9 +27,10 @@ CFLAGS_generic_report.o := $(CC_FLAGS_KASAN_RUNTIME) CFLAGS_init.o := $(CC_FLAGS_KASAN_RUNTIME) CFLAGS_quarantine.o := $(CC_FLAGS_KASAN_RUNTIME) CFLAGS_report.o := $(CC_FLAGS_KASAN_RUNTIME) +CFLAGS_shadow.o := $(CC_FLAGS_KASAN_RUNTIME) CFLAGS_tags.o := $(CC_FLAGS_KASAN_RUNTIME) CFLAGS_tags_report.o := $(CC_FLAGS_KASAN_RUNTIME) obj-$(CONFIG_KASAN) := common.o report.o -obj-$(CONFIG_KASAN_GENERIC) += init.o generic.o generic_report.o quarantine.o -obj-$(CONFIG_KASAN_SW_TAGS) += init.o tags.o tags_report.o +obj-$(CONFIG_KASAN_GENERIC) += init.o generic.o generic_report.o shadow.o quarantine.o +obj-$(CONFIG_KASAN_SW_TAGS) += init.o shadow.o tags.o tags_report.o diff --git a/mm/kasan/common.c b/mm/kasan/common.c index c9daf2c33651..43a927e70067 100644 --- a/mm/kasan/common.c +++ b/mm/kasan/common.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 /* - * This file contains common generic and tag-based KASAN code. + * This file contains common KASAN code. * * Copyright (c) 2014 Samsung Electronics Co., Ltd. * Author: Andrey Ryabinin @@ -18,7 +18,6 @@ #include #include #include -#include #include #include #include @@ -31,12 +30,8 @@ #include #include #include -#include #include -#include -#include - #include "kasan.h" #include "../slab.h" @@ -66,93 +61,6 @@ void kasan_disable_current(void) current->kasan_depth--; } -bool __kasan_check_read(const volatile void *p, unsigned int size) -{ - return check_memory_region((unsigned long)p, size, false, _RET_IP_); -} -EXPORT_SYMBOL(__kasan_check_read); - -bool __kasan_check_write(const volatile void *p, unsigned int size) -{ - return check_memory_region((unsigned long)p, size, true, _RET_IP_); -} -EXPORT_SYMBOL(__kasan_check_write); - -#undef memset -void *memset(void *addr, int c, size_t len) -{ - if (!check_memory_region((unsigned long)addr, len, true, _RET_IP_)) - return NULL; - - return __memset(addr, c, len); -} - -#ifdef __HAVE_ARCH_MEMMOVE -#undef memmove -void *memmove(void *dest, const void *src, size_t len) -{ - if (!check_memory_region((unsigned long)src, len, false, _RET_IP_) || - !check_memory_region((unsigned long)dest, len, true, _RET_IP_)) - return NULL; - - return __memmove(dest, src, len); -} -#endif - -#undef memcpy -void *memcpy(void *dest, const void *src, size_t len) -{ - if (!check_memory_region((unsigned long)src, len, false, _RET_IP_) || - !check_memory_region((unsigned long)dest, len, true, _RET_IP_)) - return NULL; - - return __memcpy(dest, src, len); -} - -/* - * Poisons the shadow memory for 'size' bytes starting from 'addr'. - * Memory addresses should be aligned to KASAN_GRANULE_SIZE. - */ -void kasan_poison_memory(const void *address, size_t size, u8 value) -{ - void *shadow_start, *shadow_end; - - /* - * Perform shadow offset calculation based on untagged address, as - * some of the callers (e.g. kasan_poison_object_data) pass tagged - * addresses to this function. - */ - address = reset_tag(address); - - shadow_start = kasan_mem_to_shadow(address); - shadow_end = kasan_mem_to_shadow(address + size); - - __memset(shadow_start, value, shadow_end - shadow_start); -} - -void kasan_unpoison_memory(const void *address, size_t size) -{ - u8 tag = get_tag(address); - - /* - * Perform shadow offset calculation based on untagged address, as - * some of the callers (e.g. kasan_unpoison_object_data) pass tagged - * addresses to this function. - */ - address = reset_tag(address); - - kasan_poison_memory(address, size, tag); - - if (size & KASAN_GRANULE_MASK) { - u8 *shadow = (u8 *)kasan_mem_to_shadow(address + size); - - if (IS_ENABLED(CONFIG_KASAN_SW_TAGS)) - *shadow = tag; - else - *shadow = size & KASAN_GRANULE_MASK; - } -} - static void __kasan_unpoison_stack(struct task_struct *task, const void *sp) { void *base = task_stack_page(task); @@ -540,395 +448,3 @@ void kasan_kfree_large(void *ptr, unsigned long ip) kasan_report_invalid_free(ptr, ip); /* The object will be poisoned by page_alloc. */ } - -#ifdef CONFIG_MEMORY_HOTPLUG -static bool shadow_mapped(unsigned long addr) -{ - pgd_t *pgd = pgd_offset_k(addr); - p4d_t *p4d; - pud_t *pud; - pmd_t *pmd; - pte_t *pte; - - if (pgd_none(*pgd)) - return false; - p4d = p4d_offset(pgd, addr); - if (p4d_none(*p4d)) - return false; - pud = pud_offset(p4d, addr); - if (pud_none(*pud)) - return false; - - /* - * We can't use pud_large() or pud_huge(), the first one is - * arch-specific, the last one depends on HUGETLB_PAGE. So let's abuse - * pud_bad(), if pud is bad then it's bad because it's huge. - */ - if (pud_bad(*pud)) - return true; - pmd = pmd_offset(pud, addr); - if (pmd_none(*pmd)) - return false; - - if (pmd_bad(*pmd)) - return true; - pte = pte_offset_kernel(pmd, addr); - return !pte_none(*pte); -} - -static int __meminit kasan_mem_notifier(struct notifier_block *nb, - unsigned long action, void *data) -{ - struct memory_notify *mem_data = data; - unsigned long nr_shadow_pages, start_kaddr, shadow_start; - unsigned long shadow_end, shadow_size; - - nr_shadow_pages = mem_data->nr_pages >> KASAN_SHADOW_SCALE_SHIFT; - start_kaddr = (unsigned long)pfn_to_kaddr(mem_data->start_pfn); - shadow_start = (unsigned long)kasan_mem_to_shadow((void *)start_kaddr); - shadow_size = nr_shadow_pages << PAGE_SHIFT; - shadow_end = shadow_start + shadow_size; - - if (WARN_ON(mem_data->nr_pages % KASAN_GRANULE_SIZE) || - WARN_ON(start_kaddr % (KASAN_GRANULE_SIZE << PAGE_SHIFT))) - return NOTIFY_BAD; - - switch (action) { - case MEM_GOING_ONLINE: { - void *ret; - - /* - * If shadow is mapped already than it must have been mapped - * during the boot. This could happen if we onlining previously - * offlined memory. - */ - if (shadow_mapped(shadow_start)) - return NOTIFY_OK; - - ret = __vmalloc_node_range(shadow_size, PAGE_SIZE, shadow_start, - shadow_end, GFP_KERNEL, - PAGE_KERNEL, VM_NO_GUARD, - pfn_to_nid(mem_data->start_pfn), - __builtin_return_address(0)); - if (!ret) - return NOTIFY_BAD; - - kmemleak_ignore(ret); - return NOTIFY_OK; - } - case MEM_CANCEL_ONLINE: - case MEM_OFFLINE: { - struct vm_struct *vm; - - /* - * shadow_start was either mapped during boot by kasan_init() - * or during memory online by __vmalloc_node_range(). - * In the latter case we can use vfree() to free shadow. - * Non-NULL result of the find_vm_area() will tell us if - * that was the second case. - * - * Currently it's not possible to free shadow mapped - * during boot by kasan_init(). It's because the code - * to do that hasn't been written yet. So we'll just - * leak the memory. - */ - vm = find_vm_area((void *)shadow_start); - if (vm) - vfree((void *)shadow_start); - } - } - - return NOTIFY_OK; -} - -static int __init kasan_memhotplug_init(void) -{ - hotplug_memory_notifier(kasan_mem_notifier, 0); - - return 0; -} - -core_initcall(kasan_memhotplug_init); -#endif - -#ifdef CONFIG_KASAN_VMALLOC - -static int kasan_populate_vmalloc_pte(pte_t *ptep, unsigned long addr, - void *unused) -{ - unsigned long page; - pte_t pte; - - if (likely(!pte_none(*ptep))) - return 0; - - page = __get_free_page(GFP_KERNEL); - if (!page) - return -ENOMEM; - - memset((void *)page, KASAN_VMALLOC_INVALID, PAGE_SIZE); - pte = pfn_pte(PFN_DOWN(__pa(page)), PAGE_KERNEL); - - spin_lock(&init_mm.page_table_lock); - if (likely(pte_none(*ptep))) { - set_pte_at(&init_mm, addr, ptep, pte); - page = 0; - } - spin_unlock(&init_mm.page_table_lock); - if (page) - free_page(page); - return 0; -} - -int kasan_populate_vmalloc(unsigned long addr, unsigned long size) -{ - unsigned long shadow_start, shadow_end; - int ret; - - if (!is_vmalloc_or_module_addr((void *)addr)) - return 0; - - shadow_start = (unsigned long)kasan_mem_to_shadow((void *)addr); - shadow_start = ALIGN_DOWN(shadow_start, PAGE_SIZE); - shadow_end = (unsigned long)kasan_mem_to_shadow((void *)addr + size); - shadow_end = ALIGN(shadow_end, PAGE_SIZE); - - ret = apply_to_page_range(&init_mm, shadow_start, - shadow_end - shadow_start, - kasan_populate_vmalloc_pte, NULL); - if (ret) - return ret; - - flush_cache_vmap(shadow_start, shadow_end); - - /* - * We need to be careful about inter-cpu effects here. Consider: - * - * CPU#0 CPU#1 - * WRITE_ONCE(p, vmalloc(100)); while (x = READ_ONCE(p)) ; - * p[99] = 1; - * - * With compiler instrumentation, that ends up looking like this: - * - * CPU#0 CPU#1 - * // vmalloc() allocates memory - * // let a = area->addr - * // we reach kasan_populate_vmalloc - * // and call kasan_unpoison_memory: - * STORE shadow(a), unpoison_val - * ... - * STORE shadow(a+99), unpoison_val x = LOAD p - * // rest of vmalloc process - * STORE p, a LOAD shadow(x+99) - * - * If there is no barrier between the end of unpoisioning the shadow - * and the store of the result to p, the stores could be committed - * in a different order by CPU#0, and CPU#1 could erroneously observe - * poison in the shadow. - * - * We need some sort of barrier between the stores. - * - * In the vmalloc() case, this is provided by a smp_wmb() in - * clear_vm_uninitialized_flag(). In the per-cpu allocator and in - * get_vm_area() and friends, the caller gets shadow allocated but - * doesn't have any pages mapped into the virtual address space that - * has been reserved. Mapping those pages in will involve taking and - * releasing a page-table lock, which will provide the barrier. - */ - - return 0; -} - -/* - * Poison the shadow for a vmalloc region. Called as part of the - * freeing process at the time the region is freed. - */ -void kasan_poison_vmalloc(const void *start, unsigned long size) -{ - if (!is_vmalloc_or_module_addr(start)) - return; - - size = round_up(size, KASAN_GRANULE_SIZE); - kasan_poison_memory(start, size, KASAN_VMALLOC_INVALID); -} - -void kasan_unpoison_vmalloc(const void *start, unsigned long size) -{ - if (!is_vmalloc_or_module_addr(start)) - return; - - kasan_unpoison_memory(start, size); -} - -static int kasan_depopulate_vmalloc_pte(pte_t *ptep, unsigned long addr, - void *unused) -{ - unsigned long page; - - page = (unsigned long)__va(pte_pfn(*ptep) << PAGE_SHIFT); - - spin_lock(&init_mm.page_table_lock); - - if (likely(!pte_none(*ptep))) { - pte_clear(&init_mm, addr, ptep); - free_page(page); - } - spin_unlock(&init_mm.page_table_lock); - - return 0; -} - -/* - * Release the backing for the vmalloc region [start, end), which - * lies within the free region [free_region_start, free_region_end). - * - * This can be run lazily, long after the region was freed. It runs - * under vmap_area_lock, so it's not safe to interact with the vmalloc/vmap - * infrastructure. - * - * How does this work? - * ------------------- - * - * We have a region that is page aligned, labelled as A. - * That might not map onto the shadow in a way that is page-aligned: - * - * start end - * v v - * |????????|????????|AAAAAAAA|AA....AA|AAAAAAAA|????????| < vmalloc - * -------- -------- -------- -------- -------- - * | | | | | - * | | | /-------/ | - * \-------\|/------/ |/---------------/ - * ||| || - * |??AAAAAA|AAAAAAAA|AA??????| < shadow - * (1) (2) (3) - * - * First we align the start upwards and the end downwards, so that the - * shadow of the region aligns with shadow page boundaries. In the - * example, this gives us the shadow page (2). This is the shadow entirely - * covered by this allocation. - * - * Then we have the tricky bits. We want to know if we can free the - * partially covered shadow pages - (1) and (3) in the example. For this, - * we are given the start and end of the free region that contains this - * allocation. Extending our previous example, we could have: - * - * free_region_start free_region_end - * | start end | - * v v v v - * |FFFFFFFF|FFFFFFFF|AAAAAAAA|AA....AA|AAAAAAAA|FFFFFFFF| < vmalloc - * -------- -------- -------- -------- -------- - * | | | | | - * | | | /-------/ | - * \-------\|/------/ |/---------------/ - * ||| || - * |FFAAAAAA|AAAAAAAA|AAF?????| < shadow - * (1) (2) (3) - * - * Once again, we align the start of the free region up, and the end of - * the free region down so that the shadow is page aligned. So we can free - * page (1) - we know no allocation currently uses anything in that page, - * because all of it is in the vmalloc free region. But we cannot free - * page (3), because we can't be sure that the rest of it is unused. - * - * We only consider pages that contain part of the original region for - * freeing: we don't try to free other pages from the free region or we'd - * end up trying to free huge chunks of virtual address space. - * - * Concurrency - * ----------- - * - * How do we know that we're not freeing a page that is simultaneously - * being used for a fresh allocation in kasan_populate_vmalloc(_pte)? - * - * We _can_ have kasan_release_vmalloc and kasan_populate_vmalloc running - * at the same time. While we run under free_vmap_area_lock, the population - * code does not. - * - * free_vmap_area_lock instead operates to ensure that the larger range - * [free_region_start, free_region_end) is safe: because __alloc_vmap_area and - * the per-cpu region-finding algorithm both run under free_vmap_area_lock, - * no space identified as free will become used while we are running. This - * means that so long as we are careful with alignment and only free shadow - * pages entirely covered by the free region, we will not run in to any - * trouble - any simultaneous allocations will be for disjoint regions. - */ -void kasan_release_vmalloc(unsigned long start, unsigned long end, - unsigned long free_region_start, - unsigned long free_region_end) -{ - void *shadow_start, *shadow_end; - unsigned long region_start, region_end; - unsigned long size; - - region_start = ALIGN(start, PAGE_SIZE * KASAN_GRANULE_SIZE); - region_end = ALIGN_DOWN(end, PAGE_SIZE * KASAN_GRANULE_SIZE); - - free_region_start = ALIGN(free_region_start, - PAGE_SIZE * KASAN_GRANULE_SIZE); - - if (start != region_start && - free_region_start < region_start) - region_start -= PAGE_SIZE * KASAN_GRANULE_SIZE; - - free_region_end = ALIGN_DOWN(free_region_end, - PAGE_SIZE * KASAN_GRANULE_SIZE); - - if (end != region_end && - free_region_end > region_end) - region_end += PAGE_SIZE * KASAN_GRANULE_SIZE; - - shadow_start = kasan_mem_to_shadow((void *)region_start); - shadow_end = kasan_mem_to_shadow((void *)region_end); - - if (shadow_end > shadow_start) { - size = shadow_end - shadow_start; - apply_to_existing_page_range(&init_mm, - (unsigned long)shadow_start, - size, kasan_depopulate_vmalloc_pte, - NULL); - flush_tlb_kernel_range((unsigned long)shadow_start, - (unsigned long)shadow_end); - } -} - -#else /* CONFIG_KASAN_VMALLOC */ - -int kasan_module_alloc(void *addr, size_t size) -{ - void *ret; - size_t scaled_size; - size_t shadow_size; - unsigned long shadow_start; - - shadow_start = (unsigned long)kasan_mem_to_shadow(addr); - scaled_size = (size + KASAN_GRANULE_SIZE - 1) >> - KASAN_SHADOW_SCALE_SHIFT; - shadow_size = round_up(scaled_size, PAGE_SIZE); - - if (WARN_ON(!PAGE_ALIGNED(shadow_start))) - return -EINVAL; - - ret = __vmalloc_node_range(shadow_size, 1, shadow_start, - shadow_start + shadow_size, - GFP_KERNEL, - PAGE_KERNEL, VM_NO_GUARD, NUMA_NO_NODE, - __builtin_return_address(0)); - - if (ret) { - __memset(ret, KASAN_SHADOW_INIT, shadow_size); - find_vm_area(addr)->flags |= VM_KASAN; - kmemleak_ignore(ret); - return 0; - } - - return -ENOMEM; -} - -void kasan_free_shadow(const struct vm_struct *vm) -{ - if (vm->flags & VM_KASAN) - vfree(kasan_mem_to_shadow(vm->addr)); -} - -#endif diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c new file mode 100644 index 000000000000..4888084ecdfc --- /dev/null +++ b/mm/kasan/shadow.c @@ -0,0 +1,509 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * This file contains KASAN shadow runtime code. + * + * Copyright (c) 2014 Samsung Electronics Co., Ltd. + * Author: Andrey Ryabinin + * + * Some code borrowed from https://github.com/xairy/kasan-prototype by + * Andrey Konovalov + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#include "kasan.h" + +bool __kasan_check_read(const volatile void *p, unsigned int size) +{ + return check_memory_region((unsigned long)p, size, false, _RET_IP_); +} +EXPORT_SYMBOL(__kasan_check_read); + +bool __kasan_check_write(const volatile void *p, unsigned int size) +{ + return check_memory_region((unsigned long)p, size, true, _RET_IP_); +} +EXPORT_SYMBOL(__kasan_check_write); + +#undef memset +void *memset(void *addr, int c, size_t len) +{ + if (!check_memory_region((unsigned long)addr, len, true, _RET_IP_)) + return NULL; + + return __memset(addr, c, len); +} + +#ifdef __HAVE_ARCH_MEMMOVE +#undef memmove +void *memmove(void *dest, const void *src, size_t len) +{ + if (!check_memory_region((unsigned long)src, len, false, _RET_IP_) || + !check_memory_region((unsigned long)dest, len, true, _RET_IP_)) + return NULL; + + return __memmove(dest, src, len); +} +#endif + +#undef memcpy +void *memcpy(void *dest, const void *src, size_t len) +{ + if (!check_memory_region((unsigned long)src, len, false, _RET_IP_) || + !check_memory_region((unsigned long)dest, len, true, _RET_IP_)) + return NULL; + + return __memcpy(dest, src, len); +} + +/* + * Poisons the shadow memory for 'size' bytes starting from 'addr'. + * Memory addresses should be aligned to KASAN_GRANULE_SIZE. + */ +void kasan_poison_memory(const void *address, size_t size, u8 value) +{ + void *shadow_start, *shadow_end; + + /* + * Perform shadow offset calculation based on untagged address, as + * some of the callers (e.g. kasan_poison_object_data) pass tagged + * addresses to this function. + */ + address = reset_tag(address); + + shadow_start = kasan_mem_to_shadow(address); + shadow_end = kasan_mem_to_shadow(address + size); + + __memset(shadow_start, value, shadow_end - shadow_start); +} + +void kasan_unpoison_memory(const void *address, size_t size) +{ + u8 tag = get_tag(address); + + /* + * Perform shadow offset calculation based on untagged address, as + * some of the callers (e.g. kasan_unpoison_object_data) pass tagged + * addresses to this function. + */ + address = reset_tag(address); + + kasan_poison_memory(address, size, tag); + + if (size & KASAN_GRANULE_MASK) { + u8 *shadow = (u8 *)kasan_mem_to_shadow(address + size); + + if (IS_ENABLED(CONFIG_KASAN_SW_TAGS)) + *shadow = tag; + else + *shadow = size & KASAN_GRANULE_MASK; + } +} + +#ifdef CONFIG_MEMORY_HOTPLUG +static bool shadow_mapped(unsigned long addr) +{ + pgd_t *pgd = pgd_offset_k(addr); + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + + if (pgd_none(*pgd)) + return false; + p4d = p4d_offset(pgd, addr); + if (p4d_none(*p4d)) + return false; + pud = pud_offset(p4d, addr); + if (pud_none(*pud)) + return false; + + /* + * We can't use pud_large() or pud_huge(), the first one is + * arch-specific, the last one depends on HUGETLB_PAGE. So let's abuse + * pud_bad(), if pud is bad then it's bad because it's huge. + */ + if (pud_bad(*pud)) + return true; + pmd = pmd_offset(pud, addr); + if (pmd_none(*pmd)) + return false; + + if (pmd_bad(*pmd)) + return true; + pte = pte_offset_kernel(pmd, addr); + return !pte_none(*pte); +} + +static int __meminit kasan_mem_notifier(struct notifier_block *nb, + unsigned long action, void *data) +{ + struct memory_notify *mem_data = data; + unsigned long nr_shadow_pages, start_kaddr, shadow_start; + unsigned long shadow_end, shadow_size; + + nr_shadow_pages = mem_data->nr_pages >> KASAN_SHADOW_SCALE_SHIFT; + start_kaddr = (unsigned long)pfn_to_kaddr(mem_data->start_pfn); + shadow_start = (unsigned long)kasan_mem_to_shadow((void *)start_kaddr); + shadow_size = nr_shadow_pages << PAGE_SHIFT; + shadow_end = shadow_start + shadow_size; + + if (WARN_ON(mem_data->nr_pages % KASAN_GRANULE_SIZE) || + WARN_ON(start_kaddr % (KASAN_GRANULE_SIZE << PAGE_SHIFT))) + return NOTIFY_BAD; + + switch (action) { + case MEM_GOING_ONLINE: { + void *ret; + + /* + * If shadow is mapped already than it must have been mapped + * during the boot. This could happen if we onlining previously + * offlined memory. + */ + if (shadow_mapped(shadow_start)) + return NOTIFY_OK; + + ret = __vmalloc_node_range(shadow_size, PAGE_SIZE, shadow_start, + shadow_end, GFP_KERNEL, + PAGE_KERNEL, VM_NO_GUARD, + pfn_to_nid(mem_data->start_pfn), + __builtin_return_address(0)); + if (!ret) + return NOTIFY_BAD; + + kmemleak_ignore(ret); + return NOTIFY_OK; + } + case MEM_CANCEL_ONLINE: + case MEM_OFFLINE: { + struct vm_struct *vm; + + /* + * shadow_start was either mapped during boot by kasan_init() + * or during memory online by __vmalloc_node_range(). + * In the latter case we can use vfree() to free shadow. + * Non-NULL result of the find_vm_area() will tell us if + * that was the second case. + * + * Currently it's not possible to free shadow mapped + * during boot by kasan_init(). It's because the code + * to do that hasn't been written yet. So we'll just + * leak the memory. + */ + vm = find_vm_area((void *)shadow_start); + if (vm) + vfree((void *)shadow_start); + } + } + + return NOTIFY_OK; +} + +static int __init kasan_memhotplug_init(void) +{ + hotplug_memory_notifier(kasan_mem_notifier, 0); + + return 0; +} + +core_initcall(kasan_memhotplug_init); +#endif + +#ifdef CONFIG_KASAN_VMALLOC + +static int kasan_populate_vmalloc_pte(pte_t *ptep, unsigned long addr, + void *unused) +{ + unsigned long page; + pte_t pte; + + if (likely(!pte_none(*ptep))) + return 0; + + page = __get_free_page(GFP_KERNEL); + if (!page) + return -ENOMEM; + + memset((void *)page, KASAN_VMALLOC_INVALID, PAGE_SIZE); + pte = pfn_pte(PFN_DOWN(__pa(page)), PAGE_KERNEL); + + spin_lock(&init_mm.page_table_lock); + if (likely(pte_none(*ptep))) { + set_pte_at(&init_mm, addr, ptep, pte); + page = 0; + } + spin_unlock(&init_mm.page_table_lock); + if (page) + free_page(page); + return 0; +} + +int kasan_populate_vmalloc(unsigned long addr, unsigned long size) +{ + unsigned long shadow_start, shadow_end; + int ret; + + if (!is_vmalloc_or_module_addr((void *)addr)) + return 0; + + shadow_start = (unsigned long)kasan_mem_to_shadow((void *)addr); + shadow_start = ALIGN_DOWN(shadow_start, PAGE_SIZE); + shadow_end = (unsigned long)kasan_mem_to_shadow((void *)addr + size); + shadow_end = ALIGN(shadow_end, PAGE_SIZE); + + ret = apply_to_page_range(&init_mm, shadow_start, + shadow_end - shadow_start, + kasan_populate_vmalloc_pte, NULL); + if (ret) + return ret; + + flush_cache_vmap(shadow_start, shadow_end); + + /* + * We need to be careful about inter-cpu effects here. Consider: + * + * CPU#0 CPU#1 + * WRITE_ONCE(p, vmalloc(100)); while (x = READ_ONCE(p)) ; + * p[99] = 1; + * + * With compiler instrumentation, that ends up looking like this: + * + * CPU#0 CPU#1 + * // vmalloc() allocates memory + * // let a = area->addr + * // we reach kasan_populate_vmalloc + * // and call kasan_unpoison_memory: + * STORE shadow(a), unpoison_val + * ... + * STORE shadow(a+99), unpoison_val x = LOAD p + * // rest of vmalloc process + * STORE p, a LOAD shadow(x+99) + * + * If there is no barrier between the end of unpoisioning the shadow + * and the store of the result to p, the stores could be committed + * in a different order by CPU#0, and CPU#1 could erroneously observe + * poison in the shadow. + * + * We need some sort of barrier between the stores. + * + * In the vmalloc() case, this is provided by a smp_wmb() in + * clear_vm_uninitialized_flag(). In the per-cpu allocator and in + * get_vm_area() and friends, the caller gets shadow allocated but + * doesn't have any pages mapped into the virtual address space that + * has been reserved. Mapping those pages in will involve taking and + * releasing a page-table lock, which will provide the barrier. + */ + + return 0; +} + +/* + * Poison the shadow for a vmalloc region. Called as part of the + * freeing process at the time the region is freed. + */ +void kasan_poison_vmalloc(const void *start, unsigned long size) +{ + if (!is_vmalloc_or_module_addr(start)) + return; + + size = round_up(size, KASAN_GRANULE_SIZE); + kasan_poison_memory(start, size, KASAN_VMALLOC_INVALID); +} + +void kasan_unpoison_vmalloc(const void *start, unsigned long size) +{ + if (!is_vmalloc_or_module_addr(start)) + return; + + kasan_unpoison_memory(start, size); +} + +static int kasan_depopulate_vmalloc_pte(pte_t *ptep, unsigned long addr, + void *unused) +{ + unsigned long page; + + page = (unsigned long)__va(pte_pfn(*ptep) << PAGE_SHIFT); + + spin_lock(&init_mm.page_table_lock); + + if (likely(!pte_none(*ptep))) { + pte_clear(&init_mm, addr, ptep); + free_page(page); + } + spin_unlock(&init_mm.page_table_lock); + + return 0; +} + +/* + * Release the backing for the vmalloc region [start, end), which + * lies within the free region [free_region_start, free_region_end). + * + * This can be run lazily, long after the region was freed. It runs + * under vmap_area_lock, so it's not safe to interact with the vmalloc/vmap + * infrastructure. + * + * How does this work? + * ------------------- + * + * We have a region that is page aligned, labelled as A. + * That might not map onto the shadow in a way that is page-aligned: + * + * start end + * v v + * |????????|????????|AAAAAAAA|AA....AA|AAAAAAAA|????????| < vmalloc + * -------- -------- -------- -------- -------- + * | | | | | + * | | | /-------/ | + * \-------\|/------/ |/---------------/ + * ||| || + * |??AAAAAA|AAAAAAAA|AA??????| < shadow + * (1) (2) (3) + * + * First we align the start upwards and the end downwards, so that the + * shadow of the region aligns with shadow page boundaries. In the + * example, this gives us the shadow page (2). This is the shadow entirely + * covered by this allocation. + * + * Then we have the tricky bits. We want to know if we can free the + * partially covered shadow pages - (1) and (3) in the example. For this, + * we are given the start and end of the free region that contains this + * allocation. Extending our previous example, we could have: + * + * free_region_start free_region_end + * | start end | + * v v v v + * |FFFFFFFF|FFFFFFFF|AAAAAAAA|AA....AA|AAAAAAAA|FFFFFFFF| < vmalloc + * -------- -------- -------- -------- -------- + * | | | | | + * | | | /-------/ | + * \-------\|/------/ |/---------------/ + * ||| || + * |FFAAAAAA|AAAAAAAA|AAF?????| < shadow + * (1) (2) (3) + * + * Once again, we align the start of the free region up, and the end of + * the free region down so that the shadow is page aligned. So we can free + * page (1) - we know no allocation currently uses anything in that page, + * because all of it is in the vmalloc free region. But we cannot free + * page (3), because we can't be sure that the rest of it is unused. + * + * We only consider pages that contain part of the original region for + * freeing: we don't try to free other pages from the free region or we'd + * end up trying to free huge chunks of virtual address space. + * + * Concurrency + * ----------- + * + * How do we know that we're not freeing a page that is simultaneously + * being used for a fresh allocation in kasan_populate_vmalloc(_pte)? + * + * We _can_ have kasan_release_vmalloc and kasan_populate_vmalloc running + * at the same time. While we run under free_vmap_area_lock, the population + * code does not. + * + * free_vmap_area_lock instead operates to ensure that the larger range + * [free_region_start, free_region_end) is safe: because __alloc_vmap_area and + * the per-cpu region-finding algorithm both run under free_vmap_area_lock, + * no space identified as free will become used while we are running. This + * means that so long as we are careful with alignment and only free shadow + * pages entirely covered by the free region, we will not run in to any + * trouble - any simultaneous allocations will be for disjoint regions. + */ +void kasan_release_vmalloc(unsigned long start, unsigned long end, + unsigned long free_region_start, + unsigned long free_region_end) +{ + void *shadow_start, *shadow_end; + unsigned long region_start, region_end; + unsigned long size; + + region_start = ALIGN(start, PAGE_SIZE * KASAN_GRANULE_SIZE); + region_end = ALIGN_DOWN(end, PAGE_SIZE * KASAN_GRANULE_SIZE); + + free_region_start = ALIGN(free_region_start, + PAGE_SIZE * KASAN_GRANULE_SIZE); + + if (start != region_start && + free_region_start < region_start) + region_start -= PAGE_SIZE * KASAN_GRANULE_SIZE; + + free_region_end = ALIGN_DOWN(free_region_end, + PAGE_SIZE * KASAN_GRANULE_SIZE); + + if (end != region_end && + free_region_end > region_end) + region_end += PAGE_SIZE * KASAN_GRANULE_SIZE; + + shadow_start = kasan_mem_to_shadow((void *)region_start); + shadow_end = kasan_mem_to_shadow((void *)region_end); + + if (shadow_end > shadow_start) { + size = shadow_end - shadow_start; + apply_to_existing_page_range(&init_mm, + (unsigned long)shadow_start, + size, kasan_depopulate_vmalloc_pte, + NULL); + flush_tlb_kernel_range((unsigned long)shadow_start, + (unsigned long)shadow_end); + } +} + +#else /* CONFIG_KASAN_VMALLOC */ + +int kasan_module_alloc(void *addr, size_t size) +{ + void *ret; + size_t scaled_size; + size_t shadow_size; + unsigned long shadow_start; + + shadow_start = (unsigned long)kasan_mem_to_shadow(addr); + scaled_size = (size + KASAN_GRANULE_SIZE - 1) >> + KASAN_SHADOW_SCALE_SHIFT; + shadow_size = round_up(scaled_size, PAGE_SIZE); + + if (WARN_ON(!PAGE_ALIGNED(shadow_start))) + return -EINVAL; + + ret = __vmalloc_node_range(shadow_size, 1, shadow_start, + shadow_start + shadow_size, + GFP_KERNEL, + PAGE_KERNEL, VM_NO_GUARD, NUMA_NO_NODE, + __builtin_return_address(0)); + + if (ret) { + __memset(ret, KASAN_SHADOW_INIT, shadow_size); + find_vm_area(addr)->flags |= VM_KASAN; + kmemleak_ignore(ret); + return 0; + } + + return -ENOMEM; +} + +void kasan_free_shadow(const struct vm_struct *vm) +{ + if (vm->flags & VM_KASAN) + vfree(kasan_mem_to_shadow(vm->addr)); +} + +#endif