From patchwork Tue Mar 21 20:54:30 2023
X-Patchwork-Submitter: Lorenzo Stoakes
X-Patchwork-Id: 13183268
From: Lorenzo Stoakes
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Andrew Morton
Cc: Baoquan He, Uladzislau Rezki, Matthew Wilcox, David Hildenbrand, Liu Shixin, Jiri Olsa, Jens Axboe, Alexander Viro, Lorenzo Stoakes
Subject: [PATCH v4 1/4] fs/proc/kcore: avoid bounce buffer for ktext data
Date: Tue, 21 Mar 2023 20:54:30 +0000
Message-Id: <08f9787b1fd0d552b65c62547f5382d5a5c7dbe4.1679431886.git.lstoakes@gmail.com>

Commit df04abfd181a ("fs/proc/kcore.c: Add bounce buffer for ktext data")
introduced the use of a bounce buffer to retrieve kernel text data for
/proc/kcore, in order to avoid failures arising from the hardened user
copies enabled by CONFIG_HARDENED_USERCOPY in check_kernel_text_object().

We can avoid this if, instead of copy_to_user(), we use _copy_to_user(),
which bypasses the hardening check. This is more efficient than using a
bounce buffer and simplifies the code.

We do so as part of an overall effort to eliminate bounce buffer usage in
the function, with an eye to converting it to an iterator read.

Signed-off-by: Lorenzo Stoakes
Reviewed-by: David Hildenbrand
---
 fs/proc/kcore.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 71157ee35c1a..556f310d6aa4 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -541,19 +541,12 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	case KCORE_VMEMMAP:
 	case KCORE_TEXT:
 		/*
-		 * Using bounce buffer to bypass the
-		 * hardened user copy kernel text checks.
+		 * We use _copy_to_user() to bypass usermode hardening
+		 * which would otherwise prevent this operation.
 		 */
-		if (copy_from_kernel_nofault(buf, (void *)start, tsz)) {
-			if (clear_user(buffer, tsz)) {
-				ret = -EFAULT;
-				goto out;
-			}
-		} else {
-			if (copy_to_user(buffer, buf, tsz)) {
-				ret = -EFAULT;
-				goto out;
-			}
+		if (_copy_to_user(buffer, (char *)start, tsz)) {
+			ret = -EFAULT;
+			goto out;
 		}
 		break;
 	default:
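To see concretely what the patch buys, the two code paths can be contrasted as
below. This is an illustrative sketch, not part of the patch: the wrappers
old_path() and new_path() are hypothetical names, while
copy_from_kernel_nofault(), clear_user(), copy_to_user() and _copy_to_user()
are the existing kernel APIs visible in the diff above.

	/* Old path: copy_to_user() runs the hardened-usercopy object check,
	 * which rejects kernel-text source addresses, so the data was first
	 * staged through the kernel bounce buffer 'buf'. */
	static int old_path(char __user *buffer, unsigned long start,
			    size_t tsz, char *buf)
	{
		if (copy_from_kernel_nofault(buf, (void *)start, tsz)) {
			/* Source unreadable: zero-fill the user buffer. */
			if (clear_user(buffer, tsz))
				return -EFAULT;
		} else if (copy_to_user(buffer, buf, tsz)) {
			return -EFAULT;
		}
		return 0;
	}

	/* New path: _copy_to_user() performs the same user copy but skips
	 * the hardened-usercopy check, so no staging copy is needed. */
	static int new_path(char __user *buffer, unsigned long start,
			    size_t tsz)
	{
		return _copy_to_user(buffer, (char *)start, tsz) ? -EFAULT : 0;
	}

One behavioural nuance: the old path zero-filled the user buffer when the
kernel source was unreadable, whereas the new path assumes the source is
readable, which holds for the KCORE_TEXT/KCORE_VMEMMAP ranges handled here.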
From patchwork Tue Mar 21 20:54:31 2023
X-Patchwork-Submitter: Lorenzo Stoakes
X-Patchwork-Id: 13183269

From: Lorenzo Stoakes
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Andrew Morton
Cc: Baoquan He, Uladzislau Rezki, Matthew Wilcox, David Hildenbrand, Liu Shixin, Jiri Olsa, Jens Axboe, Alexander Viro, Lorenzo Stoakes
Subject: [PATCH v4 2/4] fs/proc/kcore: convert read_kcore() to read_kcore_iter()
Date: Tue, 21 Mar 2023 20:54:31 +0000

Now that we have eliminated spinlocks from the vread() case, convert
read_kcore() to read_kcore_iter().

For the time being we still use a bounce buffer for vread(); however, in a
later patch in this series we will convert this to interact directly with
the iterator and eliminate the bounce buffer altogether.

Signed-off-by: Lorenzo Stoakes
Reviewed-by: David Hildenbrand
---
 fs/proc/kcore.c | 58 ++++++++++++++++++++++++-------------------------
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 556f310d6aa4..25e0eeb8d498 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -24,7 +24,7 @@
 #include
 #include
 #include
-#include
+#include
 #include
 #include
 #include
@@ -308,9 +308,12 @@ static void append_kcore_note(char *notes, size_t *i, const char *name,
 }
 
 static ssize_t
-read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
+read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
 {
+	struct file *file = iocb->ki_filp;
 	char *buf = file->private_data;
+	loff_t *ppos = &iocb->ki_pos;
+
 	size_t phdrs_offset, notes_offset, data_offset;
 	size_t page_offline_frozen = 1;
 	size_t phdrs_len, notes_len;
@@ -318,6 +321,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	size_t tsz;
 	int nphdr;
 	unsigned long start;
+	size_t buflen = iov_iter_count(iter);
 	size_t orig_buflen = buflen;
 	int ret = 0;
 
@@ -333,7 +337,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	notes_offset = phdrs_offset + phdrs_len;
 
 	/* ELF file header. */
-	if (buflen && *fpos < sizeof(struct elfhdr)) {
+	if (buflen && *ppos < sizeof(struct elfhdr)) {
 		struct elfhdr ehdr = {
 			.e_ident = {
 				[EI_MAG0] = ELFMAG0,
@@ -355,19 +359,18 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 			.e_phnum = nphdr,
 		};
 
-		tsz = min_t(size_t, buflen, sizeof(struct elfhdr) - *fpos);
-		if (copy_to_user(buffer, (char *)&ehdr + *fpos, tsz)) {
+		tsz = min_t(size_t, buflen, sizeof(struct elfhdr) - *ppos);
+		if (copy_to_iter((char *)&ehdr + *ppos, tsz, iter) != tsz) {
 			ret = -EFAULT;
 			goto out;
 		}
 
-		buffer += tsz;
 		buflen -= tsz;
-		*fpos += tsz;
+		*ppos += tsz;
 	}
 
 	/* ELF program headers.
 	 */
-	if (buflen && *fpos < phdrs_offset + phdrs_len) {
+	if (buflen && *ppos < phdrs_offset + phdrs_len) {
 		struct elf_phdr *phdrs, *phdr;
 
 		phdrs = kzalloc(phdrs_len, GFP_KERNEL);
@@ -397,22 +400,21 @@
 			phdr++;
 		}
 
-		tsz = min_t(size_t, buflen, phdrs_offset + phdrs_len - *fpos);
-		if (copy_to_user(buffer, (char *)phdrs + *fpos - phdrs_offset,
-				 tsz)) {
+		tsz = min_t(size_t, buflen, phdrs_offset + phdrs_len - *ppos);
+		if (copy_to_iter((char *)phdrs + *ppos - phdrs_offset, tsz,
+				 iter) != tsz) {
 			kfree(phdrs);
 			ret = -EFAULT;
 			goto out;
 		}
 		kfree(phdrs);
 
-		buffer += tsz;
 		buflen -= tsz;
-		*fpos += tsz;
+		*ppos += tsz;
 	}
 
 	/* ELF note segment. */
-	if (buflen && *fpos < notes_offset + notes_len) {
+	if (buflen && *ppos < notes_offset + notes_len) {
 		struct elf_prstatus prstatus = {};
 		struct elf_prpsinfo prpsinfo = {
 			.pr_sname = 'R',
@@ -447,24 +449,23 @@
 				  vmcoreinfo_data,
 				  min(vmcoreinfo_size, notes_len - i));
 
-		tsz = min_t(size_t, buflen, notes_offset + notes_len - *fpos);
-		if (copy_to_user(buffer, notes + *fpos - notes_offset, tsz)) {
+		tsz = min_t(size_t, buflen, notes_offset + notes_len - *ppos);
+		if (copy_to_iter(notes + *ppos - notes_offset, tsz, iter) != tsz) {
 			kfree(notes);
 			ret = -EFAULT;
 			goto out;
 		}
 		kfree(notes);
 
-		buffer += tsz;
 		buflen -= tsz;
-		*fpos += tsz;
+		*ppos += tsz;
 	}
 
 	/*
 	 * Check to see if our file offset matches with any of
 	 * the addresses in the elf_phdr on our list.
 	 */
-	start = kc_offset_to_vaddr(*fpos - data_offset);
+	start = kc_offset_to_vaddr(*ppos - data_offset);
 	if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen)
 		tsz = buflen;
 
@@ -497,7 +498,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		}
 
 		if (!m) {
-			if (clear_user(buffer, tsz)) {
+			if (iov_iter_zero(tsz, iter) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
@@ -508,14 +509,14 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		case KCORE_VMALLOC:
 			vread(buf, (char *)start, tsz);
 			/* we have to zero-fill user buffer even if no read */
-			if (copy_to_user(buffer, buf, tsz)) {
+			if (copy_to_iter(buf, tsz, iter) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
 			break;
 		case KCORE_USER:
 			/* User page is handled prior to normal kernel page: */
-			if (copy_to_user(buffer, (char *)start, tsz)) {
+			if (copy_to_iter((char *)start, tsz, iter) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
@@ -531,7 +532,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 			 */
 			if (!page || PageOffline(page) ||
 			    is_page_hwpoison(page) || !pfn_is_ram(pfn)) {
-				if (clear_user(buffer, tsz)) {
+				if (iov_iter_zero(tsz, iter) != tsz) {
 					ret = -EFAULT;
 					goto out;
 				}
@@ -541,25 +542,24 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		case KCORE_VMEMMAP:
 		case KCORE_TEXT:
 			/*
-			 * We use _copy_to_user() to bypass usermode hardening
+			 * We use _copy_to_iter() to bypass usermode hardening
 			 * which would otherwise prevent this operation.
 			 */
-			if (_copy_to_user(buffer, (char *)start, tsz)) {
+			if (_copy_to_iter((char *)start, tsz, iter) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
 			break;
 		default:
 			pr_warn_once("Unhandled KCORE type: %d\n", m->type);
-			if (clear_user(buffer, tsz)) {
+			if (iov_iter_zero(tsz, iter) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
 		}
skip:
 		buflen -= tsz;
-		*fpos += tsz;
-		buffer += tsz;
+		*ppos += tsz;
 		start += tsz;
 		tsz = (buflen > PAGE_SIZE ?
 		       PAGE_SIZE : buflen);
 	}
 
@@ -603,7 +603,7 @@ static int release_kcore(struct inode *inode, struct file *file)
 }
 
 static const struct proc_ops kcore_proc_ops = {
-	.proc_read	= read_kcore,
+	.proc_read_iter	= read_kcore_iter,
 	.proc_open	= open_kcore,
 	.proc_release	= release_kcore,
 	.proc_lseek	= default_llseek,
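The conversion above is mechanical at each call site, but the two APIs have
opposite return conventions, which the sketch below spells out. It is
illustrative only; 'src' stands for whichever source pointer each call site
uses.

	/* Before: copy_to_user() returns the number of bytes NOT copied,
	 * so any non-zero result is failure, and the user pointer must be
	 * advanced by hand. */
	if (copy_to_user(buffer, src, tsz)) {
		ret = -EFAULT;
		goto out;
	}
	buffer += tsz;

	/* After: copy_to_iter() returns the number of bytes copied and
	 * advances the iterator internally, so a short copy is failure
	 * and the manual 'buffer += tsz' bookkeeping disappears. */
	if (copy_to_iter(src, tsz, iter) != tsz) {
		ret = -EFAULT;
		goto out;
	}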
From patchwork Tue Mar 21 20:54:32 2023
X-Patchwork-Submitter: Lorenzo Stoakes
X-Patchwork-Id: 13183270

From: Lorenzo Stoakes
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Andrew Morton
Cc: Baoquan He, Uladzislau Rezki, Matthew Wilcox, David Hildenbrand, Liu Shixin, Jiri Olsa, Jens Axboe, Alexander Viro, Lorenzo Stoakes
Subject: [PATCH v4 3/4] iov_iter: add copy_page_to_iter_atomic()
Date: Tue, 21 Mar 2023 20:54:32 +0000
Message-Id: <31482908634cbb68adafedb65f0b21888c194a1b.1679431886.git.lstoakes@gmail.com>

Provide an atomic context equivalent for copy_page_to_iter(). This eschews
the might_fault() check and copies memory in the same way that
copy_page_from_iter_atomic() does.

This function assumes a non-compound page; however, this mimics the
existing behaviour of copy_page_from_iter_atomic(). I am keeping the
behaviour consistent between the two, deferring any such change to an
explicit folio-fication effort.

This is being added so that an iterator-based form of vread() can be
implemented with known-prefaulted pages, avoiding the need for mutex
locking.

Signed-off-by: Lorenzo Stoakes
---
 include/linux/uio.h |  2 ++
 lib/iov_iter.c      | 28 ++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 27e3fd942960..fab07103090f 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -154,6 +154,8 @@ static inline struct iovec iov_iter_iovec(const struct iov_iter *iter)
 
 size_t copy_page_from_iter_atomic(struct page *page, unsigned offset,
 				  size_t bytes, struct iov_iter *i);
+size_t copy_page_to_iter_atomic(struct page *page, unsigned offset,
+				size_t bytes, struct iov_iter *i);
 void iov_iter_advance(struct iov_iter *i, size_t bytes);
 void iov_iter_revert(struct iov_iter *i, size_t bytes);
 size_t fault_in_iov_iter_readable(const struct iov_iter *i, size_t bytes);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 274014e4eafe..48ca1c5dfc04 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -821,6 +821,34 @@ size_t copy_page_from_iter_atomic(struct page *page, unsigned offset, size_t byt
 }
 EXPORT_SYMBOL(copy_page_from_iter_atomic);
 
+size_t copy_page_to_iter_atomic(struct page *page, unsigned offset, size_t bytes,
+				struct iov_iter *i)
+{
+	char *kaddr = kmap_local_page(page);
+	char *p = kaddr + offset;
+	size_t copied = 0;
+
+	if (!page_copy_sane(page, offset, bytes) ||
+	    WARN_ON_ONCE(i->data_source))
+		goto out;
+
+	if (unlikely(iov_iter_is_pipe(i))) {
+		copied = copy_page_to_iter_pipe(page, offset, bytes, i);
+		goto out;
+	}
+
+	iterate_and_advance(i, bytes, base, len, off,
+		copyout(base, p + off, len),
+		memcpy(base, p + off, len)
+	)
+	copied = bytes;
+
+out:
+	kunmap_local(kaddr);
+	return copied;
+}
+EXPORT_SYMBOL(copy_page_to_iter_atomic);
+
 static void pipe_advance(struct iov_iter *i, size_t size)
 {
 	struct pipe_inode_info *pipe = i->pipe;
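A sketch of the intended calling convention follows. The helper
copy_page_locked() is hypothetical, not part of the patch; it illustrates
the constraint the new primitive is built for, namely copying under a
spinlock with the destination pages prefaulted beforehand.

	/*
	 * Hypothetical caller sketch. copy_page_to_iter_atomic() maps the
	 * page with kmap_local_page() and never faults, so it is safe to
	 * call with a spinlock held -- provided the user pages backing
	 * 'iter' were faulted in beforehand (see the use of
	 * fault_in_iov_iter_writeable() in patch 4/4).
	 */
	static size_t copy_page_locked(spinlock_t *lock, struct page *page,
				       unsigned int offset, size_t len,
				       struct iov_iter *iter)
	{
		size_t copied;

		spin_lock(lock);
		copied = copy_page_to_iter_atomic(page, offset, len, iter);
		spin_unlock(lock);

		/* A short copy means a prefaulted page went away mid-copy;
		 * callers are expected to stop and report a partial read. */
		return copied;
	}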
From patchwork Tue Mar 21 20:54:33 2023
X-Patchwork-Submitter: Lorenzo Stoakes
X-Patchwork-Id: 13183271

From: Lorenzo Stoakes
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Andrew Morton
Cc: Baoquan He, Uladzislau Rezki, Matthew Wilcox, David Hildenbrand, Liu Shixin, Jiri Olsa, Jens Axboe, Alexander Viro, Lorenzo Stoakes
Subject: [PATCH v4 4/4] mm: vmalloc: convert vread() to vread_iter()
Date: Tue, 21 Mar 2023 20:54:33 +0000
Message-Id: <6b3899bbbf1f4bd6b7133c8b6f27b3a8791607b0.1679431886.git.lstoakes@gmail.com>

Having previously laid the foundation for converting vread() to an
iterator function, pull the trigger and do so.
This patch attempts to provide minimal refactoring and to reflect the
existing logic as closely as possible; for example, we continue to zero
portions of memory not read, as before.

Overall, there should be no functional difference other than a performance
improvement in /proc/kcore access to vmalloc regions.

Now that we have eliminated the need for a bounce buffer in
read_kcore_iter(), we dispense with it. We need to ensure userland pages
are faulted in before proceeding, as we take spinlocks.

Additionally, we must account for the fact that at any point a copy may
fail; if this happens, we exit, indicating fewer bytes retrieved than
expected.

Signed-off-by: Lorenzo Stoakes
---
 fs/proc/kcore.c         |  26 ++---
 include/linux/vmalloc.h |   3 +-
 mm/nommu.c              |  10 +-
 mm/vmalloc.c            | 234 +++++++++++++++++++++++++---------------
 4 files changed, 160 insertions(+), 113 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 25e0eeb8d498..221e16f75ba5 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -307,13 +307,9 @@ static void append_kcore_note(char *notes, size_t *i, const char *name,
 	*i = ALIGN(*i + descsz, 4);
 }
 
-static ssize_t
-read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
+static ssize_t read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
 {
-	struct file *file = iocb->ki_filp;
-	char *buf = file->private_data;
 	loff_t *ppos = &iocb->ki_pos;
-
 	size_t phdrs_offset, notes_offset, data_offset;
 	size_t page_offline_frozen = 1;
 	size_t phdrs_len, notes_len;
@@ -507,9 +503,12 @@ read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
 
 		switch (m->type) {
 		case KCORE_VMALLOC:
-			vread(buf, (char *)start, tsz);
-			/* we have to zero-fill user buffer even if no read */
-			if (copy_to_iter(buf, tsz, iter) != tsz) {
+			/*
+			 * Make sure user pages are faulted in as we acquire
+			 * spinlocks in vread_iter().
+			 */
+			if (fault_in_iov_iter_writeable(iter, tsz) ||
+			    vread_iter(iter, (char *)start, tsz) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
@@ -582,10 +581,6 @@ static int open_kcore(struct inode *inode, struct file *filp)
 	if (ret)
 		return ret;
 
-	filp->private_data = kmalloc(PAGE_SIZE, GFP_KERNEL);
-	if (!filp->private_data)
-		return -ENOMEM;
-
 	if (kcore_need_update)
 		kcore_update_ram();
 	if (i_size_read(inode) != proc_root_kcore->size) {
@@ -596,16 +591,9 @@ static int open_kcore(struct inode *inode, struct file *filp)
 	return 0;
 }
 
-static int release_kcore(struct inode *inode, struct file *file)
-{
-	kfree(file->private_data);
-	return 0;
-}
-
 static const struct proc_ops kcore_proc_ops = {
 	.proc_read_iter	= read_kcore_iter,
 	.proc_open	= open_kcore,
-	.proc_release	= release_kcore,
 	.proc_lseek	= default_llseek,
 };
 
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 69250efa03d1..461aa5637f65 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -9,6 +9,7 @@
 #include		/* pgprot_t */
 #include
 #include
+#include
 
 #include
 
@@ -251,7 +252,7 @@ static inline void set_vm_flush_reset_perms(void *addr)
 #endif
 
 /* for /proc/kcore */
-extern long vread(char *buf, char *addr, unsigned long count);
+extern long vread_iter(struct iov_iter *iter, const char *addr, size_t count);
 
 /*
  * Internals.  Don't use..
diff --git a/mm/nommu.c b/mm/nommu.c
index 57ba243c6a37..e0fcd948096e 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -36,6 +36,7 @@
 #include
 
 #include
+#include
 #include
 #include
 #include
@@ -198,14 +199,13 @@ unsigned long vmalloc_to_pfn(const void *addr)
 }
 EXPORT_SYMBOL(vmalloc_to_pfn);
 
-long vread(char *buf, char *addr, unsigned long count)
+long vread_iter(struct iov_iter *iter, char *addr, size_t count)
 {
 	/* Don't allow overflow */
-	if ((unsigned long) buf + count < count)
-		count = -(unsigned long) buf;
+	if ((unsigned long) addr + count < count)
+		count = -(unsigned long) addr;
 
-	memcpy(buf, addr, count);
-	return count;
+	return copy_to_iter(addr, count, iter);
 }
 
 /*
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 978194dc2bb8..ebfa1e9fe6f9 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -37,7 +37,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -3442,62 +3441,95 @@ void *vmalloc_32_user(unsigned long size)
 EXPORT_SYMBOL(vmalloc_32_user);
 
 /*
- * small helper routine , copy contents to buf from addr.
- * If the page is not present, fill zero.
+ * Atomically zero bytes in the iterator.
+ *
+ * Returns the number of zeroed bytes.
  */
+size_t zero_iter(struct iov_iter *iter, size_t count)
+{
+	size_t remains = count;
+
+	while (remains > 0) {
+		size_t num, copied;
+
+		num = remains < PAGE_SIZE ? remains : PAGE_SIZE;
+		copied = copy_page_to_iter_atomic(ZERO_PAGE(0), 0, num, iter);
+		remains -= copied;
+
+		if (copied < num)
+			break;
+	}
+
+	return count - remains;
+}
 
-static int aligned_vread(char *buf, char *addr, unsigned long count)
+/*
+ * small helper routine, copy contents to iter from addr.
+ * If the page is not present, fill zero.
+ *
+ * Returns the number of copied bytes.
+ */
+static size_t aligned_vread_iter(struct iov_iter *iter,
+				 const char *addr, size_t count)
 {
-	struct page *p;
-	int copied = 0;
+	size_t remains = count;
+	struct page *page;
 
-	while (count) {
+	while (remains > 0) {
 		unsigned long offset, length;
+		size_t copied = 0;
 
 		offset = offset_in_page(addr);
 		length = PAGE_SIZE - offset;
-		if (length > count)
-			length = count;
-		p = vmalloc_to_page(addr);
+		if (length > remains)
+			length = remains;
+		page = vmalloc_to_page(addr);
 		/*
-		 * To do safe access to this _mapped_ area, we need
-		 * lock. But adding lock here means that we need to add
-		 * overhead of vmalloc()/vfree() calls for this _debug_
-		 * interface, rarely used. Instead of that, we'll use
-		 * kmap() and get small overhead in this access function.
+		 * To do safe access to this _mapped_ area, we need lock. But
+		 * adding lock here means that we need to add overhead of
+		 * vmalloc()/vfree() calls for this _debug_ interface, rarely
+		 * used. Instead of that, we'll use a local mapping via
+		 * copy_page_to_iter_atomic() and accept a small overhead in
+		 * this access function.
		 */
-		if (p) {
-			/* We can expect USER0 is not used -- see vread() */
-			void *map = kmap_atomic(p);
-			memcpy(buf, map + offset, length);
-			kunmap_atomic(map);
-		} else
-			memset(buf, 0, length);
+		if (page)
+			copied = copy_page_to_iter_atomic(page, offset, length,
+							  iter);
+
+		/* Zero anything we were unable to copy. */
+		copied += zero_iter(iter, length - copied);
+
+		addr += copied;
+		remains -= copied;
 
-		addr += length;
-		buf += length;
-		copied += length;
-		count -= length;
+		if (copied != length)
+			break;
 	}
-	return copied;
+
+	return count - remains;
 }
 
-static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags)
+/*
+ * Read from a vm_map_ram region of memory.
+ *
+ * Returns the number of copied bytes.
+ */
+static size_t vmap_ram_vread_iter(struct iov_iter *iter, const char *addr,
+				  size_t count, unsigned long flags)
 {
 	char *start;
 	struct vmap_block *vb;
 	unsigned long offset;
-	unsigned int rs, re, n;
+	unsigned int rs, re;
+	size_t remains, n;
 
 	/*
 	 * If it's area created by vm_map_ram() interface directly, but
 	 * not further subdividing and delegating management to vmap_block,
 	 * handle it here.
 	 */
-	if (!(flags & VMAP_BLOCK)) {
-		aligned_vread(buf, addr, count);
-		return;
-	}
+	if (!(flags & VMAP_BLOCK))
+		return aligned_vread_iter(iter, addr, count);
 
 	/*
 	 * Area is split into regions and tracked with vmap_block, read out
@@ -3505,50 +3537,65 @@ static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags
 	 */
 	vb = xa_load(&vmap_blocks, addr_to_vb_idx((unsigned long)addr));
 	if (!vb)
-		goto finished;
+		goto finished_zero;
 
 	spin_lock(&vb->lock);
 	if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) {
 		spin_unlock(&vb->lock);
-		goto finished;
+		goto finished_zero;
 	}
+
+	remains = count;
 	for_each_set_bitrange(rs, re, vb->used_map, VMAP_BBMAP_BITS) {
-		if (!count)
-			break;
+		size_t copied;
+
+		if (remains == 0)
+			goto finished;
+
 		start = vmap_block_vaddr(vb->va->va_start, rs);
-		while (addr < start) {
-			if (count == 0)
-				goto unlock;
-			*buf = '\0';
-			buf++;
-			addr++;
-			count--;
+
+		if (addr < start) {
+			size_t to_zero = min_t(size_t, start - addr, remains);
+			size_t zeroed = zero_iter(iter, to_zero);
+
+			addr += zeroed;
+			remains -= zeroed;
+
+			if (remains == 0 || zeroed != to_zero)
+				goto finished;
 		}
+
 		/*it could start reading from the middle of used region*/
 		offset = offset_in_page(addr);
 		n = ((re - rs + 1) << PAGE_SHIFT) - offset;
-		if (n > count)
-			n = count;
-		aligned_vread(buf, start+offset, n);
+		if (n > remains)
+			n = remains;
+
+		copied = aligned_vread_iter(iter, start + offset, n);
 
-		buf += n;
-		addr += n;
-		count -= n;
+		addr += copied;
+		remains -= copied;
+
+		if (copied != n)
+			goto finished;
 	}
-unlock:
+
 	spin_unlock(&vb->lock);
-finished:
+
+finished_zero:
 	/* zero-fill the left dirty or free regions */
-	if (count)
-		memset(buf, 0, count);
+	return count - remains + zero_iter(iter, remains);
+finished:
+	/* We couldn't copy/zero everything */
+	spin_unlock(&vb->lock);
+	return count - remains;
 }
 
 /**
- * vread() - read vmalloc area in a safe way.
- * @buf:     buffer for reading data
- * @addr:    vm address.
- * @count:   number of bytes to be read.
+ * vread_iter() - read vmalloc area in a safe way to an iterator.
+ * @iter:         the iterator to which data should be written.
+ * @addr:         vm address.
+ * @count:        number of bytes to be read.
  *
  * This function checks that addr is a valid vmalloc'ed area, and
  * copy data from that area to a given buffer.
 * If the given memory range
@@ -3568,13 +3615,12 @@ static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags
 * (same number as @count) or %0 if [addr...addr+count) doesn't
 * include any intersection with valid vmalloc area
 */
-long vread(char *buf, char *addr, unsigned long count)
+long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
 {
 	struct vmap_area *va;
 	struct vm_struct *vm;
-	char *vaddr, *buf_start = buf;
-	unsigned long buflen = count;
-	unsigned long n, size, flags;
+	char *vaddr;
+	size_t n, size, flags, remains;
 
 	addr = kasan_reset_tag(addr);
 
@@ -3582,18 +3628,22 @@ long vread(char *buf, char *addr, unsigned long count)
 	if ((unsigned long) addr + count < count)
 		count = -(unsigned long) addr;
 
+	remains = count;
+
 	spin_lock(&vmap_area_lock);
 	va = find_vmap_area_exceed_addr((unsigned long)addr);
 	if (!va)
-		goto finished;
+		goto finished_zero;
 
 	/* no intersects with alive vmap_area */
-	if ((unsigned long)addr + count <= va->va_start)
-		goto finished;
+	if ((unsigned long)addr + remains <= va->va_start)
+		goto finished_zero;
 
 	list_for_each_entry_from(va, &vmap_area_list, list) {
-		if (!count)
-			break;
+		size_t copied;
+
+		if (remains == 0)
+			goto finished;
 
 		vm = va->vm;
 		flags = va->flags & VMAP_FLAGS_MASK;
@@ -3608,6 +3658,7 @@ long vread(char *buf, char *addr, unsigned long count)
 
 		if (vm && (vm->flags & VM_UNINITIALIZED))
 			continue;
+
 		/* Pair with smp_wmb() in clear_vm_uninitialized_flag() */
 		smp_rmb();
 
@@ -3616,38 +3667,45 @@ long vread(char *buf, char *addr, unsigned long count)
 
 		if (addr >= vaddr + size)
 			continue;
-		while (addr < vaddr) {
-			if (count == 0)
+
+		if (addr < vaddr) {
+			size_t to_zero = min_t(size_t, vaddr - addr, remains);
+			size_t zeroed = zero_iter(iter, to_zero);
+
+			addr += zeroed;
+			remains -= zeroed;
+
+			if (remains == 0 || zeroed != to_zero)
 				goto finished;
-			*buf = '\0';
-			buf++;
-			addr++;
-			count--;
 		}
+
 		n = vaddr + size - addr;
-		if (n > count)
-			n = count;
+		if (n > remains)
+			n = remains;
 
 		if (flags & VMAP_RAM)
-			vmap_ram_vread(buf, addr, n, flags);
+			copied = vmap_ram_vread_iter(iter, addr, n, flags);
 		else if (!(vm->flags & VM_IOREMAP))
-			aligned_vread(buf, addr, n);
+			copied = aligned_vread_iter(iter, addr, n);
 		else /* IOREMAP area is treated as memory hole */
-			memset(buf, 0, n);
-		buf += n;
-		addr += n;
-		count -= n;
+			copied = zero_iter(iter, n);
+
+		addr += copied;
+		remains -= copied;
+
+		if (copied != n)
+			goto finished;
 	}
 
-finished:
-	spin_unlock(&vmap_area_lock);
-	if (buf == buf_start)
-		return 0;
+finished_zero:
+	spin_unlock(&vmap_area_lock);
 	/* zero-fill memory holes */
-	if (buf != buf_start + buflen)
-		memset(buf, 0, buflen - (buf - buf_start));
+	return count - remains + zero_iter(iter, remains);
+finished:
+	/* Nothing remains, or we couldn't copy/zero everything. */
+	spin_unlock(&vmap_area_lock);
 
-	return buflen;
+	return count - remains;
 }
 
 /**
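Taken together, the series leaves /proc/kcore with the caller contract
below. This is a condensed restatement of the KCORE_VMALLOC hunk from this
patch, not new code.

	/*
	 * vread_iter() copies while holding vmap_area_lock (and vb->lock
	 * for vm_map_ram regions), so it must never fault. The caller
	 * therefore prefaults the user pages first, and treats a short
	 * copy -- a page reclaimed between prefault and copy -- as
	 * -EFAULT, matching the old bounce-buffer behaviour.
	 */
	if (fault_in_iov_iter_writeable(iter, tsz) ||
	    vread_iter(iter, (char *)start, tsz) != tsz) {
		ret = -EFAULT;
		goto out;
	}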