From patchwork Sun Sep 26 17:06:37 2021
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12518953
From: Nadav Amit
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Nadav Amit, Andrea Arcangeli, Mike Rapoport, Peter Xu
Subject: [RFC PATCH] userfaultfd: support control over mm of remote PIDs
Date: Sun, 26 Sep 2021 10:06:37 -0700
Message-Id: <20210926170637.245699-1-namit@vmware.com>

From: Nadav Amit

The non-cooperative mode is useful, but only for forked processes. Userfaultfd can also be useful to monitor, debug and manage the memory of remote processes.

To support this mode, add a new flag, UFFD_REMOTE_PID, and an optional second argument to the userfaultfd syscall. When the flag is set, the second argument is assumed to be a pidfd that refers to the process that is to be monitored. Otherwise, the second argument is ignored.

When UFFD_REMOTE_PID is set, the syscall requires the caller to have CAP_SYS_PTRACE, to prevent misuse of this feature.

Cc: Andrea Arcangeli
Cc: Andrew Morton
Cc: Mike Rapoport
Cc: Peter Xu
Signed-off-by: Nadav Amit
---

I know that I have an outstanding RFC regarding the use of io_uring with userfaultfd. I do intend to follow up on that RFC as well, but it requires some more work.
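A minimal user-space sketch of how a monitor might call the proposed interface, assuming a kernel with this RFC applied. The helper name uffd_open_remote() is hypothetical, and so is the value of UFFD_REMOTE_PID, because the patch does not show where or how the flag is defined. The sketch obtains the pidfd with pidfd_open(2) (Linux 5.3+) and passes it as the second userfaultfd argument, since the patched kernel resolves that argument with pidfd_pid().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* HYPOTHETICAL value: the patch does not show how UFFD_REMOTE_PID is
 * defined, so 0x2 is only a placeholder for illustration. */
#ifndef UFFD_REMOTE_PID
#define UFFD_REMOTE_PID 0x2
#endif

/* Hypothetical helper: open a userfaultfd that monitors the mm of `pid`
 * through the interface proposed in this RFC. Returns the new
 * userfaultfd, or -1 with errno set. */
static int uffd_open_remote(pid_t pid, int flags)
{
#if defined(SYS_pidfd_open) && defined(SYS_userfaultfd)
	/* pidfd_open(2) returns a pidfd for an existing process. */
	int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -1;

	/* The second argument is the pidfd, not a raw PID. */
	int uffd = (int)syscall(SYS_userfaultfd, flags | UFFD_REMOTE_PID, pidfd);

	/* Keep userfaultfd's errno across close(). The kernel holds its own
	 * reference to the target mm, so the pidfd is no longer needed. */
	int saved_errno = errno;
	close(pidfd);
	errno = saved_errno;
	return uffd;
#else
	/* Headers too old to know these syscalls. */
	(void)pid;
	(void)flags;
	errno = ENOSYS;
	return -1;
#endif
}
```

For example, uffd_open_remote(target_pid, O_CLOEXEC) from a caller holding CAP_SYS_PTRACE would, on a patched kernel, return a userfaultfd bound to the target's mm. On an unpatched kernel the userfaultfd call is expected to fail, typically with EINVAL (unknown flag) or EPERM.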
---
 fs/userfaultfd.c | 71 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 59 insertions(+), 12 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 003f0d31743e..cf44e1e13a03 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -2053,10 +2053,39 @@ static void init_once_userfaultfd_ctx(void *mem)
 	seqcount_spinlock_init(&ctx->refile_seq, &ctx->fault_pending_wqh.lock);
 }
 
-SYSCALL_DEFINE1(userfaultfd, int, flags)
+static int userfaultfd_get_remote_mm(struct userfaultfd_ctx *ctx, int pidfd)
 {
-	struct userfaultfd_ctx *ctx;
-	int fd;
+	struct task_struct *task;
+	struct pid *pid;
+	struct fd f;
+	int ret;
+
+	f = fdget(pidfd);
+	if (!f.file)
+		return -EBADF;
+
+	pid = pidfd_pid(f.file);
+
+	task = get_pid_task(pid, PIDTYPE_PID);
+	ret = -ESRCH;
+	if (!task)
+		goto err_out;
+
+	ctx->mm = task->mm;
+	mmgrab(ctx->mm);
+	put_task_struct(task);
+	ret = 0;
+out:
+	return ret;
+err_out:
+	fdput(f);
+	goto out;
+}
+
+SYSCALL_DEFINE2(userfaultfd, int, flags, int, pidfd)
+{
+	struct userfaultfd_ctx *ctx = NULL;
+	int ret;
 
 	if (!sysctl_unprivileged_userfaultfd &&
 	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
@@ -2067,14 +2096,19 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 		return -EPERM;
 	}
 
+	if ((flags & UFFD_REMOTE_PID) && !capable(CAP_SYS_PTRACE))
+		return -EPERM;
+
 	BUG_ON(!current->mm);
 
 	/* Check the UFFD_* constants for consistency.  */
+	BUILD_BUG_ON(UFFD_REMOTE_PID & UFFD_SHARED_FCNTL_FLAGS);
 	BUILD_BUG_ON(UFFD_USER_MODE_ONLY & UFFD_SHARED_FCNTL_FLAGS);
 	BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC);
 	BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK);
 
-	if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | UFFD_USER_MODE_ONLY))
+	if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | UFFD_USER_MODE_ONLY |
+		      UFFD_REMOTE_PID))
 		return -EINVAL;
 
 	ctx = kmem_cache_alloc(userfaultfd_ctx_cachep, GFP_KERNEL);
@@ -2086,17 +2120,30 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	ctx->features = 0;
 	ctx->released = false;
 	atomic_set(&ctx->mmap_changing, 0);
-	ctx->mm = current->mm;
-	/* prevent the mm struct to be freed */
-	mmgrab(ctx->mm);
+	ctx->mm = NULL;
+
+	if (flags & UFFD_REMOTE_PID) {
+		/* the remote mm is grabbed by the following call */
+		ret = userfaultfd_get_remote_mm(ctx, pidfd);
+		if (ret)
+			goto err_out;
+	} else {
+		ctx->mm = current->mm;
+		/* prevent the mm struct to be freed */
+		mmgrab(ctx->mm);
+	}
 
-	fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, ctx,
+	ret = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, ctx,
 			O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS), NULL);
-	if (fd < 0) {
+	if (ret < 0)
+		goto err_out;
+out:
+	return ret;
+err_out:
+	if (ctx->mm)
 		mmdrop(ctx->mm);
-		kmem_cache_free(userfaultfd_ctx_cachep, ctx);
-	}
-	return fd;
+	kmem_cache_free(userfaultfd_ctx_cachep, ctx);
+	goto out;
 }
 
 static int __init userfaultfd_init(void)