X-Patchwork-Submitter: Cyrill Gorcunov
X-Patchwork-Id: 9696623
Message-Id: <20170424154423.511592110@gmail.com>
Date: Mon, 24 Apr 2017 18:39:28 +0300
From: Cyrill Gorcunov
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org
Cc: viro@zeniv.linux.org.uk, akpm@linuxfoundation.org, avagin@virtuozzo.com, xemul@virtuozzo.com, mtk.manpages@gmail.com, gorcunov@openvz.org, avagin@openvz.org, jbaron@akamai.com, luto@amacapital.net
Subject: [patch v4 resend 2/2] kcmp: Add KCMP_EPOLL_TFD mode to compare epoll target files

With the current epoll architecture, target files are addressed by their
struct file and file descriptor number, where the latter is not unique.
Moreover, files can be transferred from another process via a unix socket,
added into the queue and then closed, so we won't find this descriptor in
the task's fdinfo list.

Thus, to checkpoint and restore such processes, CRIU needs to find out
where exactly the target file is present in order to add it into the
epoll queue. For this purpose one can use the kcmp call, where a
particular target file from the queue is compared with an arbitrary file
passed as an argument.

Because epoll target files can have the same file descriptor number but
a different struct file, the caller should explicitly specify the offset
within the sequence of targets sharing that number.

To test whether a particular file matches an entry inside epoll, one has to

 - fill the kcmp_epoll_slot structure with the epoll file descriptor (as
   seen by pid2), the target file number and the target file offset (if
   only one target with that number is present, the offset should be 0)
 - call kcmp as kcmp(pid1, pid2, KCMP_EPOLL_TFD, fd, &kcmp_epoll_slot)
 - the kernel then fetches the file pointer matching file descriptor @fd
   of pid1
 - looks up the target file in the epoll queue of pid2 and returns the
   traditional 0, 1, 2 result for sorting purposes

v2:
 - Use KCMP_FILE salt for file comparison (for convenience's sake: since
   the pointers are struct file pointers, the user can look them up in a
   previously collected files tree)
 - Make kcmp_epoll_target a separate helper instead of open-coding it
   with #ifdef

v3:
 - Use fewer if()s in kcmp_epoll_target for readability's sake (by avagin@)
 - Use u32 for kcmp_epoll_slot::toff instead of u64, which reduces memory
   pressure

Signed-off-by: Cyrill Gorcunov
Acked-by: Andrey Vagin
CC: Al Viro
CC: Andrew Morton
CC: Pavel Emelyanov
CC: Michael Kerrisk
CC: Jason Baron
CC: Andy Lutomirski
---
 fs/eventpoll.c            | 42 +++++++++++++++++++++++++++++++++
 include/linux/eventpoll.h |  3 ++
 include/uapi/linux/kcmp.h | 10 ++++++++
 kernel/kcmp.c             | 57 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 112 insertions(+)

Index: linux-ml.git/fs/eventpoll.c
===================================================================
--- linux-ml.git.orig/fs/eventpoll.c
+++ linux-ml.git/fs/eventpoll.c
@@ -1000,6 +1000,48 @@ static struct epitem *ep_find(struct eve
 	return epir;
 }
 
+static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd, unsigned long toff)
+{
+	struct rb_node *rbp;
+	struct epitem *epi;
+
+	for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
+		epi = rb_entry(rbp, struct epitem, rbn);
+		if (epi->ffd.fd == tfd) {
+			if (toff == 0)
+				return epi;
+			else
+				toff--;
+		}
+		cond_resched();
+	}
+
+	return NULL;
+}
+
+struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd,
+				     unsigned long toff)
+{
+	struct file *file_raw;
+	struct eventpoll *ep;
+	struct epitem *epi;
+
+	if (!is_file_epoll(file))
+		return ERR_PTR(-EINVAL);
+
+	ep = file->private_data;
+
+	mutex_lock(&ep->mtx);
+	epi = ep_find_tfd(ep, tfd, toff);
+	if (epi)
+		file_raw = epi->ffd.file;
+	else
+		file_raw = ERR_PTR(-ENOENT);
+	mutex_unlock(&ep->mtx);
+
+	return file_raw;
+}
+
 /*
  * This is the callback that is passed to the wait queue wakeup
  * mechanism. It is called by the stored file descriptors when they
Index: linux-ml.git/include/linux/eventpoll.h
===================================================================
--- linux-ml.git.orig/include/linux/eventpoll.h
+++ linux-ml.git/include/linux/eventpoll.h
@@ -14,6 +14,7 @@
 #define _LINUX_EVENTPOLL_H
 
 #include
+#include
 
 
 /* Forward declarations to avoid compiler errors */
@@ -22,6 +23,8 @@ struct file;
 
 #ifdef CONFIG_EPOLL
 
+struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long toff);
+
 /* Used to initialize the epoll bits inside the "struct file" */
 static inline void eventpoll_init_file(struct file *file)
 {
Index: linux-ml.git/include/uapi/linux/kcmp.h
===================================================================
--- linux-ml.git.orig/include/uapi/linux/kcmp.h
+++ linux-ml.git/include/uapi/linux/kcmp.h
@@ -1,6 +1,8 @@
 #ifndef _UAPI_LINUX_KCMP_H
 #define _UAPI_LINUX_KCMP_H
 
+#include
+
 /* Comparison type */
 enum kcmp_type {
 	KCMP_FILE,
@@ -10,8 +12,16 @@ enum kcmp_type {
 	KCMP_SIGHAND,
 	KCMP_IO,
 	KCMP_SYSVSEM,
+	KCMP_EPOLL_TFD,
 
 	KCMP_TYPES,
 };
 
+/* Slot for KCMP_EPOLL_TFD */
+struct kcmp_epoll_slot {
+	__u32 efd;		/* epoll file descriptor */
+	__u32 tfd;		/* target file number */
+	__u32 toff;		/* target offset within same numbered sequence */
+};
+
 #endif /* _UAPI_LINUX_KCMP_H */
Index: linux-ml.git/kernel/kcmp.c
===================================================================
--- linux-ml.git.orig/kernel/kcmp.c
+++ linux-ml.git/kernel/kcmp.c
@@ -11,6 +11,10 @@
 #include
 #include
 #include
+#include
+#include
+#include
+#include
 
 #include
 
@@ -94,6 +98,56 @@ static int kcmp_lock(struct mutex *m1, s
 	return err;
 }
 
+#ifdef CONFIG_EPOLL
+static int kcmp_epoll_target(struct task_struct *task1,
+			     struct task_struct *task2,
+			     unsigned long idx1,
+			     struct kcmp_epoll_slot __user *uslot)
+{
+	struct file *filp, *filp_epoll, *filp_tgt;
+	struct kcmp_epoll_slot slot;
+	struct files_struct *files;
+
+	if (copy_from_user(&slot, uslot, sizeof(slot)))
+		return -EFAULT;
+
+	filp = get_file_raw_ptr(task1, idx1);
+	if (!filp)
+		return -EBADF;
+
+	files = get_files_struct(task2);
+	if (!files)
+		return -EBADF;
+
+	spin_lock(&files->file_lock);
+	filp_epoll = fcheck_files(files, slot.efd);
+	if (filp_epoll)
+		get_file(filp_epoll);
+	else
+		filp_tgt = ERR_PTR(-EBADF);
+	spin_unlock(&files->file_lock);
+	put_files_struct(files);
+
+	if (filp_epoll) {
+		filp_tgt = get_epoll_tfile_raw_ptr(filp_epoll, slot.tfd, slot.toff);
+		fput(filp_epoll);
+	}
+
+	if (IS_ERR(filp_tgt))
+		return PTR_ERR(filp_tgt);
+
+	return kcmp_ptr(filp, filp_tgt, KCMP_FILE);
+}
+#else
+static int kcmp_epoll_target(struct task_struct *task1,
+			     struct task_struct *task2,
+			     unsigned long idx1,
+			     struct kcmp_epoll_slot __user *uslot)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
 SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
 		unsigned long, idx1, unsigned long, idx2)
 {
@@ -165,6 +219,9 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t
 		ret = -EOPNOTSUPP;
 #endif
 		break;
+	case KCMP_EPOLL_TFD:
+		ret = kcmp_epoll_target(task1, task2, idx1, (void *)idx2);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
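A small userspace model makes the meaning of the toff field in struct kcmp_epoll_slot easier to see. The sketch below is not kernel code. The names struct model_item and model_find_tfd() are hypothetical, and a plain array stands in for the epoll red-black tree; the array is assumed to be already in the tree's in-order sequence. It mirrors the loop in ep_find_tfd(): every entry whose descriptor number equals tfd is counted, and the toff-th match (counting from 0) is returned.

```c
/*
 * Hypothetical userspace model of ep_find_tfd(): NOT kernel code.
 * A plain array, assumed to be in the epoll tree's in-order sequence,
 * stands in for the red-black tree walked with rb_first()/rb_next().
 */
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an epitem: the watched file and its fd number. */
struct model_item {
	const char *file;	/* identity of the watched struct file */
	int fd;			/* descriptor number used at EPOLL_CTL_ADD time */
};

/*
 * Return the index of the toff-th entry (0-based) whose fd equals tfd,
 * or -1 when there are fewer matches (the kernel reports -ENOENT).
 */
static long model_find_tfd(const struct model_item *items, size_t n,
			   int tfd, unsigned long toff)
{
	assert(items != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (items[i].fd != tfd)
			continue;
		if (toff == 0)
			return (long)i;
		toff--;		/* skip an earlier target with the same number */
	}
	return -1;
}
```

In the kernel, the tree is ordered by struct file address first and by descriptor number second (ep_cmp_ffd() in fs/eventpoll.c). Userspace therefore cannot predict which same-numbered target sits at which offset. A caller can enumerate them by probing toff = 0, 1, 2, ... until kcmp fails with ENOENT.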
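From userspace, the call sequence described in the commit message might look like the following sketch. It assumes a kernel that carries this patch with kcmp enabled. It also assumes UAPI headers whose <linux/kcmp.h> defines KCMP_EPOLL_TFD and struct kcmp_epoll_slot. glibc provides no kcmp() wrapper, so the call goes through syscall(2). The helper names kcmp_epoll_tfd() and demo_self_match() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/kcmp.h>	/* KCMP_EPOLL_TFD, struct kcmp_epoll_slot (assumed present) */

/*
 * Hypothetical helper: compare descriptor fd of pid1 with the (tfd, toff)
 * target registered in epoll descriptor efd of pid2. Returns 0 when both
 * name the same struct file, 1 or 2 as an ordering hint otherwise, and -1
 * with errno set on failure (e.g. ENOENT when no such target exists).
 */
static long kcmp_epoll_tfd(pid_t pid1, pid_t pid2, int fd,
			   int efd, int tfd, uint32_t toff)
{
	struct kcmp_epoll_slot slot;

	/* Negative descriptors would wrap around when stored as __u32. */
	assert(fd >= 0 && efd >= 0 && tfd >= 0);

	slot.efd = (uint32_t)efd;
	slot.tfd = (uint32_t)tfd;
	slot.toff = toff;

	return syscall(SYS_kcmp, pid1, pid2, KCMP_EPOLL_TFD,
		       (unsigned long)fd, (unsigned long)&slot);
}

/*
 * Hypothetical demo: watch a pipe's read end in a fresh epoll instance,
 * then check that it matches target slot (tfd = p[0], toff = 0).
 * Returns the kcmp result (0 means "same file") or -errno on failure.
 */
static long demo_self_match(void)
{
	struct epoll_event ev = { .events = EPOLLIN };
	int p[2], efd;
	long ret;

	if (pipe(p) != 0)
		return -errno;
	efd = epoll_create1(0);
	if (efd < 0) {
		ret = -errno;
		goto out_pipe;
	}
	ev.data.fd = p[0];
	if (epoll_ctl(efd, EPOLL_CTL_ADD, p[0], &ev) != 0) {
		ret = -errno;
		goto out_epoll;
	}

	ret = kcmp_epoll_tfd(getpid(), getpid(), p[0], efd, p[0], 0);
	if (ret < 0)
		ret = -errno;	/* capture errno before close() can clobber it */

out_epoll:
	close(efd);
out_pipe:
	close(p[0]);
	close(p[1]);
	return ret;
}
```

Passing getpid() twice compares against the caller's own epoll instance. A checkpointing tool would instead pass the pid of the inspected task as pid2. kcmp permits this only if the caller passes a ptrace access-mode check on both tasks.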