From patchwork Fri Feb 9 20:56:56 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 10209969
Date: Fri, 9 Feb 2018 12:56:56 -0800
From: Andrew Morton
To: Kirill Tkhai
Cc: jack@suse.cz, amir73il@gmail.com, willy@infradead.org,
 linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 gorcunov@virtuozzo.com
Subject: Re: [PATCH v2] inotify: Extend ioctl to allow to request id of new
 watch descriptor
Message-Id: <20180209125656.e440e0518540d6b76ae42bc0@linux-foundation.org>
In-Reply-To:
References: <151810242614.30935.12876744458891870220.stgit@localhost.localdomain>
X-Mailer: Sylpheed 3.4.1 (GTK+ 2.24.23; x86_64-pc-linux-gnu)
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Fri, 9 Feb 2018 18:04:54 +0300 Kirill Tkhai wrote:

> A watch descriptor is the id of a watch created by inotify_add_watch().
> It is allocated in inotify_add_to_idr(), and ids start from 1. Every
> new inotify watch obtains the next available number (usually old + 1),
> as served by idr_alloc_cyclic().
>
> The CRIU (Checkpoint/Restore In Userspace) project supports inotify
> files, and restores watch descriptors with the same numbers they had
> before the dump. Since there was no kernel support, we had to use a
> loop to add a watch with a specific descriptor id:
>
> 	while (1) {
> 		int wd;
>
> 		wd = inotify_add_watch(inotify_fd, path, mask);
> 		if (wd < 0) {
> 			break;
> 		} else if (wd == desired_wd_id) {
> 			ret = 0;
> 			break;
> 		}
>
> 		inotify_rm_watch(inotify_fd, wd);
> 	}
>
> (You may find the actual code at the link below:
> https://github.com/checkpoint-restore/criu/blob/v3.7/criu/fsnotify.c#L577)
>
> The loop is suboptimal and very expensive, but since there was no
> better kernel support, it was the only way to restore the descriptors.
> Fortunately, we mostly met descriptors with small ids, and this
> approach more or less worked.
>
> But recently, containers whose inotify instances have big watch
> descriptors have started to appear, and this approach stopped working
> altogether. When the descriptor id is around 0x34d71d6, the restoring
> process spins in a busy loop for a long time, the restore hangs, and
> the delay in migration from node to node is easily noticeable.
>
> This patch aims to solve this problem. It introduces a new ioctl,
> INOTIFY_IOC_SETNEXTWD, which allows userspace to request the number
> of the next created watch descriptor.
> It simply calls the idr_set_cursor() primitive
> to populate idr::idr_next, so that the next idr_alloc_cyclic() allocation
> will return this id if it is not occupied. This is the approach
> used to restore some other resources from userspace. For example,
> /proc/sys/kernel/ns_last_pid works the same way for task pids.
>
> The new code is under the CONFIG_CHECKPOINT_RESTORE #define, so small
> systems may exclude it.
>

Reviewed-by: Andrew Morton

With a little cleanup:

--- a/fs/notify/inotify/inotify_user.c~inotify-extend-ioctl-to-allow-to-request-id-of-new-watch-descriptor-fix
+++ a/fs/notify/inotify/inotify_user.c
@@ -285,7 +285,6 @@ static int inotify_release(struct inode
 static long inotify_ioctl(struct file *file, unsigned int cmd,
 			  unsigned long arg)
 {
-	struct inotify_group_private_data *data __maybe_unused;
 	struct fsnotify_group *group;
 	struct fsnotify_event *fsn_event;
 	void __user *p;
@@ -294,7 +293,6 @@ static long inotify_ioctl(struct file *f
 
 	group = file->private_data;
 	p = (void __user *) arg;
-	data = &group->inotify_data;
 
 	pr_debug("%s: group=%p cmd=%u\n", __func__, group, cmd);
 
@@ -313,6 +311,9 @@ static long inotify_ioctl(struct file *f
 	case INOTIFY_IOC_SETNEXTWD:
 		ret = -EINVAL;
 		if (arg >= 1 && arg <= INT_MAX) {
+			struct inotify_group_private_data *data;
+
+			data = &group->inotify_data;
 			spin_lock(&data->idr_lock);
 			idr_set_cursor(&data->idr, (unsigned int)arg);
 			spin_unlock(&data->idr_lock);
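
For illustration only, the userspace side of the restore described in the
changelog might look roughly like the sketch below. It is not CRIU's code.
The helper name restore_watch() is hypothetical, and the fallback
definition of INOTIFY_IOC_SETNEXTWD is an assumption about how the uapi
header spells the ioctl number (glibc's <sys/inotify.h> does not provide
it). The id is passed by value as the ioctl argument, matching the
"arg >= 1 && arg <= INT_MAX" check in the hunk above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/ioctl.h>
#include <linux/types.h>
#include <sys/inotify.h>
#include <sys/ioctl.h>

/* Assumption: mirrors the definition expected in <linux/inotify.h>. */
#ifndef INOTIFY_IOC_SETNEXTWD
#define INOTIFY_IOC_SETNEXTWD _IOW('I', 0, __s32)
#endif

/*
 * Hypothetical helper: add a watch on `path` and require that it gets
 * the id `desired_wd`. Returns 0 on success, -1 on failure with errno set.
 */
int restore_watch(int inotify_fd, const char *path, uint32_t mask,
		  int desired_wd)
{
	int wd;

	assert(path != NULL);

	/* The kernel side rejects ids below 1; fail early with the same error. */
	if (desired_wd < 1) {
		errno = EINVAL;
		return -1;
	}

	/* Ask the kernel to try this id for the next allocated watch. */
	if (ioctl(inotify_fd, INOTIFY_IOC_SETNEXTWD,
		  (unsigned long)desired_wd) < 0)
		return -1;

	wd = inotify_add_watch(inotify_fd, path, mask);
	if (wd < 0)
		return -1;

	/* The cursor is only a hint: an occupied id yields a different wd. */
	if (wd != desired_wd) {
		inotify_rm_watch(inotify_fd, wd);
		errno = EBUSY;
		return -1;
	}

	return 0;
}
```

On a kernel without this ioctl (for example one built without
CONFIG_CHECKPOINT_RESTORE), the ioctl() call is expected to fail, and a
caller could fall back to the retry loop quoted above.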