From patchwork Mon Dec 10 17:12:27 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721991 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E5C8214E2 for ; Mon, 10 Dec 2018 17:21:04 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CAF002AF85 for ; Mon, 10 Dec 2018 17:21:04 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BF94D2AF9E; Mon, 10 Dec 2018 17:21:04 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7414E2AF85 for ; Mon, 10 Dec 2018 17:21:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728331AbeLJRNe (ORCPT ); Mon, 10 Dec 2018 12:13:34 -0500 Received: from mx1.redhat.com ([209.132.183.28]:45074 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728322AbeLJRNd (ORCPT ); Mon, 10 Dec 2018 12:13:33 -0500 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 6A1762D7F8; Mon, 10 Dec 2018 17:13:33 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3D53A105B1E1; Mon, 10 Dec 2018 17:13:30 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 07D912239C4; Mon, 10 Dec 2018 12:13:30 -0500 
(EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 01/52] fuse: add skeleton virtio_fs.ko module Date: Mon, 10 Dec 2018 12:12:27 -0500 Message-Id: <20181210171318.16998-2-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Mon, 10 Dec 2018 17:13:33 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi Add a basic file system module for virtio-fs. Signed-off-by: Stefan Hajnoczi --- fs/fuse/Kconfig | 10 ++++++++++ fs/fuse/Makefile | 1 + fs/fuse/virtio_fs.c | 33 +++++++++++++++++++++++++++++++++ 3 files changed, 44 insertions(+) create mode 100644 fs/fuse/virtio_fs.c diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig index 76f09ce7e5b2..0b1375126420 100644 --- a/fs/fuse/Kconfig +++ b/fs/fuse/Kconfig @@ -26,3 +26,13 @@ config CUSE If you want to develop or use a userspace character device based on CUSE, answer Y or M. + +config VIRTIO_FS + tristate "Virtio Filesystem" + depends on FUSE_FS + help + The Virtio Filesystem allows guests to mount file systems from the + host. + + If you want to share files between guests or with the host, answer Y + or M. 
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile index f7b807bc1027..47b78fac5809 100644 --- a/fs/fuse/Makefile +++ b/fs/fuse/Makefile @@ -4,5 +4,6 @@ obj-$(CONFIG_FUSE_FS) += fuse.o obj-$(CONFIG_CUSE) += cuse.o +obj-$(CONFIG_VIRTIO_FS) += virtio_fs.o fuse-objs := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c new file mode 100644 index 000000000000..6b7d3973bd85 --- /dev/null +++ b/fs/fuse/virtio_fs.c @@ -0,0 +1,33 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * virtio-fs: Virtio Filesystem + * Copyright (C) 2018 Red Hat, Inc. + */ + +#include +#include + +MODULE_AUTHOR("Stefan Hajnoczi "); +MODULE_DESCRIPTION("Virtio Filesystem"); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_FS(KBUILD_MODNAME); + +static struct file_system_type virtio_fs_type = { + .owner = THIS_MODULE, + .name = KBUILD_MODNAME, + .mount = NULL, + .kill_sb = NULL, +}; + +static int __init virtio_fs_init(void) +{ + return register_filesystem(&virtio_fs_type); +} + +static void __exit virtio_fs_exit(void) +{ + unregister_filesystem(&virtio_fs_type); +} + +module_init(virtio_fs_init); +module_exit(virtio_fs_exit); From patchwork Mon Dec 10 17:12:28 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721853 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6AA5C14E2 for ; Mon, 10 Dec 2018 17:16:09 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4E9972AF3D for ; Mon, 10 Dec 2018 17:16:09 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 431482AF3F; Mon, 10 Dec 2018 17:16:09 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, 
score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 21C332AF37 for ; Mon, 10 Dec 2018 17:16:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728437AbeLJRNh (ORCPT ); Mon, 10 Dec 2018 12:13:37 -0500 Received: from mx1.redhat.com ([209.132.183.28]:55332 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728388AbeLJRNg (ORCPT ); Mon, 10 Dec 2018 12:13:36 -0500 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 78234804E0; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 73E045D717; Mon, 10 Dec 2018 17:13:30 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 0AE38223A08; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 02/52] fuse: add probe/remove virtio driver Date: Mon, 10 Dec 2018 12:12:28 -0500 Message-Id: <20181210171318.16998-3-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.27]); Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi Add basic 
probe/remove functionality for the new virtio-fs device. Signed-off-by: Stefan Hajnoczi --- fs/fuse/Kconfig | 1 + fs/fuse/virtio_fs.c | 160 ++++++++++++++++++++++++++++++++++++++-- include/uapi/linux/virtio_fs.h | 41 ++++++++++ include/uapi/linux/virtio_ids.h | 1 + 4 files changed, 195 insertions(+), 8 deletions(-) create mode 100644 include/uapi/linux/virtio_fs.h diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig index 0b1375126420..46e9a8ff9f7a 100644 --- a/fs/fuse/Kconfig +++ b/fs/fuse/Kconfig @@ -30,6 +30,7 @@ config CUSE config VIRTIO_FS tristate "Virtio Filesystem" depends on FUSE_FS + select VIRTIO help The Virtio Filesystem allows guests to mount file systems from the host. diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 6b7d3973bd85..aac9c3c42827 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -4,13 +4,139 @@ * Copyright (C) 2018 Red Hat, Inc. */ -#include #include +#include +#include +#include -MODULE_AUTHOR("Stefan Hajnoczi "); -MODULE_DESCRIPTION("Virtio Filesystem"); -MODULE_LICENSE("GPL"); -MODULE_ALIAS_FS(KBUILD_MODNAME); +/* List of virtio-fs device instances and a lock for the list */ +static DEFINE_MUTEX(virtio_fs_mutex); +static LIST_HEAD(virtio_fs_instances); + +/* A virtio-fs device instance */ +struct virtio_fs { + struct list_head list; /* on virtio_fs_instances */ + char *tag; +}; + +/* Add a new instance to the list or return -EEXIST if tag name exists*/ +static int virtio_fs_add_instance(struct virtio_fs *fs) +{ + struct virtio_fs *fs2; + bool duplicate = false; + + mutex_lock(&virtio_fs_mutex); + + list_for_each_entry(fs2, &virtio_fs_instances, list) { + if (strcmp(fs->tag, fs2->tag) == 0) + duplicate = true; + } + + if (!duplicate) + list_add_tail(&fs->list, &virtio_fs_instances); + + mutex_unlock(&virtio_fs_mutex); + + if (duplicate) + return -EEXIST; + return 0; +} + +/* Read filesystem name from virtio config into fs->tag (must kfree()). 
*/ +static int virtio_fs_read_tag(struct virtio_device *vdev, struct virtio_fs *fs) +{ + char tag_buf[sizeof_field(struct virtio_fs_config, tag)]; + char *end; + size_t len; + + virtio_cread_bytes(vdev, offsetof(struct virtio_fs_config, tag), + &tag_buf, sizeof(tag_buf)); + end = memchr(tag_buf, '\0', sizeof(tag_buf)); + if (end == tag_buf) + return -EINVAL; /* empty tag */ + if (!end) + end = &tag_buf[sizeof(tag_buf)]; + + len = end - tag_buf; + fs->tag = devm_kmalloc(&vdev->dev, len + 1, GFP_KERNEL); + if (!fs->tag) + return -ENOMEM; + memcpy(fs->tag, tag_buf, len); + fs->tag[len] = '\0'; + return 0; +} + +static int virtio_fs_probe(struct virtio_device *vdev) +{ + struct virtio_fs *fs; + int ret; + + fs = devm_kzalloc(&vdev->dev, sizeof(*fs), GFP_KERNEL); + if (!fs) + return -ENOMEM; + vdev->priv = fs; + + ret = virtio_fs_read_tag(vdev, fs); + if (ret < 0) + goto out; + + ret = virtio_fs_add_instance(fs); + if (ret < 0) + goto out; + + return 0; + +out: + vdev->priv = NULL; + return ret; +} + +static void virtio_fs_remove(struct virtio_device *vdev) +{ + struct virtio_fs *fs = vdev->priv; + + vdev->config->reset(vdev); + + mutex_lock(&virtio_fs_mutex); + list_del(&fs->list); + mutex_unlock(&virtio_fs_mutex); + + vdev->priv = NULL; +} + +#ifdef CONFIG_PM_SLEEP +static int virtio_fs_freeze(struct virtio_device *vdev) +{ + return 0; /* TODO */ +} + +static int virtio_fs_restore(struct virtio_device *vdev) +{ + return 0; /* TODO */ +} +#endif /* CONFIG_PM_SLEEP */ + +static const struct virtio_device_id id_table[] = { + { VIRTIO_ID_FS, VIRTIO_DEV_ANY_ID }, + {}, +}; + +static const unsigned int feature_table[] = {}; + +static struct virtio_driver virtio_fs_driver = { + .driver.name = KBUILD_MODNAME, + .driver.owner = THIS_MODULE, + .id_table = id_table, + .feature_table = feature_table, + .feature_table_size = ARRAY_SIZE(feature_table), + /* TODO validate config_get != NULL */ + .probe = virtio_fs_probe, + .remove = virtio_fs_remove, +#ifdef CONFIG_PM_SLEEP + .freeze = 
virtio_fs_freeze, + .restore = virtio_fs_restore, +#endif +}; static struct file_system_type virtio_fs_type = { .owner = THIS_MODULE, @@ -21,13 +147,31 @@ static struct file_system_type virtio_fs_type = { static int __init virtio_fs_init(void) { - return register_filesystem(&virtio_fs_type); + int ret; + + ret = register_virtio_driver(&virtio_fs_driver); + if (ret < 0) + return ret; + + ret = register_filesystem(&virtio_fs_type); + if (ret < 0) { + unregister_virtio_driver(&virtio_fs_driver); + return ret; + } + + return 0; } +module_init(virtio_fs_init); static void __exit virtio_fs_exit(void) { unregister_filesystem(&virtio_fs_type); + unregister_virtio_driver(&virtio_fs_driver); } - -module_init(virtio_fs_init); module_exit(virtio_fs_exit); + +MODULE_AUTHOR("Stefan Hajnoczi "); +MODULE_DESCRIPTION("Virtio Filesystem"); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_FS(KBUILD_MODNAME); +MODULE_DEVICE_TABLE(virtio, id_table); diff --git a/include/uapi/linux/virtio_fs.h b/include/uapi/linux/virtio_fs.h new file mode 100644 index 000000000000..48f3590dcfbe --- /dev/null +++ b/include/uapi/linux/virtio_fs.h @@ -0,0 +1,41 @@ +#ifndef _UAPI_LINUX_VIRTIO_FS_H +#define _UAPI_LINUX_VIRTIO_FS_H +/* This header is BSD licensed so anyone can use the definitions to implement + * compatible drivers/servers. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. 
Neither the name of IBM nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. */ +#include +#include +#include +#include + +struct virtio_fs_config { + /* Filesystem name (UTF-8, not NUL-terminated, padded with NULs) */ + __u8 tag[36]; + + /* Number of request queues */ + __u32 num_queues; +} __attribute__((packed)); + +#endif /* _UAPI_LINUX_VIRTIO_FS_H */ diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h index 6d5c3b2d4f4d..884b0e2734bb 100644 --- a/include/uapi/linux/virtio_ids.h +++ b/include/uapi/linux/virtio_ids.h @@ -43,5 +43,6 @@ #define VIRTIO_ID_INPUT 18 /* virtio input */ #define VIRTIO_ID_VSOCK 19 /* virtio vsock transport */ #define VIRTIO_ID_CRYPTO 20 /* virtio crypto */ +#define VIRTIO_ID_FS 26 /* virtio filesystem */ #endif /* _LINUX_VIRTIO_IDS_H */ From patchwork Mon Dec 10 17:12:29 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721997 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by 
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 10EB214E2 for ; Mon, 10 Dec 2018 17:21:30 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DE1182AF9E for ; Mon, 10 Dec 2018 17:21:29 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D220A2AFD6; Mon, 10 Dec 2018 17:21:29 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 895C82AF9E for ; Mon, 10 Dec 2018 17:21:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728296AbeLJRNd (ORCPT ); Mon, 10 Dec 2018 12:13:33 -0500 Received: from mx1.redhat.com ([209.132.183.28]:54125 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727356AbeLJRNd (ORCPT ); Mon, 10 Dec 2018 12:13:33 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 0A2823154848; Mon, 10 Dec 2018 17:13:33 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 72AC9600D6; Mon, 10 Dec 2018 17:13:30 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 0F31A223BB9; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 03/52] fuse: rely on mutex_unlock() barrier instead of fput() 
Date: Mon, 10 Dec 2018 12:12:29 -0500 Message-Id: <20181210171318.16998-4-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.41]); Mon, 10 Dec 2018 17:13:33 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi fput() will be moved out of this function in a later patch, so we cannot rely on it as the memory barrier for ensuring file->private_data = fud is visible. Luckily there is a mutex_unlock() right before fput() which provides the same effect. Signed-off-by: Stefan Hajnoczi --- fs/fuse/inode.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 0b94b23b02d4..d08cd8bf7705 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -1198,12 +1198,11 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) list_add_tail(&fc->entry, &fuse_conn_list); sb->s_root = root_dentry; file->private_data = fud; - mutex_unlock(&fuse_mutex); /* - * atomic_dec_and_test() in fput() provides the necessary - * memory barrier for file->private_data to be visible on all - * CPUs after this + * mutex_unlock() provides the necessary memory barrier for + * file->private_data to be visible on all CPUs after this */ + mutex_unlock(&fuse_mutex); fput(file); fuse_send_init(fc, init_req); From patchwork Mon Dec 10 17:12:30 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721993 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 187E015A6 for ; Mon, 10 Dec 2018 
17:21:22 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id ED3BC2AF85 for ; Mon, 10 Dec 2018 17:21:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E03C92AF94; Mon, 10 Dec 2018 17:21:21 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2B8372AFB2 for ; Mon, 10 Dec 2018 17:21:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728475AbeLJRVF (ORCPT ); Mon, 10 Dec 2018 12:21:05 -0500 Received: from mx1.redhat.com ([209.132.183.28]:33696 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728002AbeLJRNd (ORCPT ); Mon, 10 Dec 2018 12:13:33 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 19FBE308213A; Mon, 10 Dec 2018 17:13:33 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 75E236012B; Mon, 10 Dec 2018 17:13:30 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 1280E223BBA; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 04/52] fuse: extract fuse_fill_super_common() Date: Mon, 10 Dec 2018 12:12:30 -0500 Message-Id: <20181210171318.16998-5-vgoyal@redhat.com> In-Reply-To: 
<20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.42]); Mon, 10 Dec 2018 17:13:33 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi fuse_fill_super() includes code to process the fd= option and link the struct fuse_dev to the fd's struct file. In virtio-fs there is no file descriptor because /dev/fuse is not used. This patch extracts fuse_fill_super_common() so that both classic fuse and virtio-fs can share the code to initialize a mount. parse_fuse_opt() is also extracted so that the fuse_fill_super_common() caller has access to the mount options. This allows classic fuse to handle the fd= option outside fuse_fill_super_common(). Signed-off-by: Stefan Hajnoczi --- fs/fuse/fuse_i.h | 32 +++++++++++++++++ fs/fuse/inode.c | 102 +++++++++++++++++++++++++++---------------------------- 2 files changed, 83 insertions(+), 51 deletions(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index e9f712e81c7d..9b5b8b194f77 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -56,6 +56,22 @@ extern struct mutex fuse_mutex; extern unsigned max_user_bgreq; extern unsigned max_user_congthresh; +/** Mount options */ +struct fuse_mount_data { + int fd; + unsigned rootmode; + kuid_t user_id; + kgid_t group_id; + unsigned fd_present:1; + unsigned rootmode_present:1; + unsigned user_id_present:1; + unsigned group_id_present:1; + unsigned default_permissions:1; + unsigned allow_other:1; + unsigned max_read; + unsigned blksize; +}; + /* One forget request */ struct fuse_forget_link { struct fuse_forget_one forget_one; @@ -970,6 +986,22 @@ struct fuse_dev *fuse_dev_alloc(struct fuse_conn *fc); void fuse_dev_free(struct fuse_dev *fud); /** + * Parse a mount options string + */ +int 
parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev, + struct user_namespace *user_ns); + +/** + * Fill in superblock and initialize fuse connection + * @sb: partially-initialized superblock to fill in + * @mount_data: mount parameters + * @fudptr: fuse_dev pointer to fill in, should contain NULL on entry + */ +int fuse_fill_super_common(struct super_block *sb, + struct fuse_mount_data *mount_data, + void **fudptr); + +/** * Add connection to control filesystem */ int fuse_ctl_add_conn(struct fuse_conn *fc); diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index d08cd8bf7705..f13133f0ebd1 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -59,21 +59,6 @@ MODULE_PARM_DESC(max_user_congthresh, /** Congestion starts at 75% of maximum */ #define FUSE_DEFAULT_CONGESTION_THRESHOLD (FUSE_DEFAULT_MAX_BACKGROUND * 3 / 4) -struct fuse_mount_data { - int fd; - unsigned rootmode; - kuid_t user_id; - kgid_t group_id; - unsigned fd_present:1; - unsigned rootmode_present:1; - unsigned user_id_present:1; - unsigned group_id_present:1; - unsigned default_permissions:1; - unsigned allow_other:1; - unsigned max_read; - unsigned blksize; -}; - struct fuse_forget_link *fuse_alloc_forget(void) { return kzalloc(sizeof(struct fuse_forget_link), GFP_KERNEL); @@ -479,7 +464,7 @@ static int fuse_match_uint(substring_t *s, unsigned int *res) return err; } -static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev, +int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev, struct user_namespace *user_ns) { char *p; @@ -556,12 +541,13 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev, } } - if (!d->fd_present || !d->rootmode_present || - !d->user_id_present || !d->group_id_present) + if (!d->rootmode_present || !d->user_id_present || + !d->group_id_present) return 0; return 1; } +EXPORT_SYMBOL_GPL(parse_fuse_opt); static int fuse_show_options(struct seq_file *m, struct dentry *root) { @@ -1072,13 +1058,13 @@ void 
fuse_dev_free(struct fuse_dev *fud) } EXPORT_SYMBOL_GPL(fuse_dev_free); -static int fuse_fill_super(struct super_block *sb, void *data, int silent) +int fuse_fill_super_common(struct super_block *sb, + struct fuse_mount_data *mount_data, + void **fudptr) { struct fuse_dev *fud; struct fuse_conn *fc; struct inode *root; - struct fuse_mount_data d; - struct file *file; struct dentry *root_dentry; struct fuse_req *init_req; int err; @@ -1090,13 +1076,10 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) sb->s_flags &= ~(SB_NOSEC | SB_I_VERSION); - if (!parse_fuse_opt(data, &d, is_bdev, sb->s_user_ns)) - goto err; - if (is_bdev) { #ifdef CONFIG_BLOCK err = -EINVAL; - if (!sb_set_blocksize(sb, d.blksize)) + if (!sb_set_blocksize(sb, mount_data->blksize)) goto err; #endif } else { @@ -1113,19 +1096,6 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) if (sb->s_user_ns != &init_user_ns) sb->s_iflags |= SB_I_UNTRUSTED_MOUNTER; - file = fget(d.fd); - err = -EINVAL; - if (!file) - goto err; - - /* - * Require mount to happen from the same user namespace which - * opened /dev/fuse to prevent potential attacks. - */ - if (file->f_op != &fuse_dev_operations || - file->f_cred->user_ns != sb->s_user_ns) - goto err_fput; - /* * If we are not in the initial user namespace posix * acls must be translated. 
@@ -1136,7 +1106,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) fc = kmalloc(sizeof(*fc), GFP_KERNEL); err = -ENOMEM; if (!fc) - goto err_fput; + goto err; fuse_conn_init(fc, sb->s_user_ns); fc->release = fuse_free_conn; @@ -1156,18 +1126,18 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) fc->dont_mask = 1; sb->s_flags |= SB_POSIXACL; - fc->default_permissions = d.default_permissions; - fc->allow_other = d.allow_other; - fc->user_id = d.user_id; - fc->group_id = d.group_id; - fc->max_read = max_t(unsigned, 4096, d.max_read); + fc->default_permissions = mount_data->default_permissions; + fc->allow_other = mount_data->allow_other; + fc->user_id = mount_data->user_id; + fc->group_id = mount_data->group_id; + fc->max_read = max_t(unsigned, 4096, mount_data->max_read); fc->max_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ; /* Used by get_root_inode() */ sb->s_fs_info = fc; err = -ENOMEM; - root = fuse_get_root_inode(sb, d.rootmode); + root = fuse_get_root_inode(sb, mount_data->rootmode); sb->s_d_op = &fuse_root_dentry_operations; root_dentry = d_make_root(root); if (!root_dentry) @@ -1188,7 +1158,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) mutex_lock(&fuse_mutex); err = -EINVAL; - if (file->private_data) + if (*fudptr) goto err_unlock; err = fuse_ctl_add_conn(fc); @@ -1197,13 +1167,12 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) list_add_tail(&fc->entry, &fuse_conn_list); sb->s_root = root_dentry; - file->private_data = fud; + *fudptr = fud; /* * mutex_unlock() provides the necessary memory barrier for - * file->private_data to be visible on all CPUs after this + * *fudptr to be visible on all CPUs after this */ mutex_unlock(&fuse_mutex); - fput(file); fuse_send_init(fc, init_req); @@ -1220,11 +1189,42 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) err_put_conn: fuse_conn_put(fc); sb->s_fs_info = NULL; - err_fput: - 
fput(file); err: return err; } +EXPORT_SYMBOL_GPL(fuse_fill_super_common); + +static int fuse_fill_super(struct super_block *sb, void *data, int silent) +{ + struct fuse_mount_data d; + struct file *file; + int is_bdev = sb->s_bdev != NULL; + int err; + + err = -EINVAL; + if (!parse_fuse_opt(data, &d, is_bdev, sb->s_user_ns)) + goto err; + if (!d.fd_present) + goto err; + + file = fget(d.fd); + if (!file) + goto err; + + /* + * Require mount to happen from the same user namespace which + * opened /dev/fuse to prevent potential attacks. + */ + if ((file->f_op != &fuse_dev_operations) || + (file->f_cred->user_ns != sb->s_user_ns)) + goto err_fput; + + err = fuse_fill_super_common(sb, &d, &file->private_data); +err_fput: + fput(file); +err: + return err; +} static struct dentry *fuse_mount(struct file_system_type *fs_type, int flags, const char *dev_name, From patchwork Mon Dec 10 17:12:31 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721787 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E097F13AF for ; Mon, 10 Dec 2018 17:13:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C60132A796 for ; Mon, 10 Dec 2018 17:13:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B9E362A83B; Mon, 10 Dec 2018 17:13:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 17C9B2A796 for ; Mon, 10 Dec 2018 17:13:46 +0000 (UTC) Received: 
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728594AbeLJRNo (ORCPT ); Mon, 10 Dec 2018 12:13:44 -0500 Received: from mx1.redhat.com ([209.132.183.28]:45192 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728552AbeLJRNo (ORCPT ); Mon, 10 Dec 2018 12:13:44 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 3D831368E7; Mon, 10 Dec 2018 17:13:43 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 734325C221; Mon, 10 Dec 2018 17:13:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 167CE223BE8; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 05/52] virtio_fs: get mount working Date: Mon, 10 Dec 2018 12:12:31 -0500 Message-Id: <20181210171318.16998-6-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Mon, 10 Dec 2018 17:13:43 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi Provide definitions of ->mount and ->kill_sb. This is still WIP. 
Signed-off-by: Stefan Hajnoczi --- fs/fuse/fuse_i.h | 9 ++++ fs/fuse/inode.c | 12 ++++- fs/fuse/virtio_fs.c | 129 +++++++++++++++++++++++++++++++++++++++++++++++++++- 3 files changed, 146 insertions(+), 4 deletions(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 9b5b8b194f77..4fea75c92a7c 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -59,10 +59,12 @@ extern unsigned max_user_congthresh; /** Mount options */ struct fuse_mount_data { int fd; + const char *tag; /* lifetime: .fill_super() data argument */ unsigned rootmode; kuid_t user_id; kgid_t group_id; unsigned fd_present:1; + unsigned tag_present:1; unsigned rootmode_present:1; unsigned user_id_present:1; unsigned group_id_present:1; @@ -1002,6 +1004,13 @@ int fuse_fill_super_common(struct super_block *sb, void **fudptr); /** + * Disassociate fuse connection from superblock and kill the superblock + * + * Calls kill_anon_super(), do not use with bdev mounts. + */ +void fuse_kill_sb_anon(struct super_block *sb); + +/** * Add connection to control filesystem */ int fuse_ctl_add_conn(struct fuse_conn *fc); diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index f13133f0ebd1..65fd59fc1e81 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -431,6 +431,7 @@ static int fuse_statfs(struct dentry *dentry, struct kstatfs *buf) enum { OPT_FD, + OPT_TAG, OPT_ROOTMODE, OPT_USER_ID, OPT_GROUP_ID, @@ -443,6 +444,7 @@ enum { static const match_table_t tokens = { {OPT_FD, "fd=%u"}, + {OPT_TAG, "tag=%s"}, {OPT_ROOTMODE, "rootmode=%o"}, {OPT_USER_ID, "user_id=%u"}, {OPT_GROUP_ID, "group_id=%u"}, @@ -489,6 +491,11 @@ int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev, d->fd_present = 1; break; + case OPT_TAG: + d->tag = args[0].from; + d->tag_present = 1; + break; + case OPT_ROOTMODE: if (match_octal(&args[0], &value)) return 0; @@ -1204,7 +1211,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) err = -EINVAL; if (!parse_fuse_opt(data, &d, is_bdev, 
sb->s_user_ns)) goto err; - if (!d.fd_present) + if (!d.fd_present || d.tag_present) goto err; file = fget(d.fd); @@ -1249,11 +1256,12 @@ static void fuse_sb_destroy(struct super_block *sb) } } -static void fuse_kill_sb_anon(struct super_block *sb) +void fuse_kill_sb_anon(struct super_block *sb) { fuse_sb_destroy(sb); kill_anon_super(sb); } +EXPORT_SYMBOL_GPL(fuse_kill_sb_anon); static struct file_system_type fuse_fs_type = { .owner = THIS_MODULE, diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index aac9c3c42827..8cdeb02f3778 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -8,6 +8,7 @@ #include #include #include +#include "fuse_i.h" /* List of virtio-fs device instances and a lock for the list */ static DEFINE_MUTEX(virtio_fs_mutex); @@ -17,6 +18,8 @@ static LIST_HEAD(virtio_fs_instances); struct virtio_fs { struct list_head list; /* on virtio_fs_instances */ char *tag; + struct fuse_dev **fud; /* 1:1 mapping with request queues */ + unsigned int num_queues; }; /* Add a new instance to the list or return -EEXIST if tag name exists*/ @@ -42,6 +45,46 @@ static int virtio_fs_add_instance(struct virtio_fs *fs) return 0; } +/* Return the virtio_fs with a given tag, or NULL */ +static struct virtio_fs *virtio_fs_find_instance(const char *tag) +{ + struct virtio_fs *fs; + + mutex_lock(&virtio_fs_mutex); + + list_for_each_entry(fs, &virtio_fs_instances, list) { + if (strcmp(fs->tag, tag) == 0) + goto found; + } + + fs = NULL; /* not found */ + +found: + mutex_unlock(&virtio_fs_mutex); + + return fs; +} + +static void virtio_fs_free_devs(struct virtio_fs *fs) +{ + unsigned int i; + + /* TODO lock */ + + if (!fs->fud) + return; + + for (i = 0; i < fs->num_queues; i++) { + struct fuse_dev *fud = fs->fud[i]; + + if (fud) + fuse_dev_free(fud); /* TODO need to quiesce/end_requests/decrement dev_count */ + } + + kfree(fs->fud); + fs->fud = NULL; +} + /* Read filesystem name from virtio config into fs->tag (must kfree()). 
*/ static int virtio_fs_read_tag(struct virtio_device *vdev, struct virtio_fs *fs) { @@ -76,6 +119,13 @@ static int virtio_fs_probe(struct virtio_device *vdev) return -ENOMEM; vdev->priv = fs; + virtio_cread(vdev, struct virtio_fs_config, num_queues, + &fs->num_queues); + if (fs->num_queues == 0) { + ret = -EINVAL; + goto out; + } + ret = virtio_fs_read_tag(vdev, fs); if (ret < 0) goto out; @@ -95,6 +145,8 @@ static void virtio_fs_remove(struct virtio_device *vdev) { struct virtio_fs *fs = vdev->priv; + virtio_fs_free_devs(fs); + vdev->config->reset(vdev); mutex_lock(&virtio_fs_mutex); @@ -138,11 +190,84 @@ static struct virtio_driver virtio_fs_driver = { #endif }; +static int virtio_fs_fill_super(struct super_block *sb, void *data, + int silent) +{ + struct fuse_mount_data d; + struct fuse_conn *fc; + struct virtio_fs *fs; + int is_bdev = sb->s_bdev != NULL; + unsigned int i; + int err; + + err = -EINVAL; + if (!parse_fuse_opt(data, &d, is_bdev, sb->s_user_ns)) + goto err; + if (d.fd_present) { + printk(KERN_ERR "virtio-fs: fd option cannot be used\n"); + goto err; + } + if (!d.tag_present) { + printk(KERN_ERR "virtio-fs: missing tag option\n"); + goto err; + } + + fs = virtio_fs_find_instance(d.tag); + if (!fs) { + printk(KERN_ERR "virtio-fs: tag not found\n"); + err = -ENOENT; + goto err; + } + + /* TODO lock */ + if (fs->fud) { + printk(KERN_ERR "virtio-fs: device already in use\n"); + err = -EBUSY; + goto err; + } + fs->fud = kcalloc(fs->num_queues, sizeof(fs->fud[0]), GFP_KERNEL); + if (!fs->fud) { + err = -ENOMEM; + goto err_fud; + } + + err = fuse_fill_super_common(sb, &d, (void **)&fs->fud[0]); + if (err < 0) + goto err_fud; + + fc = fs->fud[0]->fc; + + /* Allocate remaining fuse_devs */ + err = -ENOMEM; + /* TODO take fuse_mutex around this loop? 
*/ + for (i = 1; i < fs->num_queues; i++) { + fs->fud[i] = fuse_dev_alloc(fc); + if (!fs->fud[i]) { + /* TODO */ + } + atomic_inc(&fc->dev_count); + } + + return 0; + +err_fud: + virtio_fs_free_devs(fs); +err: + return err; +} + +static struct dentry *virtio_fs_mount(struct file_system_type *fs_type, + int flags, const char *dev_name, + void *raw_data) +{ + return mount_nodev(fs_type, flags, raw_data, virtio_fs_fill_super); +} + static struct file_system_type virtio_fs_type = { .owner = THIS_MODULE, .name = KBUILD_MODNAME, - .mount = NULL, - .kill_sb = NULL, + .mount = virtio_fs_mount, + .kill_sb = fuse_kill_sb_anon, }; static int __init virtio_fs_init(void)
From patchwork Mon Dec 10 17:12:32 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721981 From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 06/52] fuse: export fuse_end_request() Date: Mon, 10 Dec 2018 12:12:32 -0500 Message-Id: <20181210171318.16998-7-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com>
From: Stefan Hajnoczi virtio-fs will need to complete requests from outside fs/fuse/dev.c. Make the symbol visible.
Signed-off-by: Stefan Hajnoczi --- fs/fuse/dev.c | 19 ++++++++++--------- fs/fuse/fuse_i.h | 5 +++++ 2 files changed, 15 insertions(+), 9 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index a5e516a40e7a..5b90c839a7c3 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -425,7 +425,7 @@ static void flush_bg_queue(struct fuse_conn *fc) * the 'end' callback is called if given, else the reference to the * request is released */ -static void request_end(struct fuse_conn *fc, struct fuse_req *req) +void fuse_request_end(struct fuse_conn *fc, struct fuse_req *req) { struct fuse_iqueue *fiq = &fc->iq; @@ -469,6 +469,7 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req) put_request: fuse_put_request(fc, req); } +EXPORT_SYMBOL_GPL(fuse_request_end); static void queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req) { @@ -543,12 +544,12 @@ static void __fuse_request_send(struct fuse_conn *fc, struct fuse_req *req) req->in.h.unique = fuse_get_unique(fiq); queue_request(fiq, req); /* acquire extra reference, since request is still needed - after request_end() */ + after fuse_request_end() */ __fuse_get_request(req); spin_unlock(&fiq->waitq.lock); request_wait_answer(fc, req); - /* Pairs with smp_wmb() in request_end() */ + /* Pairs with smp_wmb() in fuse_request_end() */ smp_rmb(); } } @@ -1278,7 +1279,7 @@ __releases(fiq->waitq.lock) * the pending list and copies request data to userspace buffer. If * no reply is needed (FORGET) or request has been aborted or there * was an error during the copying then it's finished by calling - * request_end(). Otherwise add it to the processing list, and set + * fuse_request_end(). Otherwise add it to the processing list, and set * the 'sent' flag. 
*/ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, @@ -1338,7 +1339,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, /* SETXATTR is special, since it may contain too large data */ if (in->h.opcode == FUSE_SETXATTR) req->out.h.error = -E2BIG; - request_end(fc, req); + fuse_request_end(fc, req); goto restart; } spin_lock(&fpq->lock); @@ -1381,7 +1382,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, if (!test_bit(FR_PRIVATE, &req->flags)) list_del_init(&req->list); spin_unlock(&fpq->lock); - request_end(fc, req); + fuse_request_end(fc, req); return err; err_unlock: @@ -1889,7 +1890,7 @@ static int copy_out_args(struct fuse_copy_state *cs, struct fuse_out *out, * the write buffer. The request is then searched on the processing * list by the unique ID found in the header. If found, then remove * it from the list and copy the rest of the buffer to the request. - * The request is finished by calling request_end() + * The request is finished by calling fuse_request_end(). */ static ssize_t fuse_dev_do_write(struct fuse_dev *fud, struct fuse_copy_state *cs, size_t nbytes) @@ -1976,7 +1977,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud, list_del_init(&req->list); spin_unlock(&fpq->lock); - request_end(fc, req); + fuse_request_end(fc, req); return err ? 
err : nbytes; @@ -2120,7 +2121,7 @@ static void end_requests(struct fuse_conn *fc, struct list_head *head) req->out.h.error = -ECONNABORTED; clear_bit(FR_SENT, &req->flags); list_del_init(&req->list); - request_end(fc, req); + fuse_request_end(fc, req); } } diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 4fea75c92a7c..32c4466a8f89 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -953,6 +953,11 @@ ssize_t fuse_simple_request(struct fuse_conn *fc, struct fuse_args *args); void fuse_request_send_background(struct fuse_conn *fc, struct fuse_req *req); bool fuse_request_queue_background(struct fuse_conn *fc, struct fuse_req *req); +/** + * End a finished request + */ +void fuse_request_end(struct fuse_conn *fc, struct fuse_req *req); + /* Abort all requests */ void fuse_abort_conn(struct fuse_conn *fc, bool is_abort); void fuse_wait_aborted(struct fuse_conn *fc);
From patchwork Mon Dec 10 17:12:33 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721805 From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 07/52] fuse: export fuse_len_args() Date: Mon, 10 Dec 2018 12:12:33 -0500 Message-Id: <20181210171318.16998-8-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com>
From: Stefan Hajnoczi virtio-fs will need to query the length of fuse_arg lists. Make the symbol visible.
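For illustration only (not part of this patch): a userspace sketch of what the exported helper computes. The loop body is not visible in the hunks of this patch; the sketch assumes it simply sums the size field of each argument. The struct below is a simplified stand-in for the kernel's struct fuse_arg.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for struct fuse_arg (size + value pointer). */
struct fuse_arg {
	unsigned size;
	const void *value;
};

/* Return the total number of bytes described by an argument list. */
static unsigned fuse_len_args(unsigned numargs, const struct fuse_arg *args)
{
	unsigned nbytes = 0;
	unsigned i;

	/* An empty list may pass NULL; otherwise args must be valid. */
	assert(numargs == 0 || args != NULL);

	for (i = 0; i < numargs; i++)
		nbytes += args[i].size;

	return nbytes;
}
```

A transport can add this value to sizeof(struct fuse_in_header) to obtain the total request length, as queue_request() does for in.h.len.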
Signed-off-by: Stefan Hajnoczi --- fs/fuse/dev.c | 7 ++++--- fs/fuse/fuse_i.h | 5 +++++ 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 5b90c839a7c3..7fd627d5cf58 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -348,7 +348,7 @@ void fuse_put_request(struct fuse_conn *fc, struct fuse_req *req) } EXPORT_SYMBOL_GPL(fuse_put_request); -static unsigned len_args(unsigned numargs, struct fuse_arg *args) +unsigned fuse_len_args(unsigned numargs, struct fuse_arg *args) { unsigned nbytes = 0; unsigned i; @@ -358,6 +358,7 @@ static unsigned len_args(unsigned numargs, struct fuse_arg *args) return nbytes; } +EXPORT_SYMBOL_GPL(fuse_len_args); static u64 fuse_get_unique(struct fuse_iqueue *fiq) { @@ -373,7 +374,7 @@ static unsigned int fuse_req_hash(u64 unique) static void queue_request(struct fuse_iqueue *fiq, struct fuse_req *req) { req->in.h.len = sizeof(struct fuse_in_header) + - len_args(req->in.numargs, (struct fuse_arg *) req->in.args); + fuse_len_args(req->in.numargs, (struct fuse_arg *) req->in.args); list_add_tail(&req->list, &fiq->pending); wake_up_locked(&fiq->waitq); kill_fasync(&fiq->fasync, SIGIO, POLL_IN); @@ -1870,7 +1871,7 @@ static int copy_out_args(struct fuse_copy_state *cs, struct fuse_out *out, if (out->h.error) return nbytes != reqsize ? 
-EINVAL : 0; - reqsize += len_args(out->numargs, out->args); + reqsize += fuse_len_args(out->numargs, out->args); if (reqsize < nbytes || (reqsize > nbytes && !out->argvar)) return -EINVAL; diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 32c4466a8f89..f41ebc723e01 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -1120,4 +1120,9 @@ int fuse_set_acl(struct inode *inode, struct posix_acl *acl, int type); /* readdir.c */ int fuse_readdir(struct file *file, struct dir_context *ctx); +/** + * Return the number of bytes in an arguments list + */ +unsigned fuse_len_args(unsigned numargs, struct fuse_arg *args); + #endif /* _FS_FUSE_I_H */
From patchwork Mon Dec 10 17:12:34 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721819 From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 08/52] fuse: add fuse_iqueue_ops callbacks Date: Mon, 10 Dec 2018 12:12:34 -0500 Message-Id: <20181210171318.16998-9-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com>
From: Stefan Hajnoczi The /dev/fuse device uses fiq->waitq and fasync to signal that requests are available. These mechanisms do not apply to virtio-fs. This patch introduces callbacks so alternative behavior can be used.
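For illustration only (not part of this patch): a reduced userspace sketch of the callback pattern described above. Generic code queues a request and then lets a device-specific ops table decide how to signal it. All names here (iqueue, iqueue_ops, kick_counter_ops, demo_kicks) are hypothetical; locking, and the "and unlock" half of the real callbacks, is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

struct iqueue;

/* Hypothetical, reduced version of struct fuse_iqueue_ops: one callback. */
struct iqueue_ops {
	/* Called after a request has been queued; signals the device. */
	void (*wake_pending)(struct iqueue *q);
};

/* Hypothetical, reduced input queue: a counter instead of a request list. */
struct iqueue {
	unsigned pending;		/* number of queued requests */
	const struct iqueue_ops *ops;	/* device-specific callbacks */
	void *priv;			/* device-specific state */
};

/* Example backend: counts "kicks" in the state that priv points to. */
static void kick_counter_wake(struct iqueue *q)
{
	unsigned *kicks = q->priv;

	(*kicks)++;
}

static const struct iqueue_ops kick_counter_ops = {
	.wake_pending = kick_counter_wake,
};

static void iqueue_init(struct iqueue *q, const struct iqueue_ops *ops,
			void *priv)
{
	assert(ops != NULL && ops->wake_pending != NULL);
	q->pending = 0;
	q->ops = ops;
	q->priv = priv;
}

/* Generic code queues the request, then defers signalling to the backend. */
static void queue_request(struct iqueue *q)
{
	q->pending++;
	q->ops->wake_pending(q);
}

/* Usage example: queue n requests, report how often the backend was kicked. */
static unsigned demo_kicks(unsigned n)
{
	unsigned kicks = 0;
	struct iqueue q;

	iqueue_init(&q, &kick_counter_ops, &kicks);
	while (n--)
		queue_request(&q);

	return kicks;
}
```

The generic queueing path never needs to know whether the backend wakes a wait queue or notifies a virtqueue; only the ops table and the priv pointer differ per device type.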
Note that queue_interrupt() changes along these lines: spin_lock(&fiq->waitq.lock); wake_up_locked(&fiq->waitq); + kill_fasync(&fiq->fasync, SIGIO, POLL_IN); spin_unlock(&fiq->waitq.lock); - kill_fasync(&fiq->fasync, SIGIO, POLL_IN); Since queue_request() and queue_forget() also call kill_fasync() inside the spinlock this should be safe. Signed-off-by: Stefan Hajnoczi --- fs/fuse/cuse.c | 2 +- fs/fuse/dev.c | 50 ++++++++++++++++++++++++++++++++++---------------- fs/fuse/fuse_i.h | 46 +++++++++++++++++++++++++++++++++++++++++++++- fs/fuse/inode.c | 18 +++++++++++++----- 4 files changed, 93 insertions(+), 23 deletions(-) diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c index 8f68181256c0..98dc780cbafa 100644 --- a/fs/fuse/cuse.c +++ b/fs/fuse/cuse.c @@ -503,7 +503,7 @@ static int cuse_channel_open(struct inode *inode, struct file *file) * Limit the cuse channel to requests that can * be represented in file->f_cred->user_ns. */ - fuse_conn_init(&cc->fc, file->f_cred->user_ns); + fuse_conn_init(&cc->fc, file->f_cred->user_ns, &fuse_dev_fiq_ops, NULL); fud = fuse_dev_alloc(&cc->fc); if (!fud) { diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 7fd627d5cf58..b26ee5ed8974 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -371,13 +371,33 @@ static unsigned int fuse_req_hash(u64 unique) return hash_long(unique & ~FUSE_INT_REQ_BIT, FUSE_PQ_HASH_BITS); } -static void queue_request(struct fuse_iqueue *fiq, struct fuse_req *req) +/** + * A new request is available, wake fiq->waitq + */ +static void fuse_dev_wake_and_unlock(struct fuse_iqueue *fiq) +__releases(fiq->waitq.lock) { - req->in.h.len = sizeof(struct fuse_in_header) + - fuse_len_args(req->in.numargs, (struct fuse_arg *) req->in.args); - list_add_tail(&req->list, &fiq->pending); wake_up_locked(&fiq->waitq); kill_fasync(&fiq->fasync, SIGIO, POLL_IN); + spin_unlock(&fiq->waitq.lock); +} + +const struct fuse_iqueue_ops fuse_dev_fiq_ops = { + .wake_forget_and_unlock = fuse_dev_wake_and_unlock, + .wake_interrupt_and_unlock = 
fuse_dev_wake_and_unlock, + .wake_pending_and_unlock = fuse_dev_wake_and_unlock, +}; +EXPORT_SYMBOL_GPL(fuse_dev_fiq_ops); + +static void queue_request_and_unlock(struct fuse_iqueue *fiq, + struct fuse_req *req) +__releases(fiq->waitq.lock) +{ + req->in.h.len = sizeof(struct fuse_in_header) + + fuse_len_args(req->in.numargs, + (struct fuse_arg *) req->in.args); + list_add_tail(&req->list, &fiq->pending); + fiq->ops->wake_pending_and_unlock(fiq); } void fuse_queue_forget(struct fuse_conn *fc, struct fuse_forget_link *forget, @@ -392,12 +412,11 @@ void fuse_queue_forget(struct fuse_conn *fc, struct fuse_forget_link *forget, if (fiq->connected) { fiq->forget_list_tail->next = forget; fiq->forget_list_tail = forget; - wake_up_locked(&fiq->waitq); - kill_fasync(&fiq->fasync, SIGIO, POLL_IN); + fiq->ops->wake_forget_and_unlock(fiq); } else { kfree(forget); + spin_unlock(&fiq->waitq.lock); } - spin_unlock(&fiq->waitq.lock); } static void flush_bg_queue(struct fuse_conn *fc) @@ -413,8 +432,7 @@ static void flush_bg_queue(struct fuse_conn *fc) fc->active_background++; spin_lock(&fiq->waitq.lock); req->in.h.unique = fuse_get_unique(fiq); - queue_request(fiq, req); - spin_unlock(&fiq->waitq.lock); + queue_request_and_unlock(fiq, req); } } @@ -481,10 +499,10 @@ static void queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req) } if (list_empty(&req->intr_entry)) { list_add_tail(&req->intr_entry, &fiq->interrupts); - wake_up_locked(&fiq->waitq); + fiq->ops->wake_interrupt_and_unlock(fiq); + } else { + spin_unlock(&fiq->waitq.lock); } - spin_unlock(&fiq->waitq.lock); - kill_fasync(&fiq->fasync, SIGIO, POLL_IN); } static void request_wait_answer(struct fuse_conn *fc, struct fuse_req *req) @@ -543,11 +561,10 @@ static void __fuse_request_send(struct fuse_conn *fc, struct fuse_req *req) req->out.h.error = -ENOTCONN; } else { req->in.h.unique = fuse_get_unique(fiq); - queue_request(fiq, req); /* acquire extra reference, since request is still needed after fuse_request_end() 
*/ __fuse_get_request(req); - spin_unlock(&fiq->waitq.lock); + queue_request_and_unlock(fiq, req); request_wait_answer(fc, req); /* Pairs with smp_wmb() in fuse_request_end() */ @@ -680,10 +697,11 @@ static int fuse_request_send_notify_reply(struct fuse_conn *fc, req->in.h.unique = unique; spin_lock(&fiq->waitq.lock); if (fiq->connected) { - queue_request(fiq, req); + queue_request_and_unlock(fiq, req); err = 0; + } else { + spin_unlock(&fiq->waitq.lock); } - spin_unlock(&fiq->waitq.lock); return err; } diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index f41ebc723e01..60ebe3c2e2c3 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -454,6 +454,39 @@ struct fuse_req { struct file *stolen_file; }; +struct fuse_iqueue; + +/** + * Input queue callbacks + * + * Input queue signalling is device-specific. For example, the /dev/fuse file + * uses fiq->waitq and fasync to wake processes that are waiting on queue + * readiness. These callbacks allow other device types to respond to input + * queue activity. 
+ */ +struct fuse_iqueue_ops { + /** + * Signal that a forget has been queued + */ + void (*wake_forget_and_unlock)(struct fuse_iqueue *fiq) + __releases(fiq->waitq.lock); + + /** + * Signal that an INTERRUPT request has been queued + */ + void (*wake_interrupt_and_unlock)(struct fuse_iqueue *fiq) + __releases(fiq->waitq.lock); + + /** + * Signal that a request has been queued + */ + void (*wake_pending_and_unlock)(struct fuse_iqueue *fiq) + __releases(fiq->waitq.lock); +}; + +/** /dev/fuse input queue operations */ +extern const struct fuse_iqueue_ops fuse_dev_fiq_ops; + struct fuse_iqueue { /** Connection established */ unsigned connected; @@ -479,6 +512,12 @@ struct fuse_iqueue { /** O_ASYNC requests */ struct fasync_struct *fasync; + + /** Device-specific callbacks */ + const struct fuse_iqueue_ops *ops; + + /** Device-specific state */ + void *priv; }; #define FUSE_PQ_HASH_BITS 8 @@ -982,7 +1021,8 @@ struct fuse_conn *fuse_conn_get(struct fuse_conn *fc); /** * Initialize fuse_conn */ -void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns); +void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, + const struct fuse_iqueue_ops *fiq_ops, void *fiq_priv); /** * Release reference to fuse_conn @@ -1002,10 +1042,14 @@ int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev, * Fill in superblock and initialize fuse connection * @sb: partially-initialized superblock to fill in * @mount_data: mount parameters + * @fiq_ops: fuse input queue operations + * @fiq_priv: device-specific state for fuse_iqueue * @fudptr: fuse_dev pointer to fill in, should contain NULL on entry */ int fuse_fill_super_common(struct super_block *sb, struct fuse_mount_data *mount_data, + const struct fuse_iqueue_ops *fiq_ops, + void *fiq_priv, void **fudptr); /** diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 65fd59fc1e81..31bb817575c4 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -574,7 +574,9 @@ static int fuse_show_options(struct 
seq_file *m, struct dentry *root) return 0; } -static void fuse_iqueue_init(struct fuse_iqueue *fiq) +static void fuse_iqueue_init(struct fuse_iqueue *fiq, + const struct fuse_iqueue_ops *ops, + void *priv) { memset(fiq, 0, sizeof(struct fuse_iqueue)); init_waitqueue_head(&fiq->waitq); @@ -582,6 +584,8 @@ static void fuse_iqueue_init(struct fuse_iqueue *fiq) INIT_LIST_HEAD(&fiq->interrupts); fiq->forget_list_tail = &fiq->forget_list_head; fiq->connected = 1; + fiq->ops = ops; + fiq->priv = priv; } static void fuse_pqueue_init(struct fuse_pqueue *fpq) @@ -595,7 +599,8 @@ static void fuse_pqueue_init(struct fuse_pqueue *fpq) fpq->connected = 1; } -void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns) +void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, + const struct fuse_iqueue_ops *fiq_ops, void *fiq_priv) { memset(fc, 0, sizeof(*fc)); spin_lock_init(&fc->lock); @@ -605,7 +610,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns) atomic_set(&fc->dev_count, 1); init_waitqueue_head(&fc->blocked_waitq); init_waitqueue_head(&fc->reserved_req_waitq); - fuse_iqueue_init(&fc->iq); + fuse_iqueue_init(&fc->iq, fiq_ops, fiq_priv); INIT_LIST_HEAD(&fc->bg_queue); INIT_LIST_HEAD(&fc->entry); INIT_LIST_HEAD(&fc->devices); @@ -1067,6 +1072,8 @@ EXPORT_SYMBOL_GPL(fuse_dev_free); int fuse_fill_super_common(struct super_block *sb, struct fuse_mount_data *mount_data, + const struct fuse_iqueue_ops *fiq_ops, + void *fiq_priv, void **fudptr) { struct fuse_dev *fud; @@ -1115,7 +1122,7 @@ int fuse_fill_super_common(struct super_block *sb, if (!fc) goto err; - fuse_conn_init(fc, sb->s_user_ns); + fuse_conn_init(fc, sb->s_user_ns, fiq_ops, fiq_priv); fc->release = fuse_free_conn; fud = fuse_dev_alloc(fc); @@ -1226,7 +1233,8 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) (file->f_cred->user_ns != sb->s_user_ns)) goto err_fput; - err = fuse_fill_super_common(sb, &d, &file->private_data); + err = 
fuse_fill_super_common(sb, &d, &fuse_dev_fiq_ops, NULL, + &file->private_data); err_fput: fput(file); err: From patchwork Mon Dec 10 17:12:35 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721929 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C2F0514E2 for ; Mon, 10 Dec 2018 17:19:03 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A41022A5AF for ; Mon, 10 Dec 2018 17:19:03 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 97B672A5E7; Mon, 10 Dec 2018 17:19:03 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 822582A5AF for ; Mon, 10 Dec 2018 17:19:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729053AbeLJRSf (ORCPT ); Mon, 10 Dec 2018 12:18:35 -0500 Received: from mx1.redhat.com ([209.132.183.28]:47756 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728351AbeLJRNf (ORCPT ); Mon, 10 Dec 2018 12:13:35 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 3928730A694D; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id CD889600D6; Mon, 10 Dec 2018 17:13:33 +0000 (UTC) 
Received: by horse.redhat.com (Postfix, from userid 10451) id 284A8223C00; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 09/52] fuse: process requests queues Date: Mon, 10 Dec 2018 12:12:35 -0500 Message-Id: <20181210171318.16998-10-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.47]); Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi Send normal requests to the device and handle completions. This is enough to get mount and basic I/O working. The hiprio and notifications queues still need to be implemented for full FUSE functionality. 
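The diff in this patch stages request arguments in one contiguous bounce buffer (req->argbuf): in-args are copied to the front and the reply args are read back from the region that follows. A minimal userspace sketch of that layout, assuming hypothetical names (arg_model, len_args_model, pack_args_model, unpack_args_model) and ignoring page-backed arguments and the variable-length last reply argument that the kernel code handles:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's fuse_arg. */
struct arg_model {
	size_t size;
	void *value;
};

/* Sum of argument sizes, modeled on what fuse_len_args() computes. */
static size_t len_args_model(size_t numargs, const struct arg_model *args)
{
	size_t total = 0;
	size_t i;

	for (i = 0; i < numargs; i++)
		total += args[i].size;
	return total;
}

/*
 * Allocate one buffer large enough for in-args plus out-args and copy the
 * in-args to its start. The out-arg region is left for the device to fill.
 * Returns NULL on allocation failure.
 */
static unsigned char *pack_args_model(const struct arg_model *in, size_t num_in,
				      const struct arg_model *out, size_t num_out)
{
	size_t len = len_args_model(num_in, in) + len_args_model(num_out, out);
	unsigned char *buf = malloc(len ? len : 1);
	size_t offset = 0;
	size_t i;

	if (!buf)
		return NULL;

	for (i = 0; i < num_in; i++) {
		memcpy(buf + offset, in[i].value, in[i].size);
		offset += in[i].size;
	}
	return buf;
}

/* Copy reply args out of the region after the in-args, then free the buffer. */
static void unpack_args_model(unsigned char *buf,
			      const struct arg_model *in, size_t num_in,
			      struct arg_model *out, size_t num_out)
{
	size_t offset = len_args_model(num_in, in);
	size_t i;

	for (i = 0; i < num_out; i++) {
		memcpy(out[i].value, buf + offset, out[i].size);
		offset += out[i].size;
	}
	free(buf);
}
```

The single allocation matters because, as the patch comments note, arguments living on the stack cannot be mapped for the device, so they are bounced through heap memory first.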
Signed-off-by: Stefan Hajnoczi --- fs/fuse/fuse_i.h | 3 + fs/fuse/virtio_fs.c | 529 +++++++++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 501 insertions(+), 31 deletions(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 60ebe3c2e2c3..3a91aa970566 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -452,6 +452,9 @@ struct fuse_req { /** Request is stolen from fuse_file->reserved_req */ struct file *stolen_file; + + /** virtio-fs's physically contiguous buffer for in and out args */ + void *argbuf; }; struct fuse_iqueue; diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 8cdeb02f3778..fa99a31ee930 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -14,14 +14,35 @@ static DEFINE_MUTEX(virtio_fs_mutex); static LIST_HEAD(virtio_fs_instances); +/* Per-virtqueue state */ +struct virtio_fs_vq { + struct virtqueue *vq; /* protected by fpq->lock */ + struct work_struct done_work; + struct fuse_dev *fud; + char name[24]; +} ____cacheline_aligned_in_smp; + /* A virtio-fs device instance */ struct virtio_fs { - struct list_head list; /* on virtio_fs_instances */ + struct list_head list; /* on virtio_fs_instances */ char *tag; - struct fuse_dev **fud; /* 1:1 mapping with request queues */ - unsigned int num_queues; + struct virtio_fs_vq *vqs; + unsigned nvqs; /* number of virtqueues */ + unsigned num_queues; /* number of request queues */ }; +static inline struct virtio_fs_vq *vq_to_fsvq(struct virtqueue *vq) +{ + struct virtio_fs *fs = vq->vdev->priv; + + return &fs->vqs[vq->index]; +} + +static inline struct fuse_pqueue *vq_to_fpq(struct virtqueue *vq) +{ + return &vq_to_fsvq(vq)->fud->pq; +} + /* Add a new instance to the list or return -EEXIST if tag name exists*/ static int virtio_fs_add_instance(struct virtio_fs *fs) { @@ -71,18 +92,17 @@ static void virtio_fs_free_devs(struct virtio_fs *fs) /* TODO lock */ - if (!fs->fud) - return; + for (i = 0; i < fs->nvqs; i++) { + struct virtio_fs_vq *fsvq = &fs->vqs[i]; - for (i = 0; i 
< fs->num_queues; i++) { - struct fuse_dev *fud = fs->fud[i]; + if (!fsvq->fud) + continue; - if (fud) - fuse_dev_free(fud); /* TODO need to quiesce/end_requests/decrement dev_count */ - } + flush_work(&fsvq->done_work); - kfree(fs->fud); - fs->fud = NULL; + fuse_dev_free(fsvq->fud); /* TODO need to quiesce/end_requests/decrement dev_count */ + fsvq->fud = NULL; + } } /* Read filesystem name from virtio config into fs->tag (must kfree()). */ @@ -109,6 +129,210 @@ static int virtio_fs_read_tag(struct virtio_device *vdev, struct virtio_fs *fs) return 0; } +static void virtio_fs_notifications_done(struct virtqueue *vq) +{ + /* TODO */ + dev_dbg(&vq->vdev->dev, "%s\n", __func__); +} + +static void virtio_fs_notifications_done_work(struct work_struct *work) +{ + return; +} + +static void virtio_fs_hiprio_done(struct virtqueue *vq) +{ + /* TODO */ + dev_dbg(&vq->vdev->dev, "%s\n", __func__); +} + +/* Allocate and copy args into req->argbuf */ +static int copy_args_to_argbuf(struct fuse_req *req) +{ + unsigned offset = 0; + unsigned num_in; + unsigned num_out; + unsigned len; + unsigned i; + + num_in = req->in.numargs - req->in.argpages; + num_out = req->out.numargs - req->out.argpages; + len = fuse_len_args(num_in, (struct fuse_arg *)req->in.args) + + fuse_len_args(num_out, req->out.args); + + req->argbuf = kmalloc(len, GFP_ATOMIC); + if (!req->argbuf) + return -ENOMEM; + + for (i = 0; i < num_in; i++) { + memcpy(req->argbuf + offset, + req->in.args[i].value, + req->in.args[i].size); + offset += req->in.args[i].size; + } + + return 0; +} + +/* Copy args out of and free req->argbuf */ +static void copy_args_from_argbuf(struct fuse_req *req) +{ + unsigned remaining; + unsigned offset; + unsigned num_in; + unsigned num_out; + unsigned i; + + remaining = req->out.h.len - sizeof(req->out.h); + num_in = req->in.numargs - req->in.argpages; + num_out = req->out.numargs - req->out.argpages; + offset = fuse_len_args(num_in, (struct fuse_arg *)req->in.args); + + for (i = 0; i < 
num_out; i++) { + unsigned argsize = req->out.args[i].size; + + if (req->out.argvar && + i == req->out.numargs - 1 && + argsize > remaining) { + argsize = remaining; + } + + memcpy(req->out.args[i].value, req->argbuf + offset, argsize); + offset += argsize; + + if (i != req->out.numargs - 1) + remaining -= argsize; + } + + /* Store the actual size of the variable-length arg */ + if (req->out.argvar) + req->out.args[req->out.numargs - 1].size = remaining; + + kfree(req->argbuf); + req->argbuf = NULL; +} + +/* Work function for request completion */ +static void virtio_fs_requests_done_work(struct work_struct *work) +{ + struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq, + done_work); + struct fuse_pqueue *fpq = &fsvq->fud->pq; + struct fuse_conn *fc = fsvq->fud->fc; + struct virtqueue *vq = fsvq->vq; + struct fuse_req *req; + struct fuse_req *next; + LIST_HEAD(reqs); + + /* Collect completed requests off the virtqueue */ + spin_lock(&fpq->lock); + do { + unsigned len; + + virtqueue_disable_cb(vq); + + while ((req = virtqueue_get_buf(vq, &len)) != NULL) + list_move_tail(&req->list, &reqs); + } while (!virtqueue_enable_cb(vq) && likely(!virtqueue_is_broken(vq))); + spin_unlock(&fpq->lock); + + /* End requests */ + list_for_each_entry_safe(req, next, &reqs, list) { + /* TODO check unique */ + /* TODO fuse_len_args(out) against oh.len */ + + copy_args_from_argbuf(req); + + /* TODO zeroing? 
*/ + + spin_lock(&fpq->lock); + clear_bit(FR_SENT, &req->flags); + list_del_init(&req->list); + spin_unlock(&fpq->lock); + + fuse_request_end(fc, req); + } +} + +/* Virtqueue interrupt handler */ +static void virtio_fs_vq_done(struct virtqueue *vq) +{ + struct virtio_fs_vq *fsvq = vq_to_fsvq(vq); + + dev_dbg(&vq->vdev->dev, "%s %s\n", __func__, fsvq->name); + + schedule_work(&fsvq->done_work); +} + +/* Initialize virtqueues */ +static int virtio_fs_setup_vqs(struct virtio_device *vdev, + struct virtio_fs *fs) +{ + struct virtqueue **vqs; + vq_callback_t **callbacks; + const char **names; + unsigned i; + int ret; + + virtio_cread(vdev, struct virtio_fs_config, num_queues, + &fs->num_queues); + if (fs->num_queues == 0) + return -EINVAL; + + fs->nvqs = 2 + fs->num_queues; + + fs->vqs = devm_kcalloc(&vdev->dev, fs->nvqs, sizeof(fs->vqs[0]), + GFP_KERNEL); + if (!fs->vqs) + return -ENOMEM; + + vqs = kmalloc_array(fs->nvqs, sizeof(vqs[0]), GFP_KERNEL); + callbacks = kmalloc_array(fs->nvqs, sizeof(callbacks[0]), GFP_KERNEL); + names = kmalloc_array(fs->nvqs, sizeof(names[0]), GFP_KERNEL); + if (!vqs || !callbacks || !names) { + ret = -ENOMEM; + goto out; + } + + callbacks[0] = virtio_fs_notifications_done; + snprintf(fs->vqs[0].name, sizeof(fs->vqs[0].name), "notifications"); + INIT_WORK(&fs->vqs[0].done_work, virtio_fs_notifications_done_work); + names[0] = fs->vqs[0].name; + + callbacks[1] = virtio_fs_vq_done; + snprintf(fs->vqs[1].name, sizeof(fs->vqs[1].name), "hiprio"); + names[1] = fs->vqs[1].name; + + /* Initialize the requests virtqueues */ + for (i = 2; i < fs->nvqs; i++) { + INIT_WORK(&fs->vqs[i].done_work, virtio_fs_requests_done_work); + snprintf(fs->vqs[i].name, sizeof(fs->vqs[i].name), + "requests.%u", i - 2); + callbacks[i] = virtio_fs_vq_done; + names[i] = fs->vqs[i].name; + } + + ret = virtio_find_vqs(vdev, fs->nvqs, vqs, callbacks, names, NULL); + if (ret < 0) + goto out; + + for (i = 0; i < fs->nvqs; i++) + fs->vqs[i].vq = vqs[i]; + +out: + 
kfree(names); + kfree(callbacks); + kfree(vqs); + return ret; +} + +/* Free virtqueues (device must already be reset) */ +static void virtio_fs_cleanup_vqs(struct virtio_device *vdev, + struct virtio_fs *fs) +{ + vdev->config->del_vqs(vdev); +} + static int virtio_fs_probe(struct virtio_device *vdev) { struct virtio_fs *fs; @@ -119,23 +343,32 @@ static int virtio_fs_probe(struct virtio_device *vdev) return -ENOMEM; vdev->priv = fs; - virtio_cread(vdev, struct virtio_fs_config, num_queues, - &fs->num_queues); - if (fs->num_queues == 0) { - ret = -EINVAL; + ret = virtio_fs_read_tag(vdev, fs); + if (ret < 0) goto out; - } - ret = virtio_fs_read_tag(vdev, fs); + ret = virtio_fs_setup_vqs(vdev, fs); if (ret < 0) goto out; + /* TODO vq affinity */ + /* TODO populate notifications vq */ + + /* Bring the device online in case the filesystem is mounted and + * requests need to be sent before we return. + */ + virtio_device_ready(vdev); + ret = virtio_fs_add_instance(fs); if (ret < 0) - goto out; + goto out_vqs; return 0; +out_vqs: + vdev->config->reset(vdev); + virtio_fs_cleanup_vqs(vdev, fs); + out: vdev->priv = NULL; return ret; @@ -148,6 +381,7 @@ static void virtio_fs_remove(struct virtio_device *vdev) virtio_fs_free_devs(fs); vdev->config->reset(vdev); + virtio_fs_cleanup_vqs(vdev, fs); mutex_lock(&virtio_fs_mutex); list_del(&fs->list); @@ -190,6 +424,234 @@ static struct virtio_driver virtio_fs_driver = { #endif }; +static void virtio_fs_wake_forget_and_unlock(struct fuse_iqueue *fiq) +__releases(fiq->waitq.lock) +{ + /* TODO */ + spin_unlock(&fiq->waitq.lock); +} + +static void virtio_fs_wake_interrupt_and_unlock(struct fuse_iqueue *fiq) +__releases(fiq->waitq.lock) +{ + /* TODO */ + spin_unlock(&fiq->waitq.lock); +} + +/* Return the number of scatter-gather list elements required */ +static unsigned sg_count_fuse_req(struct fuse_req *req) +{ + unsigned total_sgs = 1 /* fuse_in_header */; + + if (req->in.numargs - req->in.argpages) + total_sgs += 1; + + if 
(req->in.argpages) + total_sgs += req->num_pages; + + if (!test_bit(FR_ISREPLY, &req->flags)) + return total_sgs; + + total_sgs += 1 /* fuse_out_header */; + + if (req->out.numargs - req->out.argpages) + total_sgs += 1; + + if (req->out.argpages) + total_sgs += req->num_pages; + + return total_sgs; +} + +/* Add pages to scatter-gather list and return number of elements used */ +static unsigned sg_init_fuse_pages(struct scatterlist *sg, + struct page **pages, + struct fuse_page_desc *page_descs, + unsigned num_pages) +{ + unsigned i; + + for (i = 0; i < num_pages; i++) { + sg_init_table(&sg[i], 1); + sg_set_page(&sg[i], pages[i], + page_descs[i].length, + page_descs[i].offset); + } + + return i; +} + +/* Add args to scatter-gather list and return number of elements used */ +static unsigned sg_init_fuse_args(struct scatterlist *sg, + struct fuse_req *req, + struct fuse_arg *args, + unsigned numargs, + bool argpages, + void *argbuf, + unsigned *len_used) +{ + unsigned total_sgs = 0; + unsigned len; + + len = fuse_len_args(numargs - argpages, args); + if (len) + sg_init_one(&sg[total_sgs++], argbuf, len); + + if (argpages) + total_sgs += sg_init_fuse_pages(&sg[total_sgs], + req->pages, + req->page_descs, + req->num_pages); + + if (len_used) + *len_used = len; + + return total_sgs; +} + +/* Add a request to a virtqueue and kick the device */ +static int virtio_fs_enqueue_req(struct virtqueue *vq, struct fuse_req *req) +{ + struct scatterlist *stack_sgs[6 /* requests need at least 4 elements */]; + struct scatterlist stack_sg[ARRAY_SIZE(stack_sgs)]; + struct scatterlist **sgs = stack_sgs; + struct scatterlist *sg = stack_sg; + struct fuse_pqueue *fpq; + unsigned argbuf_used = 0; + unsigned out_sgs = 0; + unsigned in_sgs = 0; + unsigned total_sgs; + unsigned i; + int ret; + bool notify; + + /* Does the sglist fit on the stack? 
*/ + total_sgs = sg_count_fuse_req(req); + if (total_sgs > ARRAY_SIZE(stack_sgs)) { + sgs = kmalloc_array(total_sgs, sizeof(sgs[0]), GFP_ATOMIC); + sg = kmalloc_array(total_sgs, sizeof(sg[0]), GFP_ATOMIC); + if (!sgs || !sg) { + ret = -ENOMEM; + goto out; + } + } + + /* Use a bounce buffer since stack args cannot be mapped */ + ret = copy_args_to_argbuf(req); + if (ret < 0) + goto out; + + /* Request elements */ + sg_init_one(&sg[out_sgs++], &req->in.h, sizeof(req->in.h)); + out_sgs += sg_init_fuse_args(&sg[out_sgs], req, + (struct fuse_arg *)req->in.args, + req->in.numargs, req->in.argpages, + req->argbuf, &argbuf_used); + + /* Reply elements */ + if (test_bit(FR_ISREPLY, &req->flags)) { + sg_init_one(&sg[out_sgs + in_sgs++], + &req->out.h, sizeof(req->out.h)); + in_sgs += sg_init_fuse_args(&sg[out_sgs + in_sgs], req, + req->out.args, req->out.numargs, + req->out.argpages, + req->argbuf + argbuf_used, NULL); + } + + BUG_ON(out_sgs + in_sgs != total_sgs); + + for (i = 0; i < total_sgs; i++) + sgs[i] = &sg[i]; + + fpq = vq_to_fpq(vq); + spin_lock(&fpq->lock); + + ret = virtqueue_add_sgs(vq, sgs, out_sgs, in_sgs, req, GFP_ATOMIC); + if (ret < 0) { + /* TODO handle full virtqueue */ + spin_unlock(&fpq->lock); + goto out; + } + + notify = virtqueue_kick_prepare(vq); + + spin_unlock(&fpq->lock); + + if (notify) + virtqueue_notify(vq); + +out: + if (ret < 0 && req->argbuf) { + kfree(req->argbuf); + req->argbuf = NULL; + } + if (sgs != stack_sgs) { + kfree(sgs); + kfree(sg); + } + + return ret; +} + +static void virtio_fs_wake_pending_and_unlock(struct fuse_iqueue *fiq) +__releases(fiq->waitq.lock) +{ + unsigned queue_id = 2; /* TODO multiqueue */ + struct virtio_fs *fs; + struct fuse_conn *fc; + struct fuse_req *req; + struct fuse_pqueue *fpq; + int ret; + + BUG_ON(list_empty(&fiq->pending)); + req = list_last_entry(&fiq->pending, struct fuse_req, list); + clear_bit(FR_PENDING, &req->flags); + list_del_init(&req->list); + BUG_ON(!list_empty(&fiq->pending)); + 
spin_unlock(&fiq->waitq.lock); + + fs = fiq->priv; + fc = fs->vqs[queue_id].fud->fc; + + dev_dbg(&fs->vqs[queue_id].vq->vdev->dev, + "%s: opcode %u unique %#llx nodeid %#llx in.len %u out.len %u\n", + __func__, req->in.h.opcode, req->in.h.unique, req->in.h.nodeid, + req->in.h.len, fuse_len_args(req->out.numargs, req->out.args)); + + /* TODO put request onto fpq->io list? */ + + fpq = &fs->vqs[queue_id].fud->pq; + spin_lock(&fpq->lock); + if (!fpq->connected) { + spin_unlock(&fpq->lock); + req->out.h.error = -ENODEV; + printk(KERN_ERR "%s: disconnected\n", __func__); +/* fuse_request_end(fc, req); unsafe due to fc->lock */ + return; + } + list_add_tail(&req->list, fpq->processing); + spin_unlock(&fpq->lock); + set_bit(FR_SENT, &req->flags); + /* matches barrier in request_wait_answer() */ + smp_mb__after_atomic(); + /* TODO check for FR_INTERRUPTED? */ + + ret = virtio_fs_enqueue_req(fs->vqs[queue_id].vq, req); + if (ret < 0) { + req->out.h.error = ret; + printk(KERN_ERR "%s: virtio_fs_enqueue_req failed %d\n", + __func__, ret); +/* fuse_request_end(fc, req); unsafe due to fc->lock */ + return; + } +} + +const static struct fuse_iqueue_ops virtio_fs_fiq_ops = { + .wake_forget_and_unlock = virtio_fs_wake_forget_and_unlock, + .wake_interrupt_and_unlock = virtio_fs_wake_interrupt_and_unlock, + .wake_pending_and_unlock = virtio_fs_wake_pending_and_unlock, +}; + static int virtio_fs_fill_super(struct super_block *sb, void *data, int silent) { @@ -220,30 +682,35 @@ static int virtio_fs_fill_super(struct super_block *sb, void *data, } /* TODO lock */ - if (fs->fud) { + if (fs->vqs[2].fud) { printk(KERN_ERR "virtio-fs: device already in use\n"); err = -EBUSY; goto err; } - fs->fud = kcalloc(fs->num_queues, sizeof(fs->fud[0]), GFP_KERNEL); - if (!fs->fud) { - err = -ENOMEM; - goto err_fud; - } - err = fuse_fill_super_common(sb, &d, (void **)&fs->fud[0]); + /* TODO this sends FUSE_INIT and could cause hiprio or notifications + * virtqueue races since they haven't been set up 
yet! + */ + err = fuse_fill_super_common(sb, &d, &virtio_fs_fiq_ops, fs, + (void **)&fs->vqs[2].fud); if (err < 0) goto err_fud; - fc = fs->fud[0]->fc; + fc = fs->vqs[2].fud->fc; - /* Allocate remaining fuse_devs */ err = -ENOMEM; /* TODO take fuse_mutex around this loop? */ - for (i = 1; i < fs->num_queues; i++) { - fs->fud[i] = fuse_dev_alloc(fc); - if (!fs->fud[i]) { + for (i = 0; i < fs->nvqs; i++) { + struct virtio_fs_vq *fsvq = &fs->vqs[i]; + + if (i == 2) + continue; /* already initialized */ + + fsvq->fud = fuse_dev_alloc(fc); + if (!fsvq->fud) { /* TODO */ + printk(KERN_ERR "%s: fuse_dev_alloc failed\n", + __func__); } atomic_inc(&fc->dev_count); } From patchwork Mon Dec 10 17:12:36 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721987 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 809A418E8 for ; Mon, 10 Dec 2018 17:21:00 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 64F452AF77 for ; Mon, 10 Dec 2018 17:21:00 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 59BB22AF9E; Mon, 10 Dec 2018 17:21:00 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 04D442AF92 for ; Mon, 10 Dec 2018 17:21:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728452AbeLJRUw (ORCPT ); Mon, 10 Dec 2018 12:20:52 -0500 Received: from mx1.redhat.com ([209.132.183.28]:47750 "EHLO 
mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728344AbeLJRNe (ORCPT ); Mon, 10 Dec 2018 12:13:34 -0500 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 0C06C30A7696; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id CF6481054FD2; Mon, 10 Dec 2018 17:13:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 2B7B2223C07; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 10/52] fuse: export fuse_get_unique() Date: Mon, 10 Dec 2018 12:12:36 -0500 Message-Id: <20181210171318.16998-11-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.47]); Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi virtio-fs will need unique IDs for FORGET requests from outside fs/fuse/dev.c. Make the symbol visible. 
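For readers unfamiliar with the counter being exported, here is a small userspace model of the ID generation. The struct iqueue_model type and get_unique_model() are hypothetical stand-ins, and the two constants are assumed to match the definitions in fs/fuse/dev.c at the time of this series (bit 0 reserved to mark interrupt requests, so ordinary IDs advance in steps of 2):

```c
#include <stdint.h>

/* Assumed values; bit 0 is reserved for interrupt requests. */
#define FUSE_INT_REQ_BIT (1ULL << 0)
#define FUSE_REQ_ID_STEP (1ULL << 1)

/* Hypothetical stand-in for struct fuse_iqueue; only the counter is modeled. */
struct iqueue_model {
	uint64_t reqctr;
};

/*
 * Same arithmetic as fuse_get_unique() in the diff: bump the counter by one
 * step and return the new value. The caller is expected to hold the queue
 * lock, as the FORGET path in virtio-fs does; no locking is modeled here.
 */
static uint64_t get_unique_model(struct iqueue_model *fiq)
{
	fiq->reqctr += FUSE_REQ_ID_STEP;
	return fiq->reqctr;
}
```

Starting from a zeroed queue, successive calls yield 2, 4, 6, and so on, which leaves the low bit free in every ordinary request ID.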
Signed-off-by: Stefan Hajnoczi --- fs/fuse/dev.c | 3 ++- fs/fuse/fuse_i.h | 5 +++++ 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index b26ee5ed8974..f35c4ab2dcbb 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -360,11 +360,12 @@ unsigned fuse_len_args(unsigned numargs, struct fuse_arg *args) } EXPORT_SYMBOL_GPL(fuse_len_args); -static u64 fuse_get_unique(struct fuse_iqueue *fiq) +u64 fuse_get_unique(struct fuse_iqueue *fiq) { fiq->reqctr += FUSE_REQ_ID_STEP; return fiq->reqctr; } +EXPORT_SYMBOL_GPL(fuse_get_unique); static unsigned int fuse_req_hash(u64 unique) { diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 3a91aa970566..f463586f2c9e 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -1172,4 +1172,9 @@ int fuse_readdir(struct file *file, struct dir_context *ctx); */ unsigned fuse_len_args(unsigned numargs, struct fuse_arg *args); +/** + * Get the next unique ID for a request + */ +u64 fuse_get_unique(struct fuse_iqueue *fiq); + #endif /* _FS_FUSE_I_H */ From patchwork Mon Dec 10 17:12:37 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721961 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 855D715A6 for ; Mon, 10 Dec 2018 17:20:19 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 49FC42AF66 for ; Mon, 10 Dec 2018 17:20:19 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3EAC72AF83; Mon, 10 Dec 2018 17:20:19 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from 
vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 184B02AF45 for ; Mon, 10 Dec 2018 17:20:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729150AbeLJRT4 (ORCPT ); Mon, 10 Dec 2018 12:19:56 -0500 Received: from mx1.redhat.com ([209.132.183.28]:36966 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728361AbeLJRNe (ORCPT ); Mon, 10 Dec 2018 12:13:34 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 268763154861; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id E178F600D7; Mon, 10 Dec 2018 17:13:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 2FEB8223C0D; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 11/52] fuse: implement FUSE_FORGET for virtio-fs Date: Mon, 10 Dec 2018 12:12:37 -0500 Message-Id: <20181210171318.16998-12-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.41]); Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi Send single FUSE_FORGET requests on the hiprio queue.
In the future it may be possible to do FUSE_BATCH_FORGET but that is tricky since virtio-fs gets called synchronously when forgets are queued. Signed-off-by: Stefan Hajnoczi --- fs/fuse/virtio_fs.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 89 insertions(+), 4 deletions(-) diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index fa99a31ee930..225eb729656f 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -140,10 +140,26 @@ static void virtio_fs_notifications_done_work(struct work_struct *work) return; } -static void virtio_fs_hiprio_done(struct virtqueue *vq) +/* Work function for hiprio completion */ +static void virtio_fs_hiprio_done_work(struct work_struct *work) { - /* TODO */ - dev_dbg(&vq->vdev->dev, "%s\n", __func__); + struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq, + done_work); + struct fuse_pqueue *fpq = &fsvq->fud->pq; + struct virtqueue *vq = fsvq->vq; + + /* Free completed FUSE_FORGET requests */ + spin_lock(&fpq->lock); + do { + unsigned len; + void *req; + + virtqueue_disable_cb(vq); + + while ((req = virtqueue_get_buf(vq, &len)) != NULL) + kfree(req); + } while (!virtqueue_enable_cb(vq) && likely(!virtqueue_is_broken(vq))); + spin_unlock(&fpq->lock); } /* Allocate and copy args into req->argbuf */ @@ -302,6 +318,7 @@ static int virtio_fs_setup_vqs(struct virtio_device *vdev, callbacks[1] = virtio_fs_vq_done; snprintf(fs->vqs[1].name, sizeof(fs->vqs[1].name), "hiprio"); names[1] = fs->vqs[1].name; + INIT_WORK(&fs->vqs[1].done_work, virtio_fs_hiprio_done_work); /* Initialize the requests virtqueues */ for (i = 2; i < fs->nvqs; i++) { @@ -424,11 +441,79 @@ static struct virtio_driver virtio_fs_driver = { #endif }; +struct virtio_fs_forget { + struct fuse_in_header ih; + struct fuse_forget_in arg; +}; + static void virtio_fs_wake_forget_and_unlock(struct fuse_iqueue *fiq) __releases(fiq->waitq.lock) { - /* TODO */ + struct fuse_forget_link *link; + struct virtio_fs_forget *forget; + 
struct fuse_pqueue *fpq; + struct scatterlist sg; + struct scatterlist *sgs[] = {&sg}; + struct virtio_fs *fs; + struct virtqueue *vq; + bool notify; + u64 unique; + int ret; + + BUG_ON(!fiq->forget_list_head.next); + link = fiq->forget_list_head.next; + BUG_ON(link->next); + fiq->forget_list_head.next = NULL; + fiq->forget_list_tail = &fiq->forget_list_head; + + unique = fuse_get_unique(fiq); + + fs = fiq->priv; + spin_unlock(&fiq->waitq.lock); + + /* Allocate a buffer for the request */ + forget = kmalloc(sizeof(*forget), GFP_ATOMIC); + if (!forget) { + pr_err("virtio-fs: dropped FORGET: kmalloc failed\n"); + goto out; /* TODO avoid dropping it? */ + } + + forget->ih = (struct fuse_in_header){ + .opcode = FUSE_FORGET, + .nodeid = link->forget_one.nodeid, + .unique = unique, + .len = sizeof(*forget), + }; + forget->arg = (struct fuse_forget_in){ + .nlookup = link->forget_one.nlookup, + }; + + sg_init_one(&sg, forget, sizeof(*forget)); + + /* Enqueue the request */ + vq = fs->vqs[1].vq; + dev_dbg(&vq->vdev->dev, "%s\n", __func__); + fpq = vq_to_fpq(vq); + spin_lock(&fpq->lock); + + ret = virtqueue_add_sgs(vq, sgs, 1, 0, forget, GFP_ATOMIC); + if (ret < 0) { + pr_err("virtio-fs: dropped FORGET: queue full\n"); + /* TODO handle full virtqueue */ + spin_unlock(&fpq->lock); + goto out; + } + + notify = virtqueue_kick_prepare(vq); + + spin_unlock(&fpq->lock); + + if (notify) + virtqueue_notify(vq); + +out: + kfree(link); } static void virtio_fs_wake_interrupt_and_unlock(struct fuse_iqueue *fiq) From patchwork Mon Dec 10 17:12:38 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721977 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BF50114E2 for ; Mon, 10 Dec 2018 17:20:44 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost 
[127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A149C2AF6A for ; Mon, 10 Dec 2018 17:20:44 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 955532AF85; Mon, 10 Dec 2018 17:20:44 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 257022AF77 for ; Mon, 10 Dec 2018 17:20:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729041AbeLJRUc (ORCPT ); Mon, 10 Dec 2018 12:20:32 -0500 Received: from mx1.redhat.com ([209.132.183.28]:61683 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728354AbeLJRNe (ORCPT ); Mon, 10 Dec 2018 12:13:34 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 374233091791; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id E4E0C6012B; Mon, 10 Dec 2018 17:13:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 3316D223C0F; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 12/52] virtio_fs: Set up dax_device Date: Mon, 10 Dec 2018 12:12:38 -0500 Message-Id: <20181210171318.16998-13-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: 
<20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.41]); Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi Setup a dax device. Signed-off-by: Stefan Hajnoczi --- fs/fuse/virtio_fs.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 68 insertions(+) diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 225eb729656f..fd914f2c6209 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -5,6 +5,8 @@ */ #include +#include +#include #include #include #include @@ -29,6 +31,11 @@ struct virtio_fs { struct virtio_fs_vq *vqs; unsigned nvqs; /* number of virtqueues */ unsigned num_queues; /* number of request queues */ + struct dax_device *dax_dev; + + /* DAX memory window where file contents are mapped */ + void *window_kaddr; + phys_addr_t window_phys_addr; }; static inline struct virtio_fs_vq *vq_to_fsvq(struct virtqueue *vq) @@ -350,6 +357,44 @@ static void virtio_fs_cleanup_vqs(struct virtio_device *vdev, vdev->config->del_vqs(vdev); } +/* Map a window offset to a page frame number. The window offset will have + * been produced by .iomap_begin(), which maps a file offset to a window + * offset. 
+ */ +static long virtio_fs_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, + long nr_pages, void **kaddr, pfn_t *pfn) +{ + struct virtio_fs *fs = dax_get_private(dax_dev); + phys_addr_t offset = PFN_PHYS(pgoff); + + if (kaddr) + *kaddr = fs->window_kaddr + offset; + if (pfn) + *pfn = phys_to_pfn_t(fs->window_phys_addr + offset, + PFN_DEV | PFN_MAP); + return nr_pages; +} + +static size_t virtio_fs_copy_from_iter(struct dax_device *dax_dev, + pgoff_t pgoff, void *addr, + size_t bytes, struct iov_iter *i) +{ + return copy_from_iter(addr, bytes, i); +} + +static size_t virtio_fs_copy_to_iter(struct dax_device *dax_dev, + pgoff_t pgoff, void *addr, + size_t bytes, struct iov_iter *i) +{ + return copy_to_iter(addr, bytes, i); +} + +static const struct dax_operations virtio_fs_dax_ops = { + .direct_access = virtio_fs_direct_access, + .copy_from_iter = virtio_fs_copy_from_iter, + .copy_to_iter = virtio_fs_copy_to_iter, +}; + static int virtio_fs_probe(struct virtio_device *vdev) { struct virtio_fs *fs; @@ -371,6 +416,17 @@ static int virtio_fs_probe(struct virtio_device *vdev) /* TODO vq affinity */ /* TODO populate notifications vq */ + if (IS_ENABLED(CONFIG_DAX_DRIVER)) { + /* TODO map window */ + fs->window_kaddr = NULL; + fs->window_phys_addr = 0; + + fs->dax_dev = alloc_dax(fs, NULL, &virtio_fs_dax_ops); + if (!fs->dax_dev) + goto out_vqs; /* TODO handle case where device doesn't expose + BAR */ + } + /* Bring the device online in case the filesystem is mounted and * requests need to be sent before we return. 
*/ @@ -386,6 +442,12 @@ static int virtio_fs_probe(struct virtio_device *vdev) vdev->config->reset(vdev); virtio_fs_cleanup_vqs(vdev, fs); + if (fs->dax_dev) { + kill_dax(fs->dax_dev); + put_dax(fs->dax_dev); + fs->dax_dev = NULL; + } + out: vdev->priv = NULL; return ret; @@ -404,6 +466,12 @@ static void virtio_fs_remove(struct virtio_device *vdev) list_del(&fs->list); mutex_unlock(&virtio_fs_mutex); + if (fs->dax_dev) { + kill_dax(fs->dax_dev); + put_dax(fs->dax_dev); + fs->dax_dev = NULL; + } + vdev->priv = NULL; } From patchwork Mon Dec 10 17:12:39 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721967 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2EF7E18E8 for ; Mon, 10 Dec 2018 17:20:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1031A2AA2B for ; Mon, 10 Dec 2018 17:20:18 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 045082AF60; Mon, 10 Dec 2018 17:20:18 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BF3C22AA2B for ; Mon, 10 Dec 2018 17:20:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729158AbeLJRT4 (ORCPT ); Mon, 10 Dec 2018 12:19:56 -0500 Received: from mx1.redhat.com ([209.132.183.28]:58414 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728358AbeLJRNe (ORCPT ); Mon, 10 Dec 2018 12:13:34 -0500 Received: from 
smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 2C4C930014CC; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id E608A1001914; Mon, 10 Dec 2018 17:13:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 3708D223C11; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 13/52] dax: remove block device dependencies Date: Mon, 10 Dec 2018 12:12:39 -0500 Message-Id: <20181210171318.16998-14-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.45]); Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi Although struct dax_device itself is not tied to a block device, some DAX code assumes there is a block device. Make block devices optional by allowing bdev to be NULL in commonly used DAX APIs. When there is no block device: * Skip the partition offset calculation in bdev_dax_pgoff() * Skip the blkdev_issue_zeroout() optimization Note that more block device assumptions remain but I haven't reached those code paths yet. 
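For readers following along outside the kernel tree, the NULL-bdev handling in bdev_dax_pgoff() can be modelled roughly as below. This is only a userspace sketch, not the kernel code: struct model_bdev, model_pgoff() and the fixed 4096-byte page size are assumptions made for illustration. With no block device the partition start is treated as 0, so the page offset depends on the sector alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct block_device: only the partition start. */
struct model_bdev {
	uint64_t start_sect;	/* first sector of the partition */
};

#define MODEL_SECTOR_SIZE 512u
#define MODEL_PAGE_SHIFT  12u	/* assumes 4096-byte pages */

/*
 * Convert a sector number to a page offset. A NULL bdev means "no block
 * device", in which case the partition offset is skipped (taken as 0).
 */
static inline uint64_t model_pgoff(const struct model_bdev *bdev,
				   uint64_t sector)
{
	uint64_t start_sect = bdev ? bdev->start_sect : 0;

	/* Sanity check for this sketch only: avoid 64-bit overflow. */
	assert(sector <= UINT64_MAX / MODEL_SECTOR_SIZE - start_sect);

	uint64_t phys_off = (start_sect + sector) * MODEL_SECTOR_SIZE;

	return phys_off >> MODEL_PAGE_SHIFT;
}
```

With a partition starting at sector 2048, sector 8 lands on page 257; with a NULL bdev the same sector lands on page 1.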
Signed-off-by: Stefan Hajnoczi --- drivers/dax/super.c | 3 ++- fs/dax.c | 7 ++++++- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/dax/super.c b/drivers/dax/super.c index 6e928f37d084..74f3bf7ae822 100644 --- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -52,7 +52,8 @@ EXPORT_SYMBOL_GPL(dax_read_unlock); int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size, pgoff_t *pgoff) { - phys_addr_t phys_off = (get_start_sect(bdev) + sector) * 512; + sector_t start_sect = bdev ? get_start_sect(bdev) : 0; + phys_addr_t phys_off = (start_sect + sector) * 512; if (pgoff) *pgoff = PHYS_PFN(phys_off); diff --git a/fs/dax.c b/fs/dax.c index 9bcce89ea18e..6431c3aba182 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1021,7 +1021,12 @@ static vm_fault_t dax_load_hole(struct xa_state *xas, static bool dax_range_is_aligned(struct block_device *bdev, unsigned int offset, unsigned int length) { - unsigned short sector_size = bdev_logical_block_size(bdev); + unsigned short sector_size; + + if (!bdev) + return false; + + sector_size = bdev_logical_block_size(bdev); if (!IS_ALIGNED(offset, sector_size)) return false; From patchwork Mon Dec 10 17:12:40 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721865 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 957FD14E2 for ; Mon, 10 Dec 2018 17:16:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 77E632AF3F for ; Mon, 10 Dec 2018 17:16:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6A1572AF40; Mon, 10 Dec 2018 17:16:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, 
score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DDD992AF37 for ; Mon, 10 Dec 2018 17:16:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726649AbeLJRQ3 (ORCPT ); Mon, 10 Dec 2018 12:16:29 -0500 Received: from mx1.redhat.com ([209.132.183.28]:34806 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728389AbeLJRNg (ORCPT ); Mon, 10 Dec 2018 12:13:36 -0500 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 77F023002C74; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2321D5D75F; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 3A81B223C12; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 14/52] fuse: add fuse_conn->dax_dev field Date: Mon, 10 Dec 2018 12:12:40 -0500 Message-Id: <20181210171318.16998-15-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.43]); Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi A struct 
dax_device instance is a prerequisite for the DAX filesystem APIs. Let virtio_fs associate a dax_device with a fuse_conn. Classic FUSE and CUSE set the pointer to NULL, disabling DAX. Signed-off-by: Stefan Hajnoczi --- fs/fuse/cuse.c | 3 ++- fs/fuse/fuse_i.h | 8 +++++++- fs/fuse/inode.c | 9 ++++++--- fs/fuse/virtio_fs.c | 3 ++- 4 files changed, 17 insertions(+), 6 deletions(-) diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c index 98dc780cbafa..bf8c1c470e8c 100644 --- a/fs/fuse/cuse.c +++ b/fs/fuse/cuse.c @@ -503,7 +503,8 @@ static int cuse_channel_open(struct inode *inode, struct file *file) * Limit the cuse channel to requests that can * be represented in file->f_cred->user_ns. */ - fuse_conn_init(&cc->fc, file->f_cred->user_ns, &fuse_dev_fiq_ops, NULL); + fuse_conn_init(&cc->fc, file->f_cred->user_ns, NULL, &fuse_dev_fiq_ops, + NULL); fud = fuse_dev_alloc(&cc->fc); if (!fud) { diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index f463586f2c9e..b5a6a12e67d6 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -803,6 +803,9 @@ struct fuse_conn { /** List of device instances belonging to this connection */ struct list_head devices; + + /** DAX device, non-NULL if DAX is supported */ + struct dax_device *dax_dev; }; static inline struct fuse_conn *get_fuse_conn_super(struct super_block *sb) @@ -1025,7 +1028,8 @@ struct fuse_conn *fuse_conn_get(struct fuse_conn *fc); * Initialize fuse_conn */ void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, - const struct fuse_iqueue_ops *fiq_ops, void *fiq_priv); + struct dax_device *dax_dev, + const struct fuse_iqueue_ops *fiq_ops, void *fiq_priv); /** * Release reference to fuse_conn @@ -1045,12 +1049,14 @@ int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev, * Fill in superblock and initialize fuse connection * @sb: partially-initialized superblock to fill in * @mount_data: mount parameters + * @dax_dev: DAX device, may be NULL * @fiq_ops: fuse input queue operations * @fiq_priv: 
device-specific state for fuse_iqueue * @fudptr: fuse_dev pointer to fill in, should contain NULL on entry */ int fuse_fill_super_common(struct super_block *sb, struct fuse_mount_data *mount_data, + struct dax_device *dax_dev, const struct fuse_iqueue_ops *fiq_ops, void *fiq_priv, void **fudptr); diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 31bb817575c4..10e4a39318c4 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -600,7 +600,8 @@ static void fuse_pqueue_init(struct fuse_pqueue *fpq) } void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, - const struct fuse_iqueue_ops *fiq_ops, void *fiq_priv) + struct dax_device *dax_dev, + const struct fuse_iqueue_ops *fiq_ops, void *fiq_priv) { memset(fc, 0, sizeof(*fc)); spin_lock_init(&fc->lock); @@ -625,6 +626,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, fc->attr_version = 1; get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key)); fc->pid_ns = get_pid_ns(task_active_pid_ns(current)); + fc->dax_dev = dax_dev; fc->user_ns = get_user_ns(user_ns); } EXPORT_SYMBOL_GPL(fuse_conn_init); @@ -1072,6 +1074,7 @@ EXPORT_SYMBOL_GPL(fuse_dev_free); int fuse_fill_super_common(struct super_block *sb, struct fuse_mount_data *mount_data, + struct dax_device *dax_dev, const struct fuse_iqueue_ops *fiq_ops, void *fiq_priv, void **fudptr) @@ -1122,7 +1125,7 @@ int fuse_fill_super_common(struct super_block *sb, if (!fc) goto err; - fuse_conn_init(fc, sb->s_user_ns, fiq_ops, fiq_priv); + fuse_conn_init(fc, sb->s_user_ns, dax_dev, fiq_ops, fiq_priv); fc->release = fuse_free_conn; fud = fuse_dev_alloc(fc); @@ -1233,7 +1236,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) (file->f_cred->user_ns != sb->s_user_ns)) goto err_fput; - err = fuse_fill_super_common(sb, &d, &fuse_dev_fiq_ops, NULL, + err = fuse_fill_super_common(sb, &d, NULL, &fuse_dev_fiq_ops, NULL, &file->private_data); err_fput: fput(file); diff --git a/fs/fuse/virtio_fs.c 
b/fs/fuse/virtio_fs.c index fd914f2c6209..ba615ec2603e 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -844,7 +844,8 @@ static int virtio_fs_fill_super(struct super_block *sb, void *data, /* TODO this sends FUSE_INIT and could cause hiprio or notifications * virtqueue races since they haven't been set up yet! */ - err = fuse_fill_super_common(sb, &d, &virtio_fs_fiq_ops, fs, + err = fuse_fill_super_common(sb, &d, fs->dax_dev, + &virtio_fs_fiq_ops, fs, (void **)&fs->vqs[2].fud); if (err < 0) goto err_fud; From patchwork Mon Dec 10 17:12:41 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721925 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2D57D15A6 for ; Mon, 10 Dec 2018 17:18:38 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 04C0C29C52 for ; Mon, 10 Dec 2018 17:18:38 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EBD7229D12; Mon, 10 Dec 2018 17:18:37 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E425F2AF45 for ; Mon, 10 Dec 2018 17:18:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729070AbeLJRSg (ORCPT ); Mon, 10 Dec 2018 12:18:36 -0500 Received: from mx1.redhat.com ([209.132.183.28]:47768 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728367AbeLJRNf (ORCPT ); Mon, 10 Dec 2018 12:13:35 -0500 Received: from 
smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 892D830043EC; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 4972B1057041; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 40AE0223C18; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 15/52] fuse: map virtio_fs DAX window BAR Date: Mon, 10 Dec 2018 12:12:41 -0500 Message-Id: <20181210171318.16998-16-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.47]); Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi Experimental QEMU code introduces an MMIO BAR for mapping portions of files in the virtio-fs device. Map this BAR so that FUSE DAX can access file contents from the host page cache. The DAX window is accessed by the fs/dax.c infrastructure and must have struct pages (at least on x86). Use devm_memremap_pages() to map the DAX window PCI BAR and allocate struct page. 
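As a rough illustration of the address arithmetic involved in describing a BAR as a physical range, here is a small userspace sketch. It is not taken from the patch: struct model_range and both helpers are hypothetical names. It assumes the inclusive-end convention that the kernel's struct resource follows, where resource_size() is end - start + 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a physical address range with an inclusive end. */
struct model_range {
	uint64_t start;
	uint64_t end;	/* last valid byte, inclusive */
};

/* Build the range covered by a BAR that starts at bar_start. */
static inline struct model_range model_window_range(uint64_t bar_start,
						    uint64_t bar_len)
{
	/* Sketch-only preconditions: non-empty window, no wrap-around. */
	assert(bar_len > 0);
	assert(bar_start <= UINT64_MAX - (bar_len - 1));

	struct model_range r = {
		.start = bar_start,
		.end = bar_start + bar_len - 1,
	};
	return r;
}

/* Size of an inclusive range, mirroring how resource_size() is defined. */
static inline uint64_t model_range_size(struct model_range r)
{
	return r.end - r.start + 1;
}
```

Under that convention, a 1 MiB window at 0x100000 ends at 0x1fffff, and the size computed back from the range equals the original length.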
Signed-off-by: Stefan Hajnoczi --- fs/fuse/virtio_fs.c | 166 ++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 143 insertions(+), 23 deletions(-) diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index ba615ec2603e..87b7e42a6763 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -6,12 +6,18 @@ #include #include +#include #include #include #include #include #include "fuse_i.h" +enum { + /* PCI BAR number of the virtio-fs DAX window */ + VIRTIO_FS_WINDOW_BAR = 2, +}; + /* List of virtio-fs device instances and a lock for the list */ static DEFINE_MUTEX(virtio_fs_mutex); static LIST_HEAD(virtio_fs_instances); @@ -24,6 +30,18 @@ struct virtio_fs_vq { char name[24]; } ____cacheline_aligned_in_smp; +/* State needed for devm_memremap_pages(). This API is called on the + * underlying pci_dev instead of struct virtio_fs (layering violation). Since + * the memremap release function only gets called when the pci_dev is released, + * keep the associated state separate from struct virtio_fs (it has a different + * lifecycle from pci_dev). 
+ */ +struct virtio_fs_memremap_info { + struct dev_pagemap pgmap; + struct percpu_ref ref; + struct completion completion; +}; + /* A virtio-fs device instance */ struct virtio_fs { struct list_head list; /* on virtio_fs_instances */ @@ -36,6 +54,7 @@ struct virtio_fs { /* DAX memory window where file contents are mapped */ void *window_kaddr; phys_addr_t window_phys_addr; + size_t window_len; }; static inline struct virtio_fs_vq *vq_to_fsvq(struct virtqueue *vq) @@ -395,6 +414,127 @@ static const struct dax_operations virtio_fs_dax_ops = { .copy_to_iter = virtio_fs_copy_to_iter, }; +static void virtio_fs_percpu_release(struct percpu_ref *ref) +{ + struct virtio_fs_memremap_info *mi = + container_of(ref, struct virtio_fs_memremap_info, ref); + + complete(&mi->completion); +} + +static void virtio_fs_percpu_exit(void *data) +{ + struct virtio_fs_memremap_info *mi = data; + + wait_for_completion(&mi->completion); + percpu_ref_exit(&mi->ref); +} + +static void virtio_fs_percpu_kill(void *data) +{ + percpu_ref_kill(data); +} + +static void virtio_fs_cleanup_dax(void *data) +{ + struct virtio_fs *fs = data; + + kill_dax(fs->dax_dev); + put_dax(fs->dax_dev); +} + +static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs) +{ + struct virtio_fs_memremap_info *mi; + struct dev_pagemap *pgmap; + struct pci_dev *pci_dev; + phys_addr_t phys_addr; + size_t len; + int ret; + + if (!IS_ENABLED(CONFIG_DAX_DRIVER)) + return 0; + + /* HACK implement VIRTIO shared memory regions instead of + * directly accessing the PCI BAR from a virtio device driver. + */ + pci_dev = container_of(vdev->dev.parent, struct pci_dev, dev); + + /* TODO Is this safe - the virtio_pci_* driver doesn't use managed + * device APIs? */ + ret = pcim_enable_device(pci_dev); + if (ret < 0) + return ret; + + /* TODO handle case where device doesn't expose BAR? 
*/ + ret = pci_request_region(pci_dev, VIRTIO_FS_WINDOW_BAR, + "virtio-fs-window"); + if (ret < 0) { + dev_err(&vdev->dev, "%s: failed to request window BAR\n", + __func__); + return ret; + } + + phys_addr = pci_resource_start(pci_dev, VIRTIO_FS_WINDOW_BAR); + len = pci_resource_len(pci_dev, VIRTIO_FS_WINDOW_BAR); + + mi = devm_kzalloc(&pci_dev->dev, sizeof(*mi), GFP_KERNEL); + if (!mi) + return -ENOMEM; + + init_completion(&mi->completion); + ret = percpu_ref_init(&mi->ref, virtio_fs_percpu_release, 0, + GFP_KERNEL); + if (ret < 0) { + dev_err(&vdev->dev, "%s: percpu_ref_init failed (%d)\n", + __func__, ret); + return ret; + } + + ret = devm_add_action(&pci_dev->dev, virtio_fs_percpu_exit, mi); + if (ret < 0) { + percpu_ref_exit(&mi->ref); + return ret; + } + + pgmap = &mi->pgmap; + pgmap->altmap_valid = false; + pgmap->ref = &mi->ref; + pgmap->type = MEMORY_DEVICE_FS_DAX; + + /* Ideally we would directly use the PCI BAR resource but + * devm_memremap_pages() wants its own copy in pgmap. So + * initialize a struct resource from scratch (only the start + * and end fields will be used). 
+ */ + pgmap->res = (struct resource){ + .name = "virtio-fs dax window", + .start = phys_addr, + .end = phys_addr + len, + }; + + fs->window_kaddr = devm_memremap_pages(&pci_dev->dev, pgmap); + if (IS_ERR(fs->window_kaddr)) + return PTR_ERR(fs->window_kaddr); + + ret = devm_add_action_or_reset(&pci_dev->dev, virtio_fs_percpu_kill, + &mi->ref); + if (ret < 0) + return ret; + + fs->window_phys_addr = phys_addr; + fs->window_len = len; + + dev_dbg(&vdev->dev, "%s: window kaddr 0x%px phys_addr 0x%llx len %zu\n", + __func__, fs->window_kaddr, phys_addr, len); + + fs->dax_dev = alloc_dax(fs, NULL, &virtio_fs_dax_ops); + if (!fs->dax_dev) + return -ENOMEM; + + return devm_add_action_or_reset(&vdev->dev, virtio_fs_cleanup_dax, fs); +} + static int virtio_fs_probe(struct virtio_device *vdev) { struct virtio_fs *fs; @@ -416,16 +556,9 @@ static int virtio_fs_probe(struct virtio_device *vdev) /* TODO vq affinity */ /* TODO populate notifications vq */ - if (IS_ENABLED(CONFIG_DAX_DRIVER)) { - /* TODO map window */ - fs->window_kaddr = NULL; - fs->window_phys_addr = 0; - - fs->dax_dev = alloc_dax(fs, NULL, &virtio_fs_dax_ops); - if (!fs->dax_dev) - goto out_vqs; /* TODO handle case where device doesn't expose - BAR */ - } + ret = virtio_fs_setup_dax(vdev, fs); + if (ret < 0) + goto out_vqs; /* Bring the device online in case the filesystem is mounted and * requests need to be sent before we return. 
@@ -441,13 +574,6 @@ static int virtio_fs_probe(struct virtio_device *vdev) out_vqs: vdev->config->reset(vdev); virtio_fs_cleanup_vqs(vdev, fs); - - if (fs->dax_dev) { - kill_dax(fs->dax_dev); - put_dax(fs->dax_dev); - fs->dax_dev = NULL; - } - out: vdev->priv = NULL; return ret; @@ -466,12 +592,6 @@ static void virtio_fs_remove(struct virtio_device *vdev) list_del(&fs->list); mutex_unlock(&virtio_fs_mutex); - if (fs->dax_dev) { - kill_dax(fs->dax_dev); - put_dax(fs->dax_dev); - fs->dax_dev = NULL; - } - vdev->priv = NULL; } From patchwork Mon Dec 10 17:12:42 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721825 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0F00B14E2 for ; Mon, 10 Dec 2018 17:15:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E82EB2AF37 for ; Mon, 10 Dec 2018 17:15:33 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DC4F12AF3F; Mon, 10 Dec 2018 17:15:33 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 767C92AF37 for ; Mon, 10 Dec 2018 17:15:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728495AbeLJRNj (ORCPT ); Mon, 10 Dec 2018 12:13:39 -0500 Received: from mx1.redhat.com ([209.132.183.28]:40865 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728371AbeLJRNf (ORCPT ); Mon, 10 Dec 2018 12:13:35 -0500 Received: from 
smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 98A7B307D975; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 47DE01054FD3; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 44268223C1A; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 16/52] virtio-fs: Add VIRTIO_PCI_CAP_SHARED_MEMORY_CFG and utility to find them Date: Mon, 10 Dec 2018 12:12:42 -0500 Message-Id: <20181210171318.16998-17-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.48]); Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: "Dr. David Alan Gilbert" The shm cap defines a capability that describes chunks with 64-bit lengths at 64-bit offsets into a BAR. There can be multiple such chunks on any one device. Signed-off-by: Dr. 
David Alan Gilbert --- fs/fuse/virtio_fs.c | 69 +++++++++++++++++++++++++++++++++++++++++ include/uapi/linux/virtio_pci.h | 10 ++++++ 2 files changed, 79 insertions(+) diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 87b7e42a6763..cd916943205e 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -11,6 +11,7 @@ #include #include #include +#include #include "fuse_i.h" enum { @@ -57,6 +58,74 @@ struct virtio_fs { size_t window_len; }; +/* TODO: This should be in a PCI file somewhere */ +static int virtio_pci_find_shm_cap(struct pci_dev *dev, + u8 required_id, + u8 *bar, u64 *offset, u64 *len) +{ + int pos; + + for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR); + pos > 0; + pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) { + u8 type, cap_len, id; + u32 tmp32; + u64 res_offset, res_length; + + pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap, + cfg_type), + &type); + if (type != VIRTIO_PCI_CAP_SHARED_MEMORY_CFG) + continue; + + pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap, + cap_len), + &cap_len); + if (cap_len != sizeof(struct virtio_pci_shm_cap)) { + printk(KERN_ERR "%s: shm cap with bad size offset: %d size: %d\n", + __func__, pos, cap_len); + continue; + }; + + pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_shm_cap, + id), + &id); + if (id != required_id) + continue; + + /* Type, and ID match, looks good */ + pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap, + bar), + bar); + + /* Read the lower 32bit of length and offset */ + pci_read_config_dword(dev, pos + offsetof(struct virtio_pci_cap, offset), + &tmp32); + res_offset = tmp32; + pci_read_config_dword(dev, pos + offsetof(struct virtio_pci_cap, length), + &tmp32); + res_length = tmp32; + + /* and now the top half */ + pci_read_config_dword(dev, + pos + offsetof(struct virtio_pci_shm_cap, + offset_hi), + &tmp32); + res_offset |= ((u64)tmp32) << 32; + pci_read_config_dword(dev, + pos + offsetof(struct virtio_pci_shm_cap, + 
length_hi), + &tmp32); + res_length |= ((u64)tmp32) << 32; + + *offset = res_offset; + *len = res_length; + + return pos; + } + return 0; +} + static inline struct virtio_fs_vq *vq_to_fsvq(struct virtqueue *vq) { struct virtio_fs *fs = vq->vdev->priv; diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h index 90007a1abcab..2e6072b5a7c9 100644 --- a/include/uapi/linux/virtio_pci.h +++ b/include/uapi/linux/virtio_pci.h @@ -113,6 +113,8 @@ #define VIRTIO_PCI_CAP_DEVICE_CFG 4 /* PCI configuration access */ #define VIRTIO_PCI_CAP_PCI_CFG 5 +/* Additional shared memory capability */ +#define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8 /* This is the PCI capability header: */ struct virtio_pci_cap { @@ -163,6 +165,14 @@ struct virtio_pci_cfg_cap { __u8 pci_cfg_data[4]; /* Data for BAR access. */ }; +/* Fields in VIRTIO_PCI_CAP_SHARED_MEMORY_CFG */ +struct virtio_pci_shm_cap { + struct virtio_pci_cap cap; + __le32 offset_hi; /* Most sig 32 bits of offset */ + __le32 length_hi; /* Most sig 32 bits of length */ + __u8 id; /* To distinguish shm chunks */ +}; + /* Macro versions of offsets for the Old Timers! 
*/ #define VIRTIO_PCI_CAP_VNDR 0 #define VIRTIO_PCI_CAP_NEXT 1 From patchwork Mon Dec 10 17:12:43 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721829 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6D8A615A6 for ; Mon, 10 Dec 2018 17:15:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 537AB2AF37 for ; Mon, 10 Dec 2018 17:15:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 47C172AF3F; Mon, 10 Dec 2018 17:15:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EBA3C2AF37 for ; Mon, 10 Dec 2018 17:15:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728483AbeLJRNi (ORCPT ); Mon, 10 Dec 2018 12:13:38 -0500 Received: from mx1.redhat.com ([209.132.183.28]:45090 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728370AbeLJRNf (ORCPT ); Mon, 10 Dec 2018 12:13:35 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 8A72B2D7F9; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 57995600D7; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 
10451) id 48A2F223C1D; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 17/52] virtio-fs: Retrieve shm capabilities for cache Date: Mon, 10 Dec 2018 12:12:43 -0500 Message-Id: <20181210171318.16998-18-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Retrieve the capabilities needed to find the cache. Signed-off-by: Dr. David Alan Gilbert --- fs/fuse/virtio_fs.c | 15 +++++++++++++++ include/uapi/linux/virtio_fs.h | 3 +++ 2 files changed, 18 insertions(+) diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index cd916943205e..60d496c16841 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -520,6 +520,8 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs) phys_addr_t phys_addr; size_t len; int ret; + u8 have_cache, cache_bar; + u64 cache_offset, cache_len; if (!IS_ENABLED(CONFIG_DAX_DRIVER)) return 0; @@ -535,6 +537,19 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs) if (ret < 0) return ret; + have_cache = virtio_pci_find_shm_cap(pci_dev, + VIRTIO_FS_PCI_SHMCAP_ID_CACHE, &cache_bar, + &cache_offset, &cache_len); + + if (!have_cache) { + dev_err(&vdev->dev, "%s: No cache capability\n", + __func__); + return -ENXIO; + } else { + dev_notice(&vdev->dev, "Cache bar: %d len: 0x%llx @ 0x%llx\n", + cache_bar, cache_len, cache_offset); + } + /* TODO handle case where device 
doesn't expose BAR? */ ret = pci_request_region(pci_dev, VIRTIO_FS_WINDOW_BAR, "virtio-fs-window"); diff --git a/include/uapi/linux/virtio_fs.h b/include/uapi/linux/virtio_fs.h index 48f3590dcfbe..65a9d4a0dac0 100644 --- a/include/uapi/linux/virtio_fs.h +++ b/include/uapi/linux/virtio_fs.h @@ -38,4 +38,7 @@ struct virtio_fs_config { __u32 num_queues; } __attribute__((packed)); +/* For the id field in virtio_pci_shm_cap */ +#define VIRTIO_FS_PCI_SHMCAP_ID_CACHE 0 + #endif /* _UAPI_LINUX_VIRTIO_FS_H */ From patchwork Mon Dec 10 17:12:44 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721909 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 71E5B14E2 for ; Mon, 10 Dec 2018 17:18:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 55F8B2AF3C for ; Mon, 10 Dec 2018 17:18:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4A1E92AF40; Mon, 10 Dec 2018 17:18:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DD8042AF3C for ; Mon, 10 Dec 2018 17:18:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728981AbeLJRRv (ORCPT ); Mon, 10 Dec 2018 12:17:51 -0500 Received: from mx1.redhat.com ([209.132.183.28]:46480 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728322AbeLJRNf (ORCPT ); Mon, 10 Dec 2018 12:13:35 -0500 Received: from smtp.corp.redhat.com 
(int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 8AC18307D96D; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5790E600D6; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 4BC33223C23; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 18/52] virtio-fs: Map cache using the values from the capabilities Date: Mon, 10 Dec 2018 12:12:44 -0500 Message-Id: <20181210171318.16998-19-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.48]); Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Instead of assuming we had the fixed bar for the cache, use the value from the capabilities. Signed-off-by: Dr. 
David Alan Gilbert --- fs/fuse/virtio_fs.c | 32 +++++++++++++++++--------------- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 60d496c16841..55bac1465536 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -14,11 +14,6 @@ #include #include "fuse_i.h" -enum { - /* PCI BAR number of the virtio-fs DAX window */ - VIRTIO_FS_WINDOW_BAR = 2, -}; - /* List of virtio-fs device instances and a lock for the list */ static DEFINE_MUTEX(virtio_fs_mutex); static LIST_HEAD(virtio_fs_instances); @@ -518,7 +513,7 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs) struct dev_pagemap *pgmap; struct pci_dev *pci_dev; phys_addr_t phys_addr; - size_t len; + size_t bar_len; int ret; u8 have_cache, cache_bar; u64 cache_offset, cache_len; @@ -551,17 +546,13 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs) } /* TODO handle case where device doesn't expose BAR? */ - ret = pci_request_region(pci_dev, VIRTIO_FS_WINDOW_BAR, - "virtio-fs-window"); + ret = pci_request_region(pci_dev, cache_bar, "virtio-fs-window"); if (ret < 0) { dev_err(&vdev->dev, "%s: failed to request window BAR\n", __func__); return ret; } - phys_addr = pci_resource_start(pci_dev, VIRTIO_FS_WINDOW_BAR); - len = pci_resource_len(pci_dev, VIRTIO_FS_WINDOW_BAR); - mi = devm_kzalloc(&pci_dev->dev, sizeof(*mi), GFP_KERNEL); if (!mi) return -ENOMEM; @@ -586,6 +577,17 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs) pgmap->ref = &mi->ref; pgmap->type = MEMORY_DEVICE_FS_DAX; + phys_addr = pci_resource_start(pci_dev, cache_bar); + bar_len = pci_resource_len(pci_dev, cache_bar); + + if (cache_offset + cache_len > bar_len) { + dev_err(&vdev->dev, + "%s: cache bar shorter than cap offset+len\n", + __func__); + return -EINVAL; + } + phys_addr += cache_offset; + /* Ideally we would directly use the PCI BAR resource but * devm_memremap_pages() wants its own copy 
in pgmap. So * initialize a struct resource from scratch (only the start @@ -594,7 +596,7 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs) pgmap->res = (struct resource){ .name = "virtio-fs dax window", .start = phys_addr, - .end = phys_addr + len, + .end = phys_addr + cache_len, }; fs->window_kaddr = devm_memremap_pages(&pci_dev->dev, pgmap); @@ -607,10 +609,10 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs) return ret; fs->window_phys_addr = phys_addr; - fs->window_len = len; + fs->window_len = cache_len; - dev_dbg(&vdev->dev, "%s: window kaddr 0x%px phys_addr 0x%llx len %zu\n", - __func__, fs->window_kaddr, phys_addr, len); + dev_dbg(&vdev->dev, "%s: cache kaddr 0x%px phys_addr 0x%llx len %llx\n", + __func__, fs->window_kaddr, phys_addr, cache_len); fs->dax_dev = alloc_dax(fs, NULL, &virtio_fs_dax_ops); if (!fs->dax_dev) From patchwork Mon Dec 10 17:12:45 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721935 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 417DD18E8 for ; Mon, 10 Dec 2018 17:19:06 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2650F2A5AF for ; Mon, 10 Dec 2018 17:19:06 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1A5CB2A60A; Mon, 10 Dec 2018 17:19:06 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C2B032A5B6 for ; 
Mon, 10 Dec 2018 17:19:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729034AbeLJRSf (ORCPT ); Mon, 10 Dec 2018 12:18:35 -0500 Received: from mx1.redhat.com ([209.132.183.28]:38422 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728372AbeLJRNf (ORCPT ); Mon, 10 Dec 2018 12:13:35 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 92710C049593; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5D07E6012B; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 4F1EB224247; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 19/52] virtio-fs: Make dax optional Date: Mon, 10 Dec 2018 12:12:45 -0500 Message-Id: <20181210171318.16998-20-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: "Dr. David Alan Gilbert" Add a 'dax' mount option and only enable DAX when it is set. Also show "dax" in the mount options if the filesystem was mounted with DAX enabled. Signed-off-by: Dr. 
David Alan Gilbert Signed-off-by: Vivek Goyal --- fs/fuse/fuse_i.h | 1 + fs/fuse/inode.c | 8 ++++++++ fs/fuse/virtio_fs.c | 2 +- 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index b5a6a12e67d6..345abe9b022f 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -70,6 +70,7 @@ struct fuse_mount_data { unsigned group_id_present:1; unsigned default_permissions:1; unsigned allow_other:1; + unsigned dax:1; unsigned max_read; unsigned blksize; }; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 10e4a39318c4..d2afce377fd4 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -439,6 +439,7 @@ enum { OPT_ALLOW_OTHER, OPT_MAX_READ, OPT_BLKSIZE, + OPT_DAX, OPT_ERR }; @@ -452,6 +453,7 @@ static const match_table_t tokens = { {OPT_ALLOW_OTHER, "allow_other"}, {OPT_MAX_READ, "max_read=%u"}, {OPT_BLKSIZE, "blksize=%u"}, + {OPT_DAX, "dax"}, {OPT_ERR, NULL} }; @@ -543,6 +545,10 @@ int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev, d->blksize = value; break; + case OPT_DAX: + d->dax = 1; + break; + default: return 0; } @@ -571,6 +577,8 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root) seq_printf(m, ",max_read=%u", fc->max_read); if (sb->s_bdev && sb->s_blocksize != FUSE_DEFAULT_BLKSIZE) seq_printf(m, ",blksize=%lu", sb->s_blocksize); + if (fc->dax_dev) + seq_printf(m, ",dax"); return 0; } diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 55bac1465536..e4d5e0cd41ba 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -1050,7 +1050,7 @@ static int virtio_fs_fill_super(struct super_block *sb, void *data, /* TODO this sends FUSE_INIT and could cause hiprio or notifications * virtqueue races since they haven't been set up yet! */ - err = fuse_fill_super_common(sb, &d, fs->dax_dev, + err = fuse_fill_super_common(sb, &d, d.dax ? 
fs->dax_dev : NULL, &virtio_fs_fiq_ops, fs, (void **)&fs->vqs[2].fud); if (err < 0) From patchwork Mon Dec 10 17:12:46 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721897 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9B95F14E2 for ; Mon, 10 Dec 2018 17:17:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7EC3E2ABC3 for ; Mon, 10 Dec 2018 17:17:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 732222AF3C; Mon, 10 Dec 2018 17:17:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2DBBB2ABC3 for ; Mon, 10 Dec 2018 17:17:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728959AbeLJRRd (ORCPT ); Mon, 10 Dec 2018 12:17:33 -0500 Received: from mx1.redhat.com ([209.132.183.28]:52790 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728373AbeLJRNf (ORCPT ); Mon, 10 Dec 2018 12:13:35 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D6FFC30842D1; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8CA88605D0; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: by horse.redhat.com 
(Postfix, from userid 10451) id 5329C22425E; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 20/52] Limit number of pages returned by direct_access() Date: Mon, 10 Dec 2018 12:12:46 -0500 Message-Id: <20181210171318.16998-21-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.40]); Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Truncate the number of pages mapped by direct_access() so that it stays within the window size. A caller might request a mapping that extends beyond the end of the window. Signed-off-by: Vivek Goyal --- fs/fuse/virtio_fs.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index e4d5e0cd41ba..ef1469b38a6d 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -449,13 +449,14 @@ static long virtio_fs_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, { struct virtio_fs *fs = dax_get_private(dax_dev); phys_addr_t offset = PFN_PHYS(pgoff); + size_t max_nr_pages = fs->window_len/PAGE_SIZE - pgoff; if (kaddr) *kaddr = fs->window_kaddr + offset; if (pfn) *pfn = phys_to_pfn_t(fs->window_phys_addr + offset, PFN_DEV | PFN_MAP); - return nr_pages; + return nr_pages > max_nr_pages ? 
max_nr_pages : nr_pages; } static size_t virtio_fs_copy_from_iter(struct dax_device *dax_dev, From patchwork Mon Dec 10 17:12:47 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721911 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F273215A6 for ; Mon, 10 Dec 2018 17:18:22 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D46522AF3C for ; Mon, 10 Dec 2018 17:18:22 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C880F2AF40; Mon, 10 Dec 2018 17:18:22 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 865492AF3C for ; Mon, 10 Dec 2018 17:18:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728970AbeLJRRu (ORCPT ); Mon, 10 Dec 2018 12:17:50 -0500 Received: from mx1.redhat.com ([209.132.183.28]:50516 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728374AbeLJRNf (ORCPT ); Mon, 10 Dec 2018 12:13:35 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E99C730B8FB2; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9A4016015E; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: by 
horse.redhat.com (Postfix, from userid 10451) id 564BE22425F; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 21/52] fuse: Introduce fuse_dax_mapping Date: Mon, 10 Dec 2018 12:12:47 -0500 Message-Id: <20181210171318.16998-22-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.49]); Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Introduce fuse_dax_mapping. This type will be used to keep track of per inode dax mappings. Signed-off-by: Vivek Goyal --- fs/fuse/fuse_i.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 345abe9b022f..b9880be690bd 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -81,6 +81,15 @@ struct fuse_forget_link { struct fuse_forget_link *next; }; +/** Translation information for file offsets to DAX window offsets */ +struct fuse_dax_mapping { + /** Position in DAX window */ + u64 window_offset; + + /** Length of mapping, in bytes */ + loff_t length; +}; + /** FUSE inode */ struct fuse_inode { /** Inode data */ From patchwork Mon Dec 10 17:12:48 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721939 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D6F4918E8 for ; Mon, 10 Dec 2018 17:19:09 +0000 (UTC) 
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BB4242A99A for ; Mon, 10 Dec 2018 17:19:09 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id AF5C92A987; Mon, 10 Dec 2018 17:19:09 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3EE892A987 for ; Mon, 10 Dec 2018 17:19:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728702AbeLJRSe (ORCPT ); Mon, 10 Dec 2018 12:18:34 -0500 Received: from mx1.redhat.com ([209.132.183.28]:45586 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728375AbeLJRNf (ORCPT ); Mon, 10 Dec 2018 12:13:35 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 0B494284B1; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 99D4E60158; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 5A9CA224260; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 22/52] Create a list of free memory ranges Date: Mon, 10 Dec 2018 12:12:48 -0500 Message-Id: <20181210171318.16998-23-vgoyal@redhat.com> In-Reply-To: 
<20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Divide the dax memory range into fixed size ranges (2MB for now) and put them in a list. This will track free ranges. Once an inode requires a free range, we will take one from here and put it in interval-tree of ranges assigned to inode. Signed-off-by: Vivek Goyal --- fs/fuse/fuse_i.h | 14 +++++++++ fs/fuse/inode.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++- fs/fuse/virtio_fs.c | 2 ++ 3 files changed, 96 insertions(+), 1 deletion(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index b9880be690bd..f0775d76e31f 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -46,6 +46,10 @@ /** Number of page pointers embedded in fuse_req */ #define FUSE_REQ_INLINE_PAGES 1 +/* Default memory range size, 2MB */ +#define FUSE_DAX_MEM_RANGE_SZ (2*1024*1024) +#define FUSE_DAX_MEM_RANGE_PAGES (FUSE_DAX_MEM_RANGE_SZ/PAGE_SIZE) + /** List of active connections */ extern struct list_head fuse_conn_list; @@ -83,6 +87,9 @@ struct fuse_forget_link { /** Translation information for file offsets to DAX window offsets */ struct fuse_dax_mapping { + /* Will connect in fc->free_ranges to keep track of free memory */ + struct list_head list; + /** Position in DAX window */ u64 window_offset; @@ -816,6 +823,13 @@ struct fuse_conn { /** DAX device, non-NULL if DAX is supported */ struct dax_device *dax_dev; + + /* + * DAX Window Free Ranges. 
TODO: This might not be best place to store + * this free list + */ + unsigned long nr_free_ranges; + struct list_head free_ranges; }; static inline struct fuse_conn *get_fuse_conn_super(struct super_block *sb) diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index d2afce377fd4..403360e352d8 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -22,6 +22,8 @@ #include #include #include +#include +#include MODULE_AUTHOR("Miklos Szeredi "); MODULE_DESCRIPTION("Filesystem in Userspace"); @@ -607,6 +609,69 @@ static void fuse_pqueue_init(struct fuse_pqueue *fpq) fpq->connected = 1; } +static void fuse_free_dax_mem_ranges(struct list_head *mem_list) +{ + struct fuse_dax_mapping *range, *temp; + + /* Free All allocated elements */ + list_for_each_entry_safe(range, temp, mem_list, list) { + list_del(&range->list); + kfree(range); + } +} + +static int fuse_dax_mem_range_init(struct fuse_conn *fc, + struct dax_device *dax_dev) +{ + long nr_pages, nr_ranges; + void *kaddr; + pfn_t pfn; + struct fuse_dax_mapping *range; + LIST_HEAD(mem_ranges); + phys_addr_t phys_addr; + int ret = 0, id; + size_t dax_size = -1; + unsigned long allocated_ranges = 0, i; + + id = dax_read_lock(); + nr_pages = dax_direct_access(dax_dev, 0, PHYS_PFN(dax_size), &kaddr, + &pfn); + dax_read_unlock(id); + if (nr_pages < 0) { + pr_debug("dax_direct_access() returned %ld\n", nr_pages); + return nr_pages; + } + + phys_addr = pfn_t_to_phys(pfn); + nr_ranges = nr_pages/FUSE_DAX_MEM_RANGE_PAGES; + printk("fuse_dax_mem_range_init(): dax mapped %ld pages. nr_ranges=%ld\n", nr_pages, nr_ranges); + + for (i = 0; i < nr_ranges; i++) { + range = kzalloc(sizeof(struct fuse_dax_mapping), GFP_KERNEL); + if (!range) { + pr_debug("memory allocation for mem_range failed.\n"); + ret = -ENOMEM; + goto out_err; + } + /* TODO: This offset only works if virtio-fs driver is not + * having some memory hidden at the beginning. 
This needs + * better handling + */ + range->window_offset = i * FUSE_DAX_MEM_RANGE_SZ; + range->length = FUSE_DAX_MEM_RANGE_SZ; + list_add_tail(&range->list, &mem_ranges); + allocated_ranges++; + } + + list_replace_init(&mem_ranges, &fc->free_ranges); + fc->nr_free_ranges = allocated_ranges; + return 0; +out_err: + /* Free All allocated elements */ + fuse_free_dax_mem_ranges(&mem_ranges); + return ret; +} + void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, struct dax_device *dax_dev, const struct fuse_iqueue_ops *fiq_ops, void *fiq_priv) @@ -636,6 +701,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, fc->pid_ns = get_pid_ns(task_active_pid_ns(current)); fc->dax_dev = dax_dev; fc->user_ns = get_user_ns(user_ns); + INIT_LIST_HEAD(&fc->free_ranges); } EXPORT_SYMBOL_GPL(fuse_conn_init); @@ -644,6 +710,8 @@ void fuse_conn_put(struct fuse_conn *fc) if (refcount_dec_and_test(&fc->count)) { if (fc->destroy_req) fuse_request_free(fc->destroy_req); + if (fc->dax_dev) + fuse_free_dax_mem_ranges(&fc->free_ranges); put_pid_ns(fc->pid_ns); put_user_ns(fc->user_ns); fc->release(fc); @@ -1136,9 +1204,17 @@ int fuse_fill_super_common(struct super_block *sb, fuse_conn_init(fc, sb->s_user_ns, dax_dev, fiq_ops, fiq_priv); fc->release = fuse_free_conn; + if (dax_dev) { + err = fuse_dax_mem_range_init(fc, dax_dev); + if (err) { + pr_debug("fuse_dax_mem_range_init() returned %d\n", err); + goto err_put_conn; + } + } + fud = fuse_dev_alloc(fc); if (!fud) - goto err_put_conn; + goto err_free_ranges; fc->dev = sb->s_dev; fc->sb = sb; @@ -1211,6 +1287,9 @@ int fuse_fill_super_common(struct super_block *sb, dput(root_dentry); err_dev_free: fuse_dev_free(fud); + err_free_ranges: + if (dax_dev) + fuse_free_dax_mem_ranges(&fc->free_ranges); err_put_conn: fuse_conn_put(fc); sb->s_fs_info = NULL; diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index ef1469b38a6d..c79c9a885253 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ 
-451,6 +451,8 @@ static long virtio_fs_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, phys_addr_t offset = PFN_PHYS(pgoff); size_t max_nr_pages = fs->window_len/PAGE_SIZE - pgoff; + pr_debug("virtio_fs_direct_access(): called. nr_pages=%ld max_nr_pages=%ld\n", nr_pages, max_nr_pages); + if (kaddr) *kaddr = fs->window_kaddr + offset; if (pfn) From patchwork Mon Dec 10 17:12:49 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721811 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BD3A714E2 for ; Mon, 10 Dec 2018 17:15:06 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9FF142AF02 for ; Mon, 10 Dec 2018 17:15:06 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 93CBD2AF37; Mon, 10 Dec 2018 17:15:06 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 171882AF02 for ; Mon, 10 Dec 2018 17:15:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728771AbeLJRO7 (ORCPT ); Mon, 10 Dec 2018 12:14:59 -0500 Received: from mx1.redhat.com ([209.132.183.28]:33838 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728547AbeLJRNn (ORCPT ); Mon, 10 Dec 2018 12:13:43 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by 
mx1.redhat.com (Postfix) with ESMTPS id 38D4C308212C; Mon, 10 Dec 2018 17:13:43 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9D2465C1B5; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 5DD8E224261; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 23/52] fuse: simplify fuse_fill_super_common() calling Date: Mon, 10 Dec 2018 12:12:49 -0500 Message-Id: <20181210171318.16998-24-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.42]); Mon, 10 Dec 2018 17:13:43 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Miklos Szeredi Add more fields to "struct fuse_mount_data" so that fewer parameters have to be passed to fuse_fill_super_common(). 
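For illustration only (not part of this patch): a minimal userspace sketch of the calling convention this change moves to, where the caller fills one parameter structure instead of passing each value separately. The four fields mirror the ones this patch adds to struct fuse_mount_data; every demo_* name is a hypothetical stand-in, and the function body models only the "*fudptr must be NULL on entry" check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only their identity matters here. */
struct demo_dax_device { int id; };
struct demo_iqueue_ops { const char *name; };

/* Mirrors the fields this patch adds to struct fuse_mount_data. */
struct demo_mount_data {
	struct demo_dax_device *dax_dev;       /* DAX device, may be NULL */
	const struct demo_iqueue_ops *fiq_ops; /* input queue operations */
	void *fiq_priv;                        /* device-specific queue state */
	void **fudptr;                         /* must point at NULL on entry */
};

/*
 * One structure pointer replaces four separate arguments.
 * Returns 0 on success, -EINVAL if *fudptr is already set.
 */
static int demo_fill_super_common(struct demo_mount_data *d, void *new_fud)
{
	assert(d != NULL && d->fudptr != NULL);
	if (*d->fudptr)
		return -EINVAL;
	*d->fudptr = new_fud;
	return 0;
}
```

A caller sets dax_dev, fiq_ops, fiq_priv and fudptr on the structure and then makes a single call, which is the shape both fuse_fill_super() and virtio_fs_fill_super() take in the diff below.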
Signed-off-by: Miklos Szeredi --- fs/fuse/fuse_i.h | 22 +++++++++++++--------- fs/fuse/inode.c | 27 ++++++++++++++------------- fs/fuse/virtio_fs.c | 10 +++++++--- 3 files changed, 34 insertions(+), 25 deletions(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index f0775d76e31f..fb49ca9d05ac 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -77,6 +77,18 @@ struct fuse_mount_data { unsigned dax:1; unsigned max_read; unsigned blksize; + + /* DAX device, may be NULL */ + struct dax_device *dax_dev; + + /* fuse input queue operations */ + const struct fuse_iqueue_ops *fiq_ops; + + /* device-specific state for fuse_iqueue */ + void *fiq_priv; + + /* fuse_dev pointer to fill in, should contain NULL on entry */ + void **fudptr; }; /* One forget request */ @@ -1073,17 +1085,9 @@ int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev, * Fill in superblock and initialize fuse connection * @sb: partially-initialized superblock to fill in * @mount_data: mount parameters - * @dax_dev: DAX device, may be NULL - * @fiq_ops: fuse input queue operations - * @fiq_priv: device-specific state for fuse_iqueue - * @fudptr: fuse_dev pointer to fill in, should contain NULL on entry */ int fuse_fill_super_common(struct super_block *sb, - struct fuse_mount_data *mount_data, - struct dax_device *dax_dev, - const struct fuse_iqueue_ops *fiq_ops, - void *fiq_priv, - void **fudptr); + struct fuse_mount_data *mount_data); /** * Disassociate fuse connection from superblock and kill the superblock diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 403360e352d8..075997977cfd 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -1149,11 +1149,7 @@ void fuse_dev_free(struct fuse_dev *fud) EXPORT_SYMBOL_GPL(fuse_dev_free); int fuse_fill_super_common(struct super_block *sb, - struct fuse_mount_data *mount_data, - struct dax_device *dax_dev, - const struct fuse_iqueue_ops *fiq_ops, - void *fiq_priv, - void **fudptr) + struct fuse_mount_data *mount_data) { struct fuse_dev *fud; 
struct fuse_conn *fc; @@ -1201,11 +1197,12 @@ int fuse_fill_super_common(struct super_block *sb, if (!fc) goto err; - fuse_conn_init(fc, sb->s_user_ns, dax_dev, fiq_ops, fiq_priv); + fuse_conn_init(fc, sb->s_user_ns, mount_data->dax_dev, + mount_data->fiq_ops, mount_data->fiq_priv); fc->release = fuse_free_conn; - if (dax_dev) { - err = fuse_dax_mem_range_init(fc, dax_dev); + if (mount_data->dax_dev) { + err = fuse_dax_mem_range_init(fc, mount_data->dax_dev); if (err) { pr_debug("fuse_dax_mem_range_init() returned %d\n", err); goto err_put_conn; @@ -1259,7 +1256,7 @@ int fuse_fill_super_common(struct super_block *sb, mutex_lock(&fuse_mutex); err = -EINVAL; - if (*fudptr) + if (*mount_data->fudptr) goto err_unlock; err = fuse_ctl_add_conn(fc); @@ -1268,7 +1265,7 @@ int fuse_fill_super_common(struct super_block *sb, list_add_tail(&fc->entry, &fuse_conn_list); sb->s_root = root_dentry; - *fudptr = fud; + *mount_data->fudptr = fud; /* * mutex_unlock() provides the necessary memory barrier for * *fudptr to be visible on all CPUs after this @@ -1288,7 +1285,7 @@ int fuse_fill_super_common(struct super_block *sb, err_dev_free: fuse_dev_free(fud); err_free_ranges: - if (dax_dev) + if (mount_data->dax_dev) fuse_free_dax_mem_ranges(&fc->free_ranges); err_put_conn: fuse_conn_put(fc); @@ -1323,8 +1320,12 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) (file->f_cred->user_ns != sb->s_user_ns)) goto err_fput; - err = fuse_fill_super_common(sb, &d, NULL, &fuse_dev_fiq_ops, NULL, - &file->private_data); + d.dax_dev = NULL; + d.fiq_ops = &fuse_dev_fiq_ops; + d.fiq_priv = NULL; + d.fudptr = &file->private_data; + err = fuse_fill_super_common(sb, &d); + err_fput: fput(file); err: diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index c79c9a885253..98dba3cf9d40 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -1053,9 +1053,13 @@ static int virtio_fs_fill_super(struct super_block *sb, void *data, /* TODO this sends FUSE_INIT and could 
cause hiprio or notifications * virtqueue races since they haven't been set up yet! */ - err = fuse_fill_super_common(sb, &d, d.dax ? fs->dax_dev : NULL, - &virtio_fs_fiq_ops, fs, - (void **)&fs->vqs[2].fud); + + d.dax_dev = d.dax ? fs->dax_dev : NULL; + d.fiq_ops = &virtio_fs_fiq_ops; + d.fiq_priv = fs; + d.fudptr = (void **)&fs->vqs[2].fud; + err = fuse_fill_super_common(sb, &d); + if (err < 0) goto err_fud; From patchwork Mon Dec 10 17:12:50 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721793 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DAB6C13AF for ; Mon, 10 Dec 2018 17:14:23 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C06362AEB6 for ; Mon, 10 Dec 2018 17:14:23 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B46712AF02; Mon, 10 Dec 2018 17:14:23 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 98B922AE9D for ; Mon, 10 Dec 2018 17:14:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728153AbeLJROJ (ORCPT ); Mon, 10 Dec 2018 12:14:09 -0500 Received: from mx1.redhat.com ([209.132.183.28]:39829 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728655AbeLJRNv (ORCPT ); Mon, 10 Dec 2018 12:13:51 -0500 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA 
(256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 004D8307D96B; Mon, 10 Dec 2018 17:13:51 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 02E2C18501; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 658F4224263; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 24/52] fuse: Introduce setupmapping/removemapping commands Date: Mon, 10 Dec 2018 12:12:50 -0500 Message-Id: <20181210171318.16998-25-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.48]); Mon, 10 Dec 2018 17:13:51 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Introduce two new fuse commands to setup/remove memory mappings. 
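As a rough illustration of how the two request payloads relate, here is a minimal userspace sketch. It is not taken from the series. The struct layouts and FUSE_SETUPMAPPING_FLAG_WRITE are copied from the uapi additions in this patch; EXAMPLE_RANGE_SZ, example_setupmapping() and example_removemapping() are hypothetical names invented for the example, and the 2 MiB range size and the alignment requirement on the window offset are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copied from the uapi additions in this patch. */
#define FUSE_SETUPMAPPING_FLAG_WRITE (1ull << 0)

struct fuse_setupmapping_in {
	uint64_t fh;		/* an already open handle */
	uint64_t foffset;	/* offset into the file to start the mapping */
	uint64_t len;		/* length of mapping required */
	uint64_t flags;		/* FUSE_SETUPMAPPING_FLAG_* */
	uint64_t moffset;	/* offset in memory window */
};

struct fuse_removemapping_in {
	uint64_t fh;		/* an already open handle */
	uint64_t moffset;	/* offset into the DAX window */
	uint64_t len;		/* length of the mapping */
};

/* ASSUMPTION: illustrative range size; this patch does not define one. */
#define EXAMPLE_RANGE_SZ (2ull * 1024 * 1024)

/*
 * Hypothetical helper: build a setup request for the range that contains
 * file_pos. The file offset is rounded down to a range boundary and the
 * window offset is assumed to be range-aligned already.
 */
static struct fuse_setupmapping_in
example_setupmapping(uint64_t fh, uint64_t file_pos, uint64_t window_off,
		     int writable)
{
	struct fuse_setupmapping_in in;

	assert(window_off % EXAMPLE_RANGE_SZ == 0);

	memset(&in, 0, sizeof(in));
	in.fh = fh;
	in.foffset = file_pos - (file_pos % EXAMPLE_RANGE_SZ);
	in.len = EXAMPLE_RANGE_SZ;
	in.moffset = window_off;
	if (writable)
		in.flags |= FUSE_SETUPMAPPING_FLAG_WRITE;
	return in;
}

/*
 * Hypothetical helper: the matching remove request names the same window
 * slot (moffset, len) that the setup request populated.
 */
static struct fuse_removemapping_in
example_removemapping(const struct fuse_setupmapping_in *setup)
{
	struct fuse_removemapping_in in;

	memset(&in, 0, sizeof(in));
	in.fh = setup->fh;
	in.moffset = setup->moffset;
	in.len = setup->len;
	return in;
}
```

The point of the sketch is only that a remove request identifies a mapping by its position in the memory window (moffset, len) rather than by file offset, so the caller has to remember which window slot it used at setup time.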
Signed-off-by: Vivek Goyal --- include/uapi/linux/fuse.h | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h index b4967d48bfda..867fdafc4a5e 100644 --- a/include/uapi/linux/fuse.h +++ b/include/uapi/linux/fuse.h @@ -394,6 +394,8 @@ enum fuse_opcode { FUSE_RENAME2 = 45, FUSE_LSEEK = 46, FUSE_COPY_FILE_RANGE = 47, + FUSE_SETUPMAPPING = 48, + FUSE_REMOVEMAPPING = 49, /* CUSE specific operations */ CUSE_INIT = 4096, @@ -817,4 +819,35 @@ struct fuse_copy_file_range_in { uint64_t flags; }; +#define FUSE_SETUPMAPPING_ENTRIES 8 +#define FUSE_SETUPMAPPING_FLAG_WRITE (1ull << 0) +struct fuse_setupmapping_in { + /* An already open handle */ + uint64_t fh; + /* Offset into the file to start the mapping */ + uint64_t foffset; + /* Length of mapping required */ + uint64_t len; + /* Flags, FUSE_SETUPMAPPING_FLAG_* */ + uint64_t flags; + /* Offset in Memory Window */ + uint64_t moffset; +}; + +struct fuse_setupmapping_out { + /* Offsets into the cache of mappings */ + uint64_t coffset[FUSE_SETUPMAPPING_ENTRIES]; + /* Lengths of each mapping */ + uint64_t len[FUSE_SETUPMAPPING_ENTRIES]; +}; + +struct fuse_removemapping_in { + /* An already open handle */ + uint64_t fh; + /* Offset into the dax window start the unmapping */ + uint64_t moffset; + /* Length of mapping required */ + uint64_t len; +}; + #endif /* _LINUX_FUSE_H */ From patchwork Mon Dec 10 17:12:51 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721953 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7AC13679F for ; Mon, 10 Dec 2018 17:19:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5EB662AA7B for ; Mon, 10 Dec 2018 
17:19:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 530542AA7A; Mon, 10 Dec 2018 17:19:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 040422AA6D for ; Mon, 10 Dec 2018 17:19:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729095AbeLJRTi (ORCPT ); Mon, 10 Dec 2018 12:19:38 -0500 Received: from mx1.redhat.com ([209.132.183.28]:55328 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728382AbeLJRNf (ORCPT ); Mon, 10 Dec 2018 12:13:35 -0500 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 2EDE6804EE; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id BC10E1001914; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 6A980224264; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 25/52] Introduce interval tree basic data structures Date: Mon, 10 Dec 2018 12:12:51 -0500 Message-Id: <20181210171318.16998-26-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, 
not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.27]); Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP We want to use interval tree to keep track of per inode dax mappings. Introduce basic data structures. Signed-off-by: Vivek Goyal --- fs/fuse/fuse_i.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index fb49ca9d05ac..a24f31156b47 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -97,11 +97,22 @@ struct fuse_forget_link { struct fuse_forget_link *next; }; +#define START(node) ((node)->start) +#define LAST(node) ((node)->end) + /** Translation information for file offsets to DAX window offsets */ struct fuse_dax_mapping { /* Will connect in fc->free_ranges to keep track of free memory */ struct list_head list; + /* For interval tree in file/inode */ + struct rb_node rb; + /** Start Position in file */ + __u64 start; + /** End Position in file */ + __u64 end; + __u64 __subtree_last; + /** Position in DAX window */ u64 window_offset; @@ -191,6 +202,10 @@ struct fuse_inode { /** Lock for serializing lookup and readdir for back compatibility*/ struct mutex mutex; + + /** Sorted rb tree of struct fuse_dax_mapping elements */ + struct rb_root_cached dmap_tree; + unsigned long nr_dmaps; }; /** FUSE inode state bits */ From patchwork Mon Dec 10 17:12:52 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721907 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CA34714E2 for ; Mon, 10 Dec 2018 17:18:18 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AFC362AF3D for ; Mon, 10 
Dec 2018 17:18:18 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A377A2AF3C; Mon, 10 Dec 2018 17:18:18 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A837E2AF3D for ; Mon, 10 Dec 2018 17:18:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728989AbeLJRRw (ORCPT ); Mon, 10 Dec 2018 12:17:52 -0500 Received: from mx1.redhat.com ([209.132.183.28]:58426 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728381AbeLJRNf (ORCPT ); Mon, 10 Dec 2018 12:13:35 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 30A4030024E0; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id CA2F45C7B4; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 6E9F8224265; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 26/52] fuse: Implement basic DAX read/write support commands Date: Mon, 10 Dec 2018 12:12:52 -0500 Message-Id: <20181210171318.16998-27-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.45]); Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi This patch implements basic DAX support for read/write. mmap() is not implemented yet and will come in later patches. Signed-off-by: Stefan Hajnoczi Signed-off-by: Dr. David Alan Gilbert Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 400 ++++++++++++++++++++++++++++++++++++++++++++++ fs/fuse/fuse_i.h | 6 + fs/fuse/inode.c | 6 + include/uapi/linux/fuse.h | 1 + 4 files changed, 413 insertions(+) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index b52f9baaa3e7..449a6b315327 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -18,9 +18,16 @@ #include #include #include +#include +#include +#include static const struct file_operations fuse_direct_io_file_operations; +INTERVAL_TREE_DEFINE(struct fuse_dax_mapping, + rb, __u64, __subtree_last, + START, LAST, static inline, fuse_dax_interval_tree); + static int fuse_send_open(struct fuse_conn *fc, u64 nodeid, struct file *file, int opcode, struct fuse_open_out *outargp) { @@ -172,6 +179,171 @@ static void fuse_link_write_file(struct file *file) spin_unlock(&fc->lock); } +static struct fuse_dax_mapping *alloc_dax_mapping(struct fuse_conn *fc) +{ + struct fuse_dax_mapping *dmap = NULL; + + spin_lock(&fc->lock); + + /* TODO: Add logic to try to free up memory if wait is allowed */ + if (fc->nr_free_ranges <= 0) { + spin_unlock(&fc->lock); + return NULL; + } + + WARN_ON(list_empty(&fc->free_ranges)); + + /* Take a free range */ + dmap = list_first_entry(&fc->free_ranges, struct fuse_dax_mapping, + list); + list_del_init(&dmap->list); + fc->nr_free_ranges--; + spin_unlock(&fc->lock); + return dmap; +} + +/* This assumes fc->lock is held */ +static void __free_dax_mapping(struct fuse_conn *fc, + struct 
fuse_dax_mapping *dmap) +{ + list_add_tail(&dmap->list, &fc->free_ranges); + fc->nr_free_ranges++; +} + +static void free_dax_mapping(struct fuse_conn *fc, + struct fuse_dax_mapping *dmap) +{ + /* Return fuse_dax_mapping to free list */ + spin_lock(&fc->lock); + __free_dax_mapping(fc, dmap); + spin_unlock(&fc->lock); +} + +/* offset passed in should be aligned to FUSE_DAX_MEM_RANGE_SZ */ +static int fuse_setup_one_mapping(struct inode *inode, + struct file *file, loff_t offset, + struct fuse_dax_mapping *dmap) +{ + struct fuse_conn *fc = get_fuse_conn(inode); + struct fuse_inode *fi = get_fuse_inode(inode); + struct fuse_file *ff = NULL; + struct fuse_setupmapping_in inarg; + FUSE_ARGS(args); + ssize_t err; + + if (file) + ff = file->private_data; + + WARN_ON(offset % FUSE_DAX_MEM_RANGE_SZ); + WARN_ON(fc->nr_free_ranges < 0); + + /* Ask fuse daemon to setup mapping */ + memset(&inarg, 0, sizeof(inarg)); + inarg.foffset = offset; + if (ff) + inarg.fh = ff->fh; + else + inarg.fh = -1; + inarg.moffset = dmap->window_offset; + inarg.len = FUSE_DAX_MEM_RANGE_SZ; + if (file) { + inarg.flags |= (file->f_mode & FMODE_WRITE) ? + FUSE_SETUPMAPPING_FLAG_WRITE : 0; + inarg.flags |= (file->f_mode & FMODE_READ) ? + FUSE_SETUPMAPPING_FLAG_READ : 0; + } else { + inarg.flags |= FUSE_SETUPMAPPING_FLAG_READ; + inarg.flags |= FUSE_SETUPMAPPING_FLAG_WRITE; + } + args.in.h.opcode = FUSE_SETUPMAPPING; + args.in.h.nodeid = fi->nodeid; + args.in.numargs = 1; + args.in.args[0].size = sizeof(inarg); + args.in.args[0].value = &inarg; + err = fuse_simple_request(fc, &args); + if (err < 0) { + printk(KERN_ERR "%s request failed at mem_offset=0x%llx %zd\n", + __func__, dmap->window_offset, err); + return err; + } + + pr_debug("fuse_setup_one_mapping() succeeded. offset=0x%llx err=%zd\n", offset, err); + + /* TODO: What locking is required here. 
For now, using fc->lock */ + dmap->start = offset; + dmap->end = offset + FUSE_DAX_MEM_RANGE_SZ - 1; + /* Protected by fi->i_dmap_sem */ + fuse_dax_interval_tree_insert(dmap, &fi->dmap_tree); + fi->nr_dmaps++; + return 0; +} + +static int fuse_removemapping_one(struct inode *inode, + struct fuse_dax_mapping *dmap) +{ + struct fuse_inode *fi = get_fuse_inode(inode); + struct fuse_conn *fc = get_fuse_conn(inode); + struct fuse_removemapping_in inarg; + FUSE_ARGS(args); + ssize_t err = 0; + + memset(&inarg, 0, sizeof(inarg)); + inarg.moffset = dmap->window_offset; + inarg.len = dmap->length; + args.in.h.opcode = FUSE_REMOVEMAPPING; + args.in.h.nodeid = fi->nodeid; + args.in.numargs = 1; + args.in.args[0].size = sizeof(inarg); + args.in.args[0].value = &inarg; + err = fuse_simple_request(fc, &args); + if (err < 0) { + printk(KERN_ERR "%s request failed %zd\n", __func__, err); + return err; + } + pr_debug("%s request succeeded\n", __func__); + return 0; +} + +void fuse_removemapping(struct inode *inode) +{ + struct fuse_conn *fc = get_fuse_conn(inode); + struct fuse_inode *fi = get_fuse_inode(inode); + ssize_t err; + struct fuse_dax_mapping *dmap; + + down_write(&fi->i_dmap_sem); + + /* Clear the mappings list */ + while (true) { + WARN_ON(fi->nr_dmaps < 0); + + dmap = fuse_dax_interval_tree_iter_first(&fi->dmap_tree, 0, + -1); + if (dmap) { + fuse_dax_interval_tree_remove(dmap, &fi->dmap_tree); + fi->nr_dmaps--; + } + + if (!dmap) + break; + + err = fuse_removemapping_one(inode, dmap); + if (err) { + /* TODO: Add it back to tree. */ + printk("Failed to removemapping. 
offset=0x%llx" + " len=0x%llx\n", dmap->window_offset, + dmap->length); + continue; + } + + /* Add it back to free ranges list */ + free_dax_mapping(fc, dmap); + } + + up_write(&fi->i_dmap_sem); + pr_debug("%s request succeeded\n", __func__); +} + void fuse_finish_open(struct inode *inode, struct file *file) { struct fuse_file *ff = file->private_data; @@ -1452,6 +1624,204 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from) return res; } +static void fuse_fill_iomap_hole(struct iomap *iomap, loff_t length) +{ + iomap->addr = IOMAP_NULL_ADDR; + iomap->length = length; + iomap->type = IOMAP_HOLE; +} + +static void fuse_fill_iomap(struct inode *inode, loff_t pos, loff_t length, + struct iomap *iomap, struct fuse_dax_mapping *dmap, + unsigned flags) +{ + loff_t offset, len; + loff_t i_size = i_size_read(inode); + + offset = pos - dmap->start; + len = min(length, dmap->length - offset); + + /* If length is beyond end of file, truncate further */ + if (pos + len > i_size) + len = i_size - pos; + + if (len > 0) { + iomap->addr = dmap->window_offset + offset; + iomap->length = len; + if (flags & IOMAP_FAULT) + iomap->length = ALIGN(len, PAGE_SIZE); + iomap->type = IOMAP_MAPPED; + pr_debug("%s: returns iomap: addr 0x%llx offset 0x%llx" + " length 0x%llx\n", __func__, iomap->addr, + iomap->offset, iomap->length); + } else { + /* Mapping beyond end of file is hole */ + fuse_fill_iomap_hole(iomap, length); + pr_debug("%s: returns iomap: addr 0x%llx offset 0x%llx" + "length 0x%llx\n", __func__, iomap->addr, + iomap->offset, iomap->length); + } +} + +/* This is just for DAX and the mapping is ephemeral, do not use it for other + * purposes since there is no block device with a permanent mapping. 
+ */ +static int fuse_iomap_begin(struct inode *inode, loff_t pos, loff_t length, + unsigned flags, struct iomap *iomap) +{ + struct fuse_inode *fi = get_fuse_inode(inode); + struct fuse_conn *fc = get_fuse_conn(inode); + struct fuse_dax_mapping *dmap, *alloc_dmap = NULL; + int ret; + + /* We don't support FIEMAP */ + BUG_ON(flags & IOMAP_REPORT); + + pr_debug("fuse_iomap_begin() called. pos=0x%llx length=0x%llx\n", + pos, length); + + iomap->offset = pos; + iomap->flags = 0; + iomap->bdev = NULL; + iomap->dax_dev = fc->dax_dev; + + /* + * Both read/write and mmap path can race here. So we need something + * to make sure if we are setting up mapping, then other path waits + * + * For now, use a semaphore for this. It probably needs to be + * optimized later. + */ + down_read(&fi->i_dmap_sem); + dmap = fuse_dax_interval_tree_iter_first(&fi->dmap_tree, pos, pos); + + if (dmap) { + fuse_fill_iomap(inode, pos, length, iomap, dmap, flags); + up_read(&fi->i_dmap_sem); + return 0; + } else { + up_read(&fi->i_dmap_sem); + pr_debug("%s: no mapping at offset 0x%llx length 0x%llx\n", + __func__, pos, length); + if (pos >= i_size_read(inode)) + goto iomap_hole; + + alloc_dmap = alloc_dax_mapping(fc); + if (!alloc_dmap) + return -EBUSY; + + /* + * Drop read lock and take write lock so that only one + * caller can try to setup mapping and other waits + */ + down_write(&fi->i_dmap_sem); + /* + * We dropped lock. Check again if somebody else setup + * mapping already. + */ + dmap = fuse_dax_interval_tree_iter_first(&fi->dmap_tree, pos, + pos); + if (dmap) { + fuse_fill_iomap(inode, pos, length, iomap, dmap, flags); + free_dax_mapping(fc, alloc_dmap); + up_write(&fi->i_dmap_sem); + return 0; + } + + /* Setup one mapping */ + ret = fuse_setup_one_mapping(inode, NULL, + ALIGN_DOWN(pos, FUSE_DAX_MEM_RANGE_SZ), + alloc_dmap); + if (ret < 0) { + printk("fuse_setup_one_mapping() failed. 
err=%d" + " pos=0x%llx\n", ret, pos); + free_dax_mapping(fc, alloc_dmap); + up_write(&fi->i_dmap_sem); + return ret; + } + fuse_fill_iomap(inode, pos, length, iomap, alloc_dmap, flags); + up_write(&fi->i_dmap_sem); + return 0; + } + + /* + * If read beyond end of file happnes, fs code seems to return + * it as hole + */ +iomap_hole: + fuse_fill_iomap_hole(iomap, length); + pr_debug("fuse_iomap_begin() returning hole mapping. pos=0x%llx length_asked=0x%llx length_returned=0x%llx\n", pos, length, iomap->length); + return 0; +} + +static int fuse_iomap_end(struct inode *inode, loff_t pos, loff_t length, + ssize_t written, unsigned flags, + struct iomap *iomap) +{ + /* DAX writes beyond end-of-file aren't handled using iomap, so the + * file size is unchanged and there is nothing to do here. + */ + return 0; +} + +static const struct iomap_ops fuse_iomap_ops = { + .iomap_begin = fuse_iomap_begin, + .iomap_end = fuse_iomap_end, +}; + +static ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) +{ + struct inode *inode = file_inode(iocb->ki_filp); + ssize_t ret; + + if (iocb->ki_flags & IOCB_NOWAIT) { + if (!inode_trylock_shared(inode)) + return -EAGAIN; + } else { + inode_lock_shared(inode); + } + + ret = dax_iomap_rw(iocb, to, &fuse_iomap_ops); + inode_unlock_shared(inode); + + /* TODO file_accessed(iocb->f_filp) */ + + return ret; +} + +static ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) +{ + struct inode *inode = file_inode(iocb->ki_filp); + ssize_t ret; + + if (iocb->ki_flags & IOCB_NOWAIT) { + if (!inode_trylock(inode)) + return -EAGAIN; + } else { + inode_lock(inode); + } + + ret = generic_write_checks(iocb, from); + if (ret <= 0) + goto out; + + ret = file_remove_privs(iocb->ki_filp); + if (ret) + goto out; + /* TODO file_update_time() but we don't want metadata I/O */ + + /* TODO handle growing the file */ + + ret = dax_iomap_rw(iocb, from, &fuse_iomap_ops); + +out: + inode_unlock(inode); + + if (ret > 0) + ret = 
generic_write_sync(iocb, ret); + return ret; +} + static void fuse_writepage_free(struct fuse_conn *fc, struct fuse_req *req) { int i; @@ -2104,6 +2474,11 @@ static int fuse_direct_mmap(struct file *file, struct vm_area_struct *vma) return generic_file_mmap(file, vma); } +static int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma) +{ + return -EINVAL; /* TODO */ +} + static int convert_fuse_file_lock(struct fuse_conn *fc, const struct fuse_file_lock *ffl, struct file_lock *fl) @@ -3137,6 +3512,24 @@ static const struct file_operations fuse_direct_io_file_operations = { /* no splice_read */ }; +static const struct file_operations fuse_dax_file_operations = { + .llseek = fuse_file_llseek, + .read_iter = fuse_dax_read_iter, + .write_iter = fuse_dax_write_iter, + .mmap = fuse_dax_mmap, + .open = fuse_open, + .flush = fuse_flush, + .release = fuse_release, + .fsync = fuse_fsync, + .lock = fuse_file_lock, + .flock = fuse_file_flock, + .unlocked_ioctl = fuse_file_ioctl, + .compat_ioctl = fuse_file_compat_ioctl, + .poll = fuse_file_poll, + .fallocate = fuse_file_fallocate, + /* no splice_read */ +}; + static const struct address_space_operations fuse_file_aops = { .readpage = fuse_readpage, .writepage = fuse_writepage, @@ -3153,6 +3546,7 @@ static const struct address_space_operations fuse_file_aops = { void fuse_init_file_inode(struct inode *inode) { struct fuse_inode *fi = get_fuse_inode(inode); + struct fuse_conn *fc = get_fuse_conn(inode); inode->i_fop = &fuse_file_operations; inode->i_data.a_ops = &fuse_file_aops; @@ -3162,4 +3556,10 @@ void fuse_init_file_inode(struct inode *inode) fi->writectr = 0; init_waitqueue_head(&fi->page_waitq); INIT_LIST_HEAD(&fi->writepages); + fi->dmap_tree = RB_ROOT_CACHED; + + if (fc->dax_dev) { + inode->i_flags |= S_DAX; + inode->i_fop = &fuse_dax_file_operations; + } } diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index a24f31156b47..3b17fb336256 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -203,6 +203,11 @@ 
struct fuse_inode { /** Lock for serializing lookup and readdir for back compatibility*/ struct mutex mutex; + /* + * Semaphore to protect modifications to dmap_tree + */ + struct rw_semaphore i_dmap_sem; + /** Sorted rb tree of struct fuse_dax_mapping elements */ struct rb_root_cached dmap_tree; unsigned long nr_dmaps; @@ -1225,5 +1230,6 @@ unsigned fuse_len_args(unsigned numargs, struct fuse_arg *args); * Get the next unique ID for a request */ u64 fuse_get_unique(struct fuse_iqueue *fiq); +void fuse_removemapping(struct inode *inode); #endif /* _FS_FUSE_I_H */ diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 075997977cfd..56310d10cd4c 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -83,7 +83,9 @@ static struct inode *fuse_alloc_inode(struct super_block *sb) fi->attr_version = 0; fi->orig_ino = 0; fi->state = 0; + fi->nr_dmaps = 0; mutex_init(&fi->mutex); + init_rwsem(&fi->i_dmap_sem); fi->forget = fuse_alloc_forget(); if (!fi->forget) { kmem_cache_free(fuse_inode_cachep, inode); @@ -118,6 +120,10 @@ static void fuse_evict_inode(struct inode *inode) if (inode->i_sb->s_flags & SB_ACTIVE) { struct fuse_conn *fc = get_fuse_conn(inode); struct fuse_inode *fi = get_fuse_inode(inode); + if (IS_DAX(inode)) { + fuse_removemapping(inode); + WARN_ON(fi->nr_dmaps); + } fuse_queue_forget(fc, fi->forget, fi->nodeid, fi->nlookup); fi->forget = NULL; } diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h index 867fdafc4a5e..1657253cb7d6 100644 --- a/include/uapi/linux/fuse.h +++ b/include/uapi/linux/fuse.h @@ -821,6 +821,7 @@ struct fuse_copy_file_range_in { #define FUSE_SETUPMAPPING_ENTRIES 8 #define FUSE_SETUPMAPPING_FLAG_WRITE (1ull << 0) +#define FUSE_SETUPMAPPING_FLAG_READ (1ull << 1) struct fuse_setupmapping_in { /* An already open handle */ uint64_t fh; From patchwork Mon Dec 10 17:12:53 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721957 
Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 419BA14E2 for ; Mon, 10 Dec 2018 17:19:54 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2486A2AA2B for ; Mon, 10 Dec 2018 17:19:54 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 192082AA7A; Mon, 10 Dec 2018 17:19:54 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 86C6A2AA2B for ; Mon, 10 Dec 2018 17:19:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728418AbeLJRTh (ORCPT ); Mon, 10 Dec 2018 12:19:37 -0500 Received: from mx1.redhat.com ([209.132.183.28]:40774 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728380AbeLJRNf (ORCPT ); Mon, 10 Dec 2018 12:13:35 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 1A02F13AA6; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id C7AB55C7AA; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 7259B224266; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, 
sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 27/52] fuse: Maintain a list of busy elements Date: Mon, 10 Dec 2018 12:12:53 -0500 Message-Id: <20181210171318.16998-28-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This list will be used to select a fuse_dax_mapping to free when the number of free mappings drops below a threshold. Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 8 ++++++++ fs/fuse/fuse_i.h | 7 +++++++ fs/fuse/inode.c | 4 ++++ 3 files changed, 19 insertions(+) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 449a6b315327..94ad76382a6f 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -275,6 +275,10 @@ static int fuse_setup_one_mapping(struct inode *inode, /* Protected by fi->i_dmap_sem */ fuse_dax_interval_tree_insert(dmap, &fi->dmap_tree); fi->nr_dmaps++; + spin_lock(&fc->lock); + list_add_tail(&dmap->busy_list, &fc->busy_ranges); + fc->nr_busy_ranges++; + spin_unlock(&fc->lock); return 0; } @@ -322,6 +326,10 @@ void fuse_removemapping(struct inode *inode) if (dmap) { fuse_dax_interval_tree_remove(dmap, &fi->dmap_tree); fi->nr_dmaps--; + spin_lock(&fc->lock); + list_del_init(&dmap->busy_list); + fc->nr_busy_ranges--; + spin_unlock(&fc->lock); } if (!dmap) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 3b17fb336256..e32b0059493b 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -113,6 +113,9 @@ struct fuse_dax_mapping { __u64 end; __u64 __subtree_last; + /* Will connect in fc->busy_ranges to keep track busy memory */ + struct list_head busy_list; + /** Position in DAX window */ u64 window_offset; @@ -856,6 +859,10 @@ struct fuse_conn 
{ /** DAX device, non-NULL if DAX is supported */ struct dax_device *dax_dev; + /* List of memory ranges which are busy */ + unsigned long nr_busy_ranges; + struct list_head busy_ranges; + /* * DAX Window Free Ranges. TODO: This might not be best place to store * this free list diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 56310d10cd4c..234b9c0c80ab 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -622,6 +622,8 @@ static void fuse_free_dax_mem_ranges(struct list_head *mem_list) /* Free All allocated elements */ list_for_each_entry_safe(range, temp, mem_list, list) { list_del(&range->list); + if (!list_empty(&range->busy_list)) + list_del(&range->busy_list); kfree(range); } } @@ -666,6 +668,7 @@ static int fuse_dax_mem_range_init(struct fuse_conn *fc, range->window_offset = i * FUSE_DAX_MEM_RANGE_SZ; range->length = FUSE_DAX_MEM_RANGE_SZ; list_add_tail(&range->list, &mem_ranges); + INIT_LIST_HEAD(&range->busy_list); allocated_ranges++; } @@ -708,6 +711,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, fc->dax_dev = dax_dev; fc->user_ns = get_user_ns(user_ns); INIT_LIST_HEAD(&fc->free_ranges); + INIT_LIST_HEAD(&fc->busy_ranges); } EXPORT_SYMBOL_GPL(fuse_conn_init); From patchwork Mon Dec 10 17:12:54 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721949 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 457F618E8 for ; Mon, 10 Dec 2018 17:19:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 299022AA69 for ; Mon, 10 Dec 2018 17:19:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1DFB42AA7B; Mon, 10 Dec 2018 17:19:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on 
pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A69A32AA69 for ; Mon, 10 Dec 2018 17:19:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727283AbeLJRTL (ORCPT ); Mon, 10 Dec 2018 12:19:11 -0500 Received: from mx1.redhat.com ([209.132.183.28]:37000 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728379AbeLJRNf (ORCPT ); Mon, 10 Dec 2018 12:13:35 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 2FB8E3154866; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id C72D2605CF; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 76162224267; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 28/52] Do fallocate() to grow file before mapping for file growing writes Date: Mon, 10 Dec 2018 12:12:54 -0500 Message-Id: <20181210171318.16998-29-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.41]); Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: 
X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP How to handle file growing writes. For now, this patch does fallocate() to grow file and then map it using dax. We need to figure out what's the best way to handle it. This patch does fallocate() and setup mapping operations in fuse_dax_write_iter(), instead of iomap_begin(). I don't have access to file pointer needed to send a message to fuse daemon in iomap_begin(). Dave Chinner has expressed concerns with this approach as this is not atomic. If guest crashes after fallocate() but before data was written, user will think that filesystem lost its data. So this is still an outstanding issue. Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 71 +++++++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 55 insertions(+), 16 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 94ad76382a6f..41d773ba2c72 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -28,6 +28,9 @@ INTERVAL_TREE_DEFINE(struct fuse_dax_mapping, rb, __u64, __subtree_last, START, LAST, static inline, fuse_dax_interval_tree); +static long __fuse_file_fallocate(struct file *file, int mode, + loff_t offset, loff_t length); + static int fuse_send_open(struct fuse_conn *fc, u64 nodeid, struct file *file, int opcode, struct fuse_open_out *outargp) { @@ -1819,6 +1822,22 @@ static ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) /* TODO file_update_time() but we don't want metadata I/O */ /* TODO handle growing the file */ + /* Grow file here if need be. iomap_begin() does not have access + * to file pointer + */ + if (iov_iter_rw(from) == WRITE && + ((iocb->ki_pos + iov_iter_count(from)) > i_size_read(inode))) { + ret = __fuse_file_fallocate(iocb->ki_filp, 0, iocb->ki_pos, + iov_iter_count(from)); + if (ret < 0) { + printk("fallocate(offset=0x%llx length=0x%lx)" + " failed. 
err=%ld\n", iocb->ki_pos, + iov_iter_count(from), ret); + goto out; + } + pr_debug("fallocate(offset=0x%llx length=0x%lx)" + " succeed. ret=%ld\n", iocb->ki_pos, iov_iter_count(from), ret); + } ret = dax_iomap_rw(iocb, from, &fuse_iomap_ops); @@ -3331,8 +3350,12 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter) return ret; } -static long fuse_file_fallocate(struct file *file, int mode, loff_t offset, - loff_t length) +/* + * This variant does not take any inode lock and if locking is required, + * caller is supposed to hold lock + */ +static long __fuse_file_fallocate(struct file *file, int mode, + loff_t offset, loff_t length) { struct fuse_file *ff = file->private_data; struct inode *inode = file_inode(file); @@ -3346,8 +3369,6 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset, .mode = mode }; int err; - bool lock_inode = !(mode & FALLOC_FL_KEEP_SIZE) || - (mode & FALLOC_FL_PUNCH_HOLE); if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) return -EOPNOTSUPP; @@ -3355,17 +3376,13 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset, if (fc->no_fallocate) return -EOPNOTSUPP; - if (lock_inode) { - inode_lock(inode); - if (mode & FALLOC_FL_PUNCH_HOLE) { - loff_t endbyte = offset + length - 1; - err = filemap_write_and_wait_range(inode->i_mapping, - offset, endbyte); - if (err) - goto out; - - fuse_sync_writes(inode); - } + if (mode & FALLOC_FL_PUNCH_HOLE) { + loff_t endbyte = offset + length - 1; + err = filemap_write_and_wait_range(inode->i_mapping, offset, + endbyte); + if (err) + goto out; + fuse_sync_writes(inode); } if (!(mode & FALLOC_FL_KEEP_SIZE)) @@ -3401,9 +3418,31 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset, if (!(mode & FALLOC_FL_KEEP_SIZE)) clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state); + return err; +} + +static long fuse_file_fallocate(struct file *file, int mode, loff_t offset, + loff_t length) +{ + struct fuse_file *ff = file->private_data; + struct 
inode *inode = file_inode(file); + struct fuse_conn *fc = ff->fc; + int err; + bool lock_inode = !(mode & FALLOC_FL_KEEP_SIZE) || + (mode & FALLOC_FL_PUNCH_HOLE); + + if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) + return -EOPNOTSUPP; + + if (fc->no_fallocate) + return -EOPNOTSUPP; + if (lock_inode) - inode_unlock(inode); + inode_lock(inode); + err = __fuse_file_fallocate(file, mode, offset, length); + if (lock_inode) + inode_unlock(inode); return err; } From patchwork Mon Dec 10 17:12:55 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721945 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8B09314E2 for ; Mon, 10 Dec 2018 17:19:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6F3F02AA69 for ; Mon, 10 Dec 2018 17:19:33 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 63CA02AA7B; Mon, 10 Dec 2018 17:19:33 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1161F2AA69 for ; Mon, 10 Dec 2018 17:19:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727773AbeLJRTL (ORCPT ); Mon, 10 Dec 2018 12:19:11 -0500 Received: from mx1.redhat.com ([209.132.183.28]:36998 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728377AbeLJRNf (ORCPT ); Mon, 10 Dec 2018 12:13:35 -0500 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com 
[10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 331E93164672; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 033061054FD2; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 79352224268; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 29/52] fuse: add DAX mmap support Date: Mon, 10 Dec 2018 12:12:55 -0500 Message-Id: <20181210171318.16998-30-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.41]); Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi Add DAX mmap() support. 
Signed-off-by: Stefan Hajnoczi --- fs/fuse/file.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 57 insertions(+), 1 deletion(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 41d773ba2c72..5230f2d84a14 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -2501,9 +2501,65 @@ static int fuse_direct_mmap(struct file *file, struct vm_area_struct *vma) return generic_file_mmap(file, vma); } +static int __fuse_dax_fault(struct vm_fault *vmf, enum page_entry_size pe_size, + bool write) +{ + int ret; + struct inode *inode = file_inode(vmf->vma->vm_file); + struct super_block *sb = inode->i_sb; + pfn_t pfn; + + if (write) + sb_start_pagefault(sb); + + /* TODO inode semaphore to protect faults vs truncate */ + + ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &fuse_iomap_ops); + + if (ret & VM_FAULT_NEEDDSYNC) + ret = dax_finish_sync_fault(vmf, pe_size, pfn); + + if (write) + sb_end_pagefault(sb); + + return ret; +} + +static int fuse_dax_fault(struct vm_fault *vmf) +{ + return __fuse_dax_fault(vmf, PE_SIZE_PTE, + vmf->flags & FAULT_FLAG_WRITE); +} + +static int fuse_dax_huge_fault(struct vm_fault *vmf, + enum page_entry_size pe_size) +{ + return __fuse_dax_fault(vmf, pe_size, vmf->flags & FAULT_FLAG_WRITE); +} + +static int fuse_dax_page_mkwrite(struct vm_fault *vmf) +{ + return __fuse_dax_fault(vmf, PE_SIZE_PTE, true); +} + +static int fuse_dax_pfn_mkwrite(struct vm_fault *vmf) +{ + return __fuse_dax_fault(vmf, PE_SIZE_PTE, true); +} + +static const struct vm_operations_struct fuse_dax_vm_ops = { + .fault = fuse_dax_fault, + .huge_fault = fuse_dax_huge_fault, + .page_mkwrite = fuse_dax_page_mkwrite, + .pfn_mkwrite = fuse_dax_pfn_mkwrite, +}; + static int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma) { - return -EINVAL; /* TODO */ + file_accessed(file); + vma->vm_ops = &fuse_dax_vm_ops; + vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE; + return 0; } static int convert_fuse_file_lock(struct fuse_conn *fc, From patchwork Mon Dec 
10 17:12:56 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721791 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 89C1314E2 for ; Mon, 10 Dec 2018 17:14:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 71CBA2AE9D for ; Mon, 10 Dec 2018 17:14:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6531B2AF02; Mon, 10 Dec 2018 17:14:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 150BF2AE9D for ; Mon, 10 Dec 2018 17:14:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728735AbeLJROJ (ORCPT ); Mon, 10 Dec 2018 12:14:09 -0500 Received: from mx1.redhat.com ([209.132.183.28]:45268 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728654AbeLJRNv (ORCPT ); Mon, 10 Dec 2018 12:13:51 -0500 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id F16F71B98; Mon, 10 Dec 2018 17:13:50 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 02D695D962; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 7E166224269; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal 
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 30/52] fuse: delete dentry if timeout is zero Date: Mon, 10 Dec 2018 12:12:56 -0500 Message-Id: <20181210171318.16998-31-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Mon, 10 Dec 2018 17:13:51 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Miklos Szeredi Don't hold onto dentry in lru list if need to re-lookup it anyway at next access. More advanced version of this patch would periodically flush out dentries from the lru which have gone stale. Signed-off-by: Miklos Szeredi --- fs/fuse/dir.c | 26 +++++++++++++++++++++++--- 1 file changed, 23 insertions(+), 3 deletions(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index 47395b0c3b35..b7e6e421f6bb 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -29,12 +29,26 @@ union fuse_dentry { struct rcu_head rcu; }; -static inline void fuse_dentry_settime(struct dentry *entry, u64 time) +static void fuse_dentry_settime(struct dentry *dentry, u64 time) { - ((union fuse_dentry *) entry->d_fsdata)->time = time; + /* + * Mess with DCACHE_OP_DELETE because dput() will be faster without it. 
+ * Don't care about races, either way it's just an optimization + */ + if ((time && (dentry->d_flags & DCACHE_OP_DELETE)) || + (!time && !(dentry->d_flags & DCACHE_OP_DELETE))) { + spin_lock(&dentry->d_lock); + if (time) + dentry->d_flags &= ~DCACHE_OP_DELETE; + else + dentry->d_flags |= DCACHE_OP_DELETE; + spin_unlock(&dentry->d_lock); + } + + ((union fuse_dentry *) dentry->d_fsdata)->time = time; } -static inline u64 fuse_dentry_time(struct dentry *entry) +static inline u64 fuse_dentry_time(const struct dentry *entry) { return ((union fuse_dentry *) entry->d_fsdata)->time; } @@ -270,8 +284,14 @@ static void fuse_dentry_release(struct dentry *dentry) kfree_rcu(fd, rcu); } +static int fuse_dentry_delete(const struct dentry *dentry) +{ + return time_before64(fuse_dentry_time(dentry), get_jiffies_64()); +} + const struct dentry_operations fuse_dentry_operations = { .d_revalidate = fuse_dentry_revalidate, + .d_delete = fuse_dentry_delete, .d_init = fuse_dentry_init, .d_release = fuse_dentry_release, }; From patchwork Mon Dec 10 17:12:57 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721941 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C4A3C15A6 for ; Mon, 10 Dec 2018 17:19:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A86C02A9BA for ; Mon, 10 Dec 2018 17:19:14 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9C5EB2AA0C; Mon, 10 Dec 2018 17:19:14 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org 
(vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 58BF42A9BA for ; Mon, 10 Dec 2018 17:19:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727868AbeLJRTM (ORCPT ); Mon, 10 Dec 2018 12:19:12 -0500 Received: from mx1.redhat.com ([209.132.183.28]:59292 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728384AbeLJRNf (ORCPT ); Mon, 10 Dec 2018 12:13:35 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 5217C2BEAE; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 04CFE600D6; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 814A722426A; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 31/52] dax: Pass dax_dev to dax_writeback_mapping_range() Date: Mon, 10 Dec 2018 12:12:57 -0500 Message-Id: <20181210171318.16998-32-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Right now dax_writeback_mapping_range() is passed a bdev and dax_dev is searched from that bdev name. virtio-fs does not have a bdev. 
So pass in dax_dev also to dax_writeback_mapping_range(). If dax_dev is passed in, bdev is not used otherwise dax_dev is searched using bdev. Signed-off-by: Vivek Goyal --- fs/dax.c | 16 ++++++++++------ fs/ext4/inode.c | 2 +- fs/xfs/xfs_aops.c | 2 +- include/linux/dax.h | 6 ++++-- 4 files changed, 16 insertions(+), 10 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 6431c3aba182..1ae3a60c17d4 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -911,12 +911,12 @@ static int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev, * on persistent storage prior to completion of the operation. */ int dax_writeback_mapping_range(struct address_space *mapping, - struct block_device *bdev, struct writeback_control *wbc) + struct block_device *bdev, struct dax_device *dax_dev, + struct writeback_control *wbc) { XA_STATE(xas, &mapping->i_pages, wbc->range_start >> PAGE_SHIFT); struct inode *inode = mapping->host; pgoff_t end_index = wbc->range_end >> PAGE_SHIFT; - struct dax_device *dax_dev; void *entry; int ret = 0; unsigned int scanned = 0; @@ -927,9 +927,12 @@ int dax_writeback_mapping_range(struct address_space *mapping, if (!mapping->nrexceptional || wbc->sync_mode != WB_SYNC_ALL) return 0; - dax_dev = dax_get_by_host(bdev->bd_disk->disk_name); - if (!dax_dev) - return -EIO; + if (bdev) { + WARN_ON(dax_dev); + dax_dev = dax_get_by_host(bdev->bd_disk->disk_name); + if (!dax_dev) + return -EIO; + } trace_dax_writeback_range(inode, xas.xa_index, end_index); @@ -951,7 +954,8 @@ int dax_writeback_mapping_range(struct address_space *mapping, xas_lock_irq(&xas); } xas_unlock_irq(&xas); - put_dax(dax_dev); + if (bdev) + put_dax(dax_dev); trace_dax_writeback_range_done(inode, xas.xa_index, end_index); return ret; } diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 22a9d8159720..3569c260a3bd 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2978,7 +2978,7 @@ static int ext4_dax_writepages(struct address_space *mapping, percpu_down_read(&sbi->s_journal_flag_rwsem); 
trace_ext4_writepages(inode, wbc); - ret = dax_writeback_mapping_range(mapping, inode->i_sb->s_bdev, wbc); + ret = dax_writeback_mapping_range(mapping, inode->i_sb->s_bdev, NULL, wbc); trace_ext4_writepages_result(inode, wbc, ret, nr_to_write - wbc->nr_to_write); percpu_up_read(&sbi->s_journal_flag_rwsem); diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index 338b9d9984e0..b1947beec50a 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -951,7 +951,7 @@ xfs_dax_writepages( { xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED); return dax_writeback_mapping_range(mapping, - xfs_find_bdev_for_inode(mapping->host), wbc); + xfs_find_bdev_for_inode(mapping->host), NULL, wbc); } STATIC int diff --git a/include/linux/dax.h b/include/linux/dax.h index 450b28db9533..a8461841f148 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -85,7 +85,8 @@ static inline void fs_put_dax(struct dax_device *dax_dev) struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev); int dax_writeback_mapping_range(struct address_space *mapping, - struct block_device *bdev, struct writeback_control *wbc); + struct block_device *bdev, struct dax_device *dax_dev, + struct writeback_control *wbc); struct page *dax_layout_busy_page(struct address_space *mapping); bool dax_lock_mapping_entry(struct page *page); @@ -117,7 +118,8 @@ static inline struct page *dax_layout_busy_page(struct address_space *mapping) } static inline int dax_writeback_mapping_range(struct address_space *mapping, - struct block_device *bdev, struct writeback_control *wbc) + struct block_device *bdev, struct dax_device *dax_dev, + struct writeback_control *wbc) { return -EOPNOTSUPP; } From patchwork Mon Dec 10 17:12:58 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721807 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by 
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 80F6A14E2 for ; Mon, 10 Dec 2018 17:14:57 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 642732AF02 for ; Mon, 10 Dec 2018 17:14:57 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 525D32AF37; Mon, 10 Dec 2018 17:14:57 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id ED2CE2AF02 for ; Mon, 10 Dec 2018 17:14:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728569AbeLJRNo (ORCPT ); Mon, 10 Dec 2018 12:13:44 -0500 Received: from mx1.redhat.com ([209.132.183.28]:40834 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728544AbeLJRNn (ORCPT ); Mon, 10 Dec 2018 12:13:43 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 269CE3C2CCC; Mon, 10 Dec 2018 17:13:43 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0246A5C232; Mon, 10 Dec 2018 17:13:34 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 8570722426B; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 32/52] fuse: Define dax address space operations Date: Mon, 10 
Dec 2018 12:12:58 -0500 Message-Id: <20181210171318.16998-33-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Mon, 10 Dec 2018 17:13:43 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This is done along the lines of ext4 and xfs. I primarily wanted ->writepages hook at this time so that I could call into dax_writeback_mapping_range(). This in turn will decide which pfns need to be written back and call dax_flush() on those. Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 5230f2d84a14..eb12776f5ff6 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -2306,6 +2306,17 @@ static int fuse_writepages_fill(struct page *page, return err; } +static int fuse_dax_writepages(struct address_space *mapping, + struct writeback_control *wbc) +{ + + struct inode *inode = mapping->host; + struct fuse_conn *fc = get_fuse_conn(inode); + + return dax_writeback_mapping_range(mapping, + NULL, fc->dax_dev, wbc); +} + static int fuse_writepages(struct address_space *mapping, struct writeback_control *wbc) { @@ -3646,6 +3657,13 @@ static const struct address_space_operations fuse_file_aops = { .write_end = fuse_write_end, }; +static const struct address_space_operations fuse_dax_file_aops = { + .writepages = fuse_dax_writepages, + .direct_IO = noop_direct_IO, + .set_page_dirty = noop_set_page_dirty, + .invalidatepage = noop_invalidatepage, +}; + void fuse_init_file_inode(struct inode *inode) { struct fuse_inode *fi = get_fuse_inode(inode); @@ -3664,5 +3682,6 @@ void fuse_init_file_inode(struct inode *inode) if (fc->dax_dev) { inode->i_flags |= S_DAX; inode->i_fop 
= &fuse_dax_file_operations; + inode->i_data.a_ops = &fuse_dax_file_aops; } } From patchwork Mon Dec 10 17:12:59 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721815 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C673914E2 for ; Mon, 10 Dec 2018 17:15:16 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AC9E72AA59 for ; Mon, 10 Dec 2018 17:15:16 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A13452AF37; Mon, 10 Dec 2018 17:15:16 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3AF432AA59 for ; Mon, 10 Dec 2018 17:15:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728571AbeLJRPK (ORCPT ); Mon, 10 Dec 2018 12:15:10 -0500 Received: from mx1.redhat.com ([209.132.183.28]:58538 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728520AbeLJRNm (ORCPT ); Mon, 10 Dec 2018 12:13:42 -0500 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A4B7B3001FCD; Mon, 10 Dec 2018 17:13:41 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id B87BF608E6; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: by horse.redhat.com 
(Postfix, from userid 10451) id 88A6122426C; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 33/52] fuse, dax: Take ->i_mmap_sem lock during dax page fault Date: Mon, 10 Dec 2018 12:12:59 -0500 Message-Id: <20181210171318.16998-34-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.45]); Mon, 10 Dec 2018 17:13:41 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP We need some kind of locking mechanism here. Normal file systems like ext4 and xfs seem to take their own semaphore to protect against truncate while fault is going on. We have additional requirement to protect against fuse dax memory range reclaim. When a range has been selected for reclaim, we need to make sure no other read/write/fault can try to access that memory range while reclaim is in progress. Once reclaim is complete, lock will be released and read/write/fault will trigger allocation of fresh dax range. Taking inode_lock() is not an option in fault path as lockdep complains about circular dependencies. So define a new fuse_inode->i_mmap_sem. 
Signed-off-by: Vivek Goyal --- fs/fuse/dir.c | 2 ++ fs/fuse/file.c | 17 +++++++++++++---- fs/fuse/fuse_i.h | 7 +++++++ fs/fuse/inode.c | 1 + 4 files changed, 23 insertions(+), 4 deletions(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index b7e6e421f6bb..8aa4ff82ea7a 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -1553,8 +1553,10 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr, */ if ((is_truncate || !is_wb) && S_ISREG(inode->i_mode) && oldsize != outarg.attr.size) { + down_write(&fi->i_mmap_sem); truncate_pagecache(inode, outarg.attr.size); invalidate_inode_pages2(inode->i_mapping); + up_write(&fi->i_mmap_sem); } clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state); diff --git a/fs/fuse/file.c b/fs/fuse/file.c index eb12776f5ff6..73068289f62e 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -2523,13 +2523,20 @@ static int __fuse_dax_fault(struct vm_fault *vmf, enum page_entry_size pe_size, if (write) sb_start_pagefault(sb); - /* TODO inode semaphore to protect faults vs truncate */ - + /* + * We need to serialize against not only truncate but also against + * fuse dax memory range reclaim. While a range is being reclaimed, + * we do not want any read/write/mmap to make progress and try + * to populate page cache or access memory we are trying to free. 
+ */ + down_read(&get_fuse_inode(inode)->i_mmap_sem); ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &fuse_iomap_ops); if (ret & VM_FAULT_NEEDDSYNC) ret = dax_finish_sync_fault(vmf, pe_size, pfn); + up_read(&get_fuse_inode(inode)->i_mmap_sem); + if (write) sb_end_pagefault(sb); @@ -3476,9 +3483,11 @@ static long __fuse_file_fallocate(struct file *file, int mode, file_update_time(file); } - if (mode & FALLOC_FL_PUNCH_HOLE) + if (mode & FALLOC_FL_PUNCH_HOLE) { + down_write(&fi->i_mmap_sem); truncate_pagecache_range(inode, offset, offset + length - 1); - + up_write(&fi->i_mmap_sem); + } fuse_invalidate_attr(inode); out: diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index e32b0059493b..280f717deb57 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -211,6 +211,13 @@ struct fuse_inode { */ struct rw_semaphore i_dmap_sem; + /** + * Can't take inode lock in fault path (leads to circular dependency). + * So take this in fuse dax fault path to make sure truncate and + * punch hole etc. can't make progress in parallel. 
+ */ + struct rw_semaphore i_mmap_sem; + /** Sorted rb tree of struct fuse_dax_mapping elements */ struct rb_root_cached dmap_tree; unsigned long nr_dmaps; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 234b9c0c80ab..59fc5a7a18fc 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -85,6 +85,7 @@ static struct inode *fuse_alloc_inode(struct super_block *sb) fi->state = 0; fi->nr_dmaps = 0; mutex_init(&fi->mutex); + init_rwsem(&fi->i_mmap_sem); init_rwsem(&fi->i_dmap_sem); fi->forget = fuse_alloc_forget(); if (!fi->forget) { From patchwork Mon Dec 10 17:13:00 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721799 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EFA1814E2 for ; Mon, 10 Dec 2018 17:14:44 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D5A8C2AF02 for ; Mon, 10 Dec 2018 17:14:44 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C97FF2AF37; Mon, 10 Dec 2018 17:14:44 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 32ACA2AF02 for ; Mon, 10 Dec 2018 17:14:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728583AbeLJRNo (ORCPT ); Mon, 10 Dec 2018 12:13:44 -0500 Received: from mx1.redhat.com ([209.132.183.28]:58572 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728551AbeLJRNo (ORCPT ); Mon, 10 Dec 2018 12:13:44 -0500 Received: from 
smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 350383001FD2; Mon, 10 Dec 2018 17:13:43 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 088775C238; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 8CD3122426D; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 34/52] fuse: Add logic to free up a memory range Date: Mon, 10 Dec 2018 12:13:00 -0500 Message-Id: <20181210171318.16998-35-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.45]); Mon, 10 Dec 2018 17:13:43 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Add logic to free up a busy memory range. Freed memory range will be returned to free pool. Add a worker which can be started to select and free some busy memory ranges. Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 148 ++++++++++++++++++++++++++++++++++++++++++++++++++++++- fs/fuse/fuse_i.h | 10 ++++ fs/fuse/inode.c | 2 + 3 files changed, 159 insertions(+), 1 deletion(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 73068289f62e..17becdff3014 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -272,7 +272,15 @@ static int fuse_setup_one_mapping(struct inode *inode, pr_debug("fuse_setup_one_mapping() succeeded. 
offset=0x%llx err=%zd\n", offset, err); - /* TODO: What locking is required here. For now, using fc->lock */ + /* + * We don't take a reference on inode. inode is valid right now and + * when inode is going away, cleanup logic should first clean up + * dmap entries. + * + * TODO: Do we need to ensure that we are holding inode lock + * as well. + */ + dmap->inode = inode; dmap->start = offset; dmap->end = offset + FUSE_DAX_MEM_RANGE_SZ - 1; /* Protected by fi->i_dmap_sem */ @@ -347,6 +355,8 @@ void fuse_removemapping(struct inode *inode) continue; } + dmap->inode = NULL; + /* Add it back to free ranges list */ free_dax_mapping(fc, dmap); } @@ -3694,3 +3704,139 @@ void fuse_init_file_inode(struct inode *inode) inode->i_data.a_ops = &fuse_dax_file_aops; } } + +int fuse_dax_free_one_mapping_locked(struct fuse_conn *fc, struct inode *inode, + u64 dmap_start) +{ + int ret; + struct fuse_inode *fi = get_fuse_inode(inode); + struct fuse_dax_mapping *dmap; + + WARN_ON(!inode_is_locked(inode)); + + /* Find fuse dax mapping at file offset inode. */ + dmap = fuse_dax_interval_tree_iter_first(&fi->dmap_tree, dmap_start, + dmap_start); + + /* Range already got cleaned up by somebody else */ + if (!dmap) + return 0; + + ret = filemap_fdatawrite_range(inode->i_mapping, dmap->start, dmap->end); + if (ret) { + printk("filemap_fdatawrite_range() failed. err=%d start=0x%llx," + " end=0x%llx\n", ret, dmap->start, dmap->end); + return ret; + } + + ret = invalidate_inode_pages2_range(inode->i_mapping, + dmap->start >> PAGE_SHIFT, + dmap->end >> PAGE_SHIFT); + /* TODO: What to do if above fails? For now, + * leave the range in place. 
+ */ + if (ret) { + printk("invalidate_inode_pages2_range() failed err=%d\n", ret); + return ret; + } + + /* Remove dax mapping from inode interval tree now */ + fuse_dax_interval_tree_remove(dmap, &fi->dmap_tree); + fi->nr_dmaps--; + + /* Cleanup dmap entry and add back to free list */ + spin_lock(&fc->lock); + list_del_init(&dmap->busy_list); + WARN_ON(fc->nr_busy_ranges == 0); + fc->nr_busy_ranges--; + dmap->inode = NULL; + dmap->start = dmap->end = 0; + __free_dax_mapping(fc, dmap); + spin_unlock(&fc->lock); + + pr_debug("fuse: freed memory range window_offset=0x%llx," + " length=0x%llx\n", dmap->window_offset, + dmap->length); + + return ret; +} + +/* + * Free a range of memory. + * Locking. + * 1. Take inode->i_rwsem to prevent further read/write. + * 2. Take fuse_inode->i_mmap_sem to block dax faults. + * 3. Take fuse_inode->i_dmap_sem to protect interval tree. It might not + * be strictly necessary as locks 1 and 2 seem sufficient. + */ +int fuse_dax_free_one_mapping(struct fuse_conn *fc, struct inode *inode, + u64 dmap_start) +{ + int ret; + struct fuse_inode *fi = get_fuse_inode(inode); + + inode_lock(inode); + down_write(&fi->i_mmap_sem); + down_write(&fi->i_dmap_sem); + ret = fuse_dax_free_one_mapping_locked(fc, inode, dmap_start); + up_write(&fi->i_dmap_sem); + up_write(&fi->i_mmap_sem); + inode_unlock(inode); + return ret; +} + +int fuse_dax_free_memory(struct fuse_conn *fc, unsigned long nr_to_free) +{ + struct fuse_dax_mapping *dmap, *pos; + int ret, i; + u64 dmap_start = 0, window_offset = 0; + struct inode *inode = NULL; + + /* Pick first busy range and free it for now */ + for (i = 0; i < nr_to_free; i++) { + dmap = NULL; + spin_lock(&fc->lock); + + list_for_each_entry(pos, &fc->busy_ranges, busy_list) { + dmap = pos; + inode = igrab(dmap->inode); + /* + * This inode is going away. That will free + * up all the ranges anyway, continue to + * next range. 
+ */ + if (!inode) + continue; + dmap_start = dmap->start; + window_offset = dmap->window_offset; + break; + } + spin_unlock(&fc->lock); + if (!dmap) + return 0; + + ret = fuse_dax_free_one_mapping(fc, inode, dmap_start); + iput(inode); + if (ret) { + printk("%s(window_offset=0x%llx) failed. err=%d\n", + __func__, window_offset, ret); + return ret; + } + } + return 0; +} + +/* TODO: This probably should go in inode.c */ +void fuse_dax_free_mem_worker(struct work_struct *work) +{ + int ret; + struct fuse_conn *fc = container_of(work, struct fuse_conn, + dax_free_work.work); + pr_debug("fuse: Worker to free memory called.\n"); + pr_debug("fuse: Worker to free memory called. nr_free_ranges=%lu" + " nr_busy_ranges=%lu\n", fc->nr_free_ranges, + fc->nr_busy_ranges); + ret = fuse_dax_free_memory(fc, FUSE_DAX_RECLAIM_CHUNK); + if (ret) + pr_debug("fuse: fuse_dax_free_memory() failed with err=%d\n", ret); +} diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 280f717deb57..383deaf0ecf1 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -50,6 +50,9 @@ #define FUSE_DAX_MEM_RANGE_SZ (2*1024*1024) #define FUSE_DAX_MEM_RANGE_PAGES (FUSE_DAX_MEM_RANGE_SZ/PAGE_SIZE) +/* Number of ranges reclaimer will try to free in one invocation */ +#define FUSE_DAX_RECLAIM_CHUNK (10) + /** List of active connections */ extern struct list_head fuse_conn_list; @@ -102,6 +105,9 @@ struct fuse_forget_link { /** Translation information for file offsets to DAX window offsets */ struct fuse_dax_mapping { + /* Pointer to inode where this memory range is mapped */ + struct inode *inode; + /* Will connect in fc->free_ranges to keep track of free memory */ struct list_head list; @@ -870,6 +876,9 @@ struct fuse_conn { unsigned long nr_busy_ranges; struct list_head busy_ranges; + /* Worker to free up memory ranges */ + struct delayed_work dax_free_work; + /* * DAX Window Free Ranges. 
TODO: This might not be best place to store * this free list @@ -1244,6 +1253,7 @@ unsigned fuse_len_args(unsigned numargs, struct fuse_arg *args); * Get the next unique ID for a request */ u64 fuse_get_unique(struct fuse_iqueue *fiq); +void fuse_dax_free_mem_worker(struct work_struct *work); void fuse_removemapping(struct inode *inode); #endif /* _FS_FUSE_I_H */ diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 59fc5a7a18fc..44f7bc44e319 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -713,6 +713,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, fc->user_ns = get_user_ns(user_ns); INIT_LIST_HEAD(&fc->free_ranges); INIT_LIST_HEAD(&fc->busy_ranges); + INIT_DELAYED_WORK(&fc->dax_free_work, fuse_dax_free_mem_worker); } EXPORT_SYMBOL_GPL(fuse_conn_init); @@ -721,6 +722,7 @@ void fuse_conn_put(struct fuse_conn *fc) if (refcount_dec_and_test(&fc->count)) { if (fc->destroy_req) fuse_request_free(fc->destroy_req); + flush_delayed_work(&fc->dax_free_work); if (fc->dax_dev) fuse_free_dax_mem_ranges(&fc->free_ranges); put_pid_ns(fc->pid_ns); From patchwork Mon Dec 10 17:13:01 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721901 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CB68115A6 for ; Mon, 10 Dec 2018 17:17:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AE9342ABC3 for ; Mon, 10 Dec 2018 17:17:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A2E652AF3C; Mon, 10 Dec 2018 17:17:47 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, 
RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2CEED2ABC3 for ; Mon, 10 Dec 2018 17:17:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728391AbeLJRRd (ORCPT ); Mon, 10 Dec 2018 12:17:33 -0500 Received: from mx1.redhat.com ([209.132.183.28]:58430 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728383AbeLJRNf (ORCPT ); Mon, 10 Dec 2018 12:13:35 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 4EC0730044D9; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0C3F26015E; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 8FFCE22426F; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 35/52] fuse: Add logic to do direct reclaim of memory Date: Mon, 10 Dec 2018 12:13:01 -0500 Message-Id: <20181210171318.16998-36-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.45]); Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This can be done only from same inode. 
It can also be done only in the read/write path, not in the fault path. The reason is that reclaim currently requires holding inode_lock, fuse_inode->i_mmap_sem and fuse_inode->i_dmap_sem (which protects dmap_tree), in that order, and only the read/write path allows that (the fault path does not). Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 121 +++++++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 105 insertions(+), 16 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 17becdff3014..13db83d105ff 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -30,6 +30,8 @@ INTERVAL_TREE_DEFINE(struct fuse_dax_mapping, static long __fuse_file_fallocate(struct file *file, int mode, loff_t offset, loff_t length); +static struct fuse_dax_mapping *alloc_dax_mapping_reclaim(struct fuse_conn *fc, + struct inode *inode); static int fuse_send_open(struct fuse_conn *fc, u64 nodeid, struct file *file, int opcode, struct fuse_open_out *outargp) @@ -1727,7 +1729,12 @@ static int fuse_iomap_begin(struct inode *inode, loff_t pos, loff_t length, if (pos >= i_size_read(inode)) goto iomap_hole; - alloc_dmap = alloc_dax_mapping(fc); + /* Can't do reclaim in fault path yet due to lock ordering */ + if (flags & IOMAP_FAULT) + alloc_dmap = alloc_dax_mapping(fc); + else + alloc_dmap = alloc_dax_mapping_reclaim(fc, inode); + if (!alloc_dmap) return -EBUSY; @@ -3705,24 +3712,14 @@ void fuse_init_file_inode(struct inode *inode) } } -int fuse_dax_free_one_mapping_locked(struct fuse_conn *fc, struct inode *inode, - u64 dmap_start) +int fuse_dax_reclaim_dmap_locked(struct fuse_conn *fc, struct inode *inode, + struct fuse_dax_mapping *dmap) { int ret; struct fuse_inode *fi = get_fuse_inode(inode); - struct fuse_dax_mapping *dmap; - - WARN_ON(!inode_is_locked(inode)); - - /* Find fuse dax mapping at file offset inode. 
*/ - dmap = fuse_dax_interval_tree_iter_first(&fi->dmap_tree, dmap_start, - dmap_start); - - /* Range already got cleaned up by somebody else */ - if (!dmap) - return 0; - ret = filemap_fdatawrite_range(inode->i_mapping, dmap->start, dmap->end); + ret = filemap_fdatawrite_range(inode->i_mapping, dmap->start, + dmap->end); if (ret) { printk("filemap_fdatawrite_range() failed. err=%d start=0x%llx," " end=0x%llx\n", ret, dmap->start, dmap->end); @@ -3743,6 +3740,99 @@ int fuse_dax_free_one_mapping_locked(struct fuse_conn *fc, struct inode *inode, /* Remove dax mapping from inode interval tree now */ fuse_dax_interval_tree_remove(dmap, &fi->dmap_tree); fi->nr_dmaps--; + return 0; +} + +/* Find first mapping in the tree and free it. */ +struct fuse_dax_mapping *fuse_dax_reclaim_first_mapping_locked( + struct fuse_conn *fc, struct inode *inode) +{ + struct fuse_inode *fi = get_fuse_inode(inode); + struct fuse_dax_mapping *dmap; + int ret; + + /* Find the first fuse dax mapping of the inode. */ + dmap = fuse_dax_interval_tree_iter_first(&fi->dmap_tree, 0, -1); + if (!dmap) + return NULL; + + ret = fuse_dax_reclaim_dmap_locked(fc, inode, dmap); + if (ret < 0) + return ERR_PTR(ret); + + /* Clean up dmap. Do not add back to free list */ + spin_lock(&fc->lock); + list_del_init(&dmap->busy_list); + WARN_ON(fc->nr_busy_ranges == 0); + fc->nr_busy_ranges--; + dmap->inode = NULL; + dmap->start = dmap->end = 0; + spin_unlock(&fc->lock); + + pr_debug("fuse: reclaimed memory range window_offset=0x%llx," + " length=0x%llx\n", dmap->window_offset, + dmap->length); + return dmap; +} + +/* + * Find first mapping in the tree, free it and return it. Do not add + * it back to free pool. + * + * This is called with inode lock held. 
+ */ +struct fuse_dax_mapping *fuse_dax_reclaim_first_mapping(struct fuse_conn *fc, + struct inode *inode) +{ + struct fuse_inode *fi = get_fuse_inode(inode); + struct fuse_dax_mapping *dmap; + + down_write(&fi->i_mmap_sem); + down_write(&fi->i_dmap_sem); + dmap = fuse_dax_reclaim_first_mapping_locked(fc, inode); + up_write(&fi->i_dmap_sem); + up_write(&fi->i_mmap_sem); + return dmap; +} + +static struct fuse_dax_mapping *alloc_dax_mapping_reclaim(struct fuse_conn *fc, + struct inode *inode) +{ + struct fuse_dax_mapping *dmap; + struct fuse_inode *fi = get_fuse_inode(inode); + + dmap = alloc_dax_mapping(fc); + if (dmap) + return dmap; + + /* There are no mappings which can be reclaimed */ + if (!fi->nr_dmaps) + return NULL; + + /* Try reclaim a fuse dax memory range */ + return fuse_dax_reclaim_first_mapping(fc, inode); +} + +int fuse_dax_free_one_mapping_locked(struct fuse_conn *fc, struct inode *inode, + u64 dmap_start) +{ + int ret; + struct fuse_inode *fi = get_fuse_inode(inode); + struct fuse_dax_mapping *dmap; + + WARN_ON(!inode_is_locked(inode)); + + /* Find fuse dax mapping at file offset inode. 
*/ + dmap = fuse_dax_interval_tree_iter_first(&fi->dmap_tree, dmap_start, + dmap_start); + + /* Range already got cleaned up by somebody else */ + if (!dmap) + return 0; + + ret = fuse_dax_reclaim_dmap_locked(fc, inode, dmap); + if (ret < 0) + return ret; /* Cleanup dmap entry and add back to free list */ spin_lock(&fc->lock); @@ -3757,7 +3847,6 @@ int fuse_dax_free_one_mapping_locked(struct fuse_conn *fc, struct inode *inode, pr_debug("fuse: freed memory range window_offset=0x%llx," " length=0x%llx\n", dmap->window_offset, dmap->length); - return ret; } From patchwork Mon Dec 10 17:13:02 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721919 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AA74815A6 for ; Mon, 10 Dec 2018 17:18:31 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8E5C42AF3D for ; Mon, 10 Dec 2018 17:18:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 82A602AF42; Mon, 10 Dec 2018 17:18:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 29AE52AF3D for ; Mon, 10 Dec 2018 17:18:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727564AbeLJRRu (ORCPT ); Mon, 10 Dec 2018 12:17:50 -0500 Received: from mx1.redhat.com ([209.132.183.28]:38438 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728387AbeLJRNf (ORCPT ); Mon, 10 Dec 2018 
12:13:35 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 716E1C049587; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 27A57600D7; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 9655F224270; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 36/52] fuse: Kick worker when free memory drops below 20% of total ranges Date: Mon, 10 Dec 2018 12:13:02 -0500 Message-Id: <20181210171318.16998-37-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Kick worker to free up some memory when number of free ranges drops below 20% of total free ranges at the time of initialization. 
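As a rough illustration of the check, here is a standalone userspace sketch (not the kernel code). The helper name should_kick_reclaim() and the macro RECLAIM_THRESHOLD_PCT are made up for this example; the macro is assumed to mirror FUSE_DAX_RECLAIM_THRESHOLD:

```c
#include <stdbool.h>

/* Hypothetical stand-in for FUSE_DAX_RECLAIM_THRESHOLD (a percentage). */
#define RECLAIM_THRESHOLD_PCT 20UL

/*
 * Hypothetical helper: returns true when the reclaim worker would be
 * kicked, i.e. when the number of free ranges is below 20% of the
 * total number of ranges recorded at initialization.
 */
static bool should_kick_reclaim(unsigned long nr_ranges,
				unsigned long nr_free_ranges)
{
	unsigned long free_threshold =
		(nr_ranges * RECLAIM_THRESHOLD_PCT) / 100;

	/* A threshold of zero (fewer than 5 ranges) never triggers reclaim. */
	return free_threshold > 0 && nr_free_ranges < free_threshold;
}
```

With 100 ranges the threshold is 20, so the worker would be kicked at 19 free ranges but not at 20. With only 4 ranges the integer division yields 0 and the worker is never kicked.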
Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 11 ++++++++++- fs/fuse/fuse_i.h | 9 +++++++++ fs/fuse/inode.c | 1 + 3 files changed, 20 insertions(+), 1 deletion(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 13db83d105ff..1f172d372eeb 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -186,6 +186,7 @@ static void fuse_link_write_file(struct file *file) static struct fuse_dax_mapping *alloc_dax_mapping(struct fuse_conn *fc) { + unsigned long free_threshold; struct fuse_dax_mapping *dmap = NULL; spin_lock(&fc->lock); @@ -193,7 +194,7 @@ static struct fuse_dax_mapping *alloc_dax_mapping(struct fuse_conn *fc) /* TODO: Add logic to try to free up memory if wait is allowed */ if (fc->nr_free_ranges <= 0) { spin_unlock(&fc->lock); - return NULL; + goto out_kick; } WARN_ON(list_empty(&fc->free_ranges)); @@ -204,6 +205,14 @@ static struct fuse_dax_mapping *alloc_dax_mapping(struct fuse_conn *fc) list_del_init(&dmap->list); fc->nr_free_ranges--; spin_unlock(&fc->lock); + +out_kick: + /* If the number of free ranges is below the threshold, start reclaim */ + free_threshold = (fc->nr_ranges * FUSE_DAX_RECLAIM_THRESHOLD)/100; + if (free_threshold > 0 && fc->nr_free_ranges < free_threshold) { + pr_debug("fuse: Kicking dax memory reclaim worker. nr_free_ranges=%lu nr_total_ranges=%lu\n", fc->nr_free_ranges, fc->nr_ranges); + queue_delayed_work(system_long_wq, &fc->dax_free_work, 0); + } return dmap; } diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 383deaf0ecf1..bbefa7c11078 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -53,6 +53,13 @@ /* Number of ranges reclaimer will try to free in one invocation */ #define FUSE_DAX_RECLAIM_CHUNK (10) +/* + * Dax memory reclaim threshold in percentage of total ranges. 
When the + * number of free ranges drops below this threshold, reclaim can trigger. + * Default is 20%. + */ +#define FUSE_DAX_RECLAIM_THRESHOLD (20) + /** List of active connections */ extern struct list_head fuse_conn_list; @@ -885,6 +892,8 @@ struct fuse_conn { */ unsigned long nr_free_ranges; struct list_head free_ranges; + + unsigned long nr_ranges; }; static inline struct fuse_conn *get_fuse_conn_super(struct super_block *sb) diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 44f7bc44e319..d31acb97eede 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -675,6 +675,7 @@ static int fuse_dax_mem_range_init(struct fuse_conn *fc, list_replace_init(&mem_ranges, &fc->free_ranges); fc->nr_free_ranges = allocated_ranges; + fc->nr_ranges = allocated_ranges; return 0; out_err: /* Free All allocated elements */ From patchwork Mon Dec 10 17:13:03 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721869 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E8D0015A6 for ; Mon, 10 Dec 2018 17:17:00 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C9BA32AF37 for ; Mon, 10 Dec 2018 17:17:00 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B7FA12AF45; Mon, 10 Dec 2018 17:17:00 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E6F052AF37 for ; Mon, 10 Dec 2018 17:16:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org 
via listexpand id S1728888AbeLJRQk (ORCPT ); Mon, 10 Dec 2018 12:16:40 -0500 Received: from mx1.redhat.com ([209.132.183.28]:38444 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728390AbeLJRNg (ORCPT ); Mon, 10 Dec 2018 12:13:36 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 91901C049588; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 39EA5605CC; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 9803B224271; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 37/52] fuse: multiplex cached/direct_io/dax file operations Date: Mon, 10 Dec 2018 12:13:03 -0500 Message-Id: <20181210171318.16998-38-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Miklos Szeredi --- fs/fuse/file.c | 91 ++++++++++++++++++++++++++++-------------------------- fs/splice.c | 3 +- include/linux/fs.h | 2 ++ 3 files changed, 52 insertions(+), 44 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 1f172d372eeb..6421c94cef46 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -22,8 +22,6 @@ #include 
#include -static const struct file_operations fuse_direct_io_file_operations; - INTERVAL_TREE_DEFINE(struct fuse_dax_mapping, rb, __u64, __subtree_last, START, LAST, static inline, fuse_dax_interval_tree); @@ -381,8 +379,6 @@ void fuse_finish_open(struct inode *inode, struct file *file) struct fuse_file *ff = file->private_data; struct fuse_conn *fc = get_fuse_conn(inode); - if (ff->open_flags & FOPEN_DIRECT_IO) - file->f_op = &fuse_direct_io_file_operations; if (!(ff->open_flags & FOPEN_KEEP_CACHE)) invalidate_inode_pages2(inode->i_mapping); if (ff->open_flags & FOPEN_NONSEEKABLE) @@ -1121,11 +1117,23 @@ static int fuse_readpages(struct file *file, struct address_space *mapping, return err; } + +static ssize_t fuse_direct_read_iter(struct kiocb *iocb, struct iov_iter *to); +static ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to); + static ssize_t fuse_file_read_iter(struct kiocb *iocb, struct iov_iter *to) { - struct inode *inode = iocb->ki_filp->f_mapping->host; + struct file *file = iocb->ki_filp; + struct fuse_file *ff = file->private_data; + struct inode *inode = file->f_mapping->host; struct fuse_conn *fc = get_fuse_conn(inode); + if (ff->open_flags & FOPEN_DIRECT_IO) + return fuse_direct_read_iter(iocb, to); + + if (IS_DAX(inode)) + return fuse_dax_read_iter(iocb, to); + /* * In auto invalidate mode, always update attributes on read. * Otherwise, only update if we attempt to read past EOF (to ensure @@ -1375,9 +1383,14 @@ static ssize_t fuse_perform_write(struct kiocb *iocb, return res > 0 ? 
res : err; } +static ssize_t fuse_direct_write_iter(struct kiocb *iocb, + struct iov_iter *from); +static ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from); + static ssize_t fuse_file_write_iter(struct kiocb *iocb, struct iov_iter *from) { struct file *file = iocb->ki_filp; + struct fuse_file *ff = file->private_data; struct address_space *mapping = file->f_mapping; ssize_t written = 0; ssize_t written_buffered = 0; @@ -1385,6 +1398,11 @@ static ssize_t fuse_file_write_iter(struct kiocb *iocb, struct iov_iter *from) ssize_t err; loff_t endbyte = 0; + if (ff->open_flags & FOPEN_DIRECT_IO) + return fuse_direct_write_iter(iocb, from); + if (IS_DAX(inode)) + return fuse_dax_write_iter(iocb, from); + if (get_fuse_conn(inode)->writeback_cache) { /* Update size (EOF optimization) and mode (SUID clearing) */ err = fuse_update_attributes(mapping->host, file); @@ -2517,8 +2535,20 @@ static const struct vm_operations_struct fuse_file_vm_ops = { .page_mkwrite = fuse_page_mkwrite, }; +static int fuse_direct_mmap(struct file *file, struct vm_area_struct *vma); +static int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma); + static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma) { + struct fuse_file *ff = file->private_data; + + /* DAX mmap is superior to direct_io mmap */ + if (IS_DAX(file_inode(file))) + return fuse_dax_mmap(file, vma); + + if (ff->open_flags & FOPEN_DIRECT_IO) + return fuse_direct_mmap(file, vma); + if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE)) fuse_link_write_file(file); @@ -2538,6 +2568,18 @@ static int fuse_direct_mmap(struct file *file, struct vm_area_struct *vma) return generic_file_mmap(file, vma); } +static ssize_t fuse_file_splice_read(struct file *in, loff_t *ppos, + struct pipe_inode_info *pipe, size_t len, + unsigned int flags) +{ + struct fuse_file *ff = in->private_data; + + if (ff->open_flags & FOPEN_DIRECT_IO) + return default_file_splice_read(in, ppos, pipe, len, flags); + 
else + return generic_file_splice_read(in, ppos, pipe, len, flags); + +} static int __fuse_dax_fault(struct vm_fault *vmf, enum page_entry_size pe_size, bool write) { @@ -3629,13 +3671,13 @@ static const struct file_operations fuse_file_operations = { .read_iter = fuse_file_read_iter, .write_iter = fuse_file_write_iter, .mmap = fuse_file_mmap, + .splice_read = fuse_file_splice_read, .open = fuse_open, .flush = fuse_flush, .release = fuse_release, .fsync = fuse_fsync, .lock = fuse_file_lock, .flock = fuse_file_flock, - .splice_read = generic_file_splice_read, .unlocked_ioctl = fuse_file_ioctl, .compat_ioctl = fuse_file_compat_ioctl, .poll = fuse_file_poll, @@ -3643,42 +3685,6 @@ static const struct file_operations fuse_file_operations = { .copy_file_range = fuse_copy_file_range, }; -static const struct file_operations fuse_direct_io_file_operations = { - .llseek = fuse_file_llseek, - .read_iter = fuse_direct_read_iter, - .write_iter = fuse_direct_write_iter, - .mmap = fuse_direct_mmap, - .open = fuse_open, - .flush = fuse_flush, - .release = fuse_release, - .fsync = fuse_fsync, - .lock = fuse_file_lock, - .flock = fuse_file_flock, - .unlocked_ioctl = fuse_file_ioctl, - .compat_ioctl = fuse_file_compat_ioctl, - .poll = fuse_file_poll, - .fallocate = fuse_file_fallocate, - /* no splice_read */ -}; - -static const struct file_operations fuse_dax_file_operations = { - .llseek = fuse_file_llseek, - .read_iter = fuse_dax_read_iter, - .write_iter = fuse_dax_write_iter, - .mmap = fuse_dax_mmap, - .open = fuse_open, - .flush = fuse_flush, - .release = fuse_release, - .fsync = fuse_fsync, - .lock = fuse_file_lock, - .flock = fuse_file_flock, - .unlocked_ioctl = fuse_file_ioctl, - .compat_ioctl = fuse_file_compat_ioctl, - .poll = fuse_file_poll, - .fallocate = fuse_file_fallocate, - /* no splice_read */ -}; - static const struct address_space_operations fuse_file_aops = { .readpage = fuse_readpage, .writepage = fuse_writepage, @@ -3716,7 +3722,6 @@ void 
fuse_init_file_inode(struct inode *inode) if (fc->dax_dev) { inode->i_flags |= S_DAX; - inode->i_fop = &fuse_dax_file_operations; inode->i_data.a_ops = &fuse_dax_file_aops; } } diff --git a/fs/splice.c b/fs/splice.c index 3553f1956508..93cbb03a70b1 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -365,7 +365,7 @@ static ssize_t kernel_readv(struct file *file, const struct kvec *vec, return res; } -static ssize_t default_file_splice_read(struct file *in, loff_t *ppos, +ssize_t default_file_splice_read(struct file *in, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags) { @@ -429,6 +429,7 @@ static ssize_t default_file_splice_read(struct file *in, loff_t *ppos, iov_iter_advance(&to, copied); /* truncates and discards */ return res; } +EXPORT_SYMBOL(default_file_splice_read); /* * Send 'sd->len' bytes to socket from 'sd->file' at position 'sd->pos' diff --git a/include/linux/fs.h b/include/linux/fs.h index c95c0807471f..574e63b58a6f 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3040,6 +3040,8 @@ extern void block_sync_page(struct page *page); /* fs/splice.c */ extern ssize_t generic_file_splice_read(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int); +extern ssize_t default_file_splice_read(struct file *, loff_t *, + struct pipe_inode_info *, size_t, unsigned int); extern ssize_t iter_file_splice_write(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int); extern ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe, From patchwork Mon Dec 10 17:13:04 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721889 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A7D0714E2 for ; Mon, 10 Dec 2018 17:17:25 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost 
[127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8BACE2ABC3 for ; Mon, 10 Dec 2018 17:17:25 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7F8862AF3C; Mon, 10 Dec 2018 17:17:25 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 073F62ABC3 for ; Mon, 10 Dec 2018 17:17:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728928AbeLJRRL (ORCPT ); Mon, 10 Dec 2018 12:17:11 -0500 Received: from mx1.redhat.com ([209.132.183.28]:52798 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728391AbeLJRNg (ORCPT ); Mon, 10 Dec 2018 12:13:36 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A0C7B3084042; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 52DE960634; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 9C1C1224272; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 38/52] Dispatch FORGET requests later instead of dropping them Date: Mon, 10 Dec 2018 12:13:04 -0500 Message-Id: <20181210171318.16998-39-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: 
<20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.40]); Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP If the virtio queue is full, don't drop FORGET requests. Instead, wait a bit and try to dispatch them a little later using a worker thread. Signed-off-by: Vivek Goyal --- fs/fuse/virtio_fs.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 78 insertions(+), 8 deletions(-) diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 98dba3cf9d40..f436f5b3f85c 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -22,6 +22,8 @@ static LIST_HEAD(virtio_fs_instances); struct virtio_fs_vq { struct virtqueue *vq; /* protected by fpq->lock */ struct work_struct done_work; + struct list_head queued_reqs; + struct delayed_work dispatch_work; struct fuse_dev *fud; char name[24]; } ____cacheline_aligned_in_smp; @@ -53,6 +55,13 @@ struct virtio_fs { size_t window_len; }; +struct virtio_fs_forget { + struct fuse_in_header ih; + struct fuse_forget_in arg; + /* This request can be temporarily queued on virt queue */ + struct list_head list; +}; + /* TODO: This should be in a PCI file somewhere */ static int virtio_pci_find_shm_cap(struct pci_dev *dev, u8 required_id, @@ -189,6 +198,7 @@ static void virtio_fs_free_devs(struct virtio_fs *fs) continue; flush_work(&fsvq->done_work); + flush_delayed_work(&fsvq->dispatch_work); fuse_dev_free(fsvq->fud); /* TODO need to quiesce/end_requests/decrement dev_count */ fsvq->fud = NULL; @@ -252,6 +262,58 @@ static void virtio_fs_hiprio_done_work(struct work_struct *work) spin_unlock(&fpq->lock); } +static void virtio_fs_dummy_dispatch_work(struct work_struct *work) +{ + return; +} + +static void virtio_fs_hiprio_dispatch_work(struct work_struct *work) 
+{ + struct virtio_fs_forget *forget; + struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq, + dispatch_work.work); + struct fuse_pqueue *fpq = &fsvq->fud->pq; + struct virtqueue *vq = fsvq->vq; + struct scatterlist sg; + struct scatterlist *sgs[] = {&sg}; + bool notify; + int ret; + + pr_debug("worker virtio_fs_hiprio_dispatch_work() called.\n"); + while(1) { + spin_lock(&fpq->lock); + forget = list_first_entry_or_null(&fsvq->queued_reqs, + struct virtio_fs_forget, list); + if (!forget) { + spin_unlock(&fpq->lock); + return; + } + + list_del(&forget->list); + sg_init_one(&sg, forget, sizeof(*forget)); + + /* Enqueue the request */ + dev_dbg(&vq->vdev->dev, "%s\n", __func__); + ret = virtqueue_add_sgs(vq, sgs, 1, 0, forget, GFP_ATOMIC); + if (ret < 0) { + pr_debug("virtio-fs: Could not queue FORGET: queue full. Will try later\n"); + list_add_tail(&forget->list, &fsvq->queued_reqs); + schedule_delayed_work(&fsvq->dispatch_work, + msecs_to_jiffies(1)); + /* TODO handle full virtqueue */ + spin_unlock(&fpq->lock); + return; + } + + notify = virtqueue_kick_prepare(vq); + spin_unlock(&fpq->lock); + + if (notify) + virtqueue_notify(vq); + pr_debug("worker virtio_fs_hiprio_dispatch_work() dispatched one forget request.\n"); + } +} + /* Allocate and copy args into req->argbuf */ static int copy_args_to_argbuf(struct fuse_req *req) { @@ -404,15 +466,24 @@ static int virtio_fs_setup_vqs(struct virtio_device *vdev, snprintf(fs->vqs[0].name, sizeof(fs->vqs[0].name), "notifications"); INIT_WORK(&fs->vqs[0].done_work, virtio_fs_notifications_done_work); names[0] = fs->vqs[0].name; + INIT_LIST_HEAD(&fs->vqs[0].queued_reqs); + INIT_DELAYED_WORK(&fs->vqs[0].dispatch_work, + virtio_fs_dummy_dispatch_work); callbacks[1] = virtio_fs_vq_done; snprintf(fs->vqs[1].name, sizeof(fs->vqs[1].name), "hiprio"); names[1] = fs->vqs[1].name; INIT_WORK(&fs->vqs[1].done_work, virtio_fs_hiprio_done_work); + INIT_LIST_HEAD(&fs->vqs[1].queued_reqs); + 
INIT_DELAYED_WORK(&fs->vqs[1].dispatch_work, + virtio_fs_hiprio_dispatch_work); /* Initialize the requests virtqueues */ for (i = 2; i < fs->nvqs; i++) { INIT_WORK(&fs->vqs[i].done_work, virtio_fs_requests_done_work); + INIT_DELAYED_WORK(&fs->vqs[i].dispatch_work, + virtio_fs_dummy_dispatch_work); + INIT_LIST_HEAD(&fs->vqs[i].queued_reqs); snprintf(fs->vqs[i].name, sizeof(fs->vqs[i].name), "requests.%u", i - 2); callbacks[i] = virtio_fs_vq_done; @@ -718,11 +789,6 @@ static struct virtio_driver virtio_fs_driver = { #endif }; -struct virtio_fs_forget { - struct fuse_in_header ih; - struct fuse_forget_in arg; -}; - static void virtio_fs_wake_forget_and_unlock(struct fuse_iqueue *fiq) __releases(fiq->waitq.lock) { @@ -733,6 +799,7 @@ __releases(fiq->waitq.lock) struct scatterlist *sgs[] = {&sg}; struct virtio_fs *fs; struct virtqueue *vq; + struct virtio_fs_vq *fsvq; bool notify; u64 unique; int ret; @@ -746,7 +813,7 @@ __releases(fiq->waitq.lock) unique = fuse_get_unique(fiq); fs = fiq->priv; - + fsvq = &fs->vqs[1]; spin_unlock(&fiq->waitq.lock); /* Allocate a buffer for the request */ @@ -769,14 +836,17 @@ __releases(fiq->waitq.lock) sg_init_one(&sg, forget, sizeof(*forget)); /* Enqueue the request */ - vq = fs->vqs[1].vq; + vq = fsvq->vq; dev_dbg(&vq->vdev->dev, "%s\n", __func__); fpq = vq_to_fpq(vq); spin_lock(&fpq->lock); ret = virtqueue_add_sgs(vq, sgs, 1, 0, forget, GFP_ATOMIC); if (ret < 0) { - pr_err("virtio-fs: dropped FORGET: queue full\n"); + pr_debug("virtio-fs: Could not queue FORGET: queue full. 
Will try later\n"); + list_add_tail(&forget->list, &fsvq->queued_reqs); + schedule_delayed_work(&fsvq->dispatch_work, + msecs_to_jiffies(1)); /* TODO handle full virtqueue */ spin_unlock(&fpq->lock); goto out; From patchwork Mon Dec 10 17:13:05 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721871 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 508B515A6 for ; Mon, 10 Dec 2018 17:17:02 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3577C2AF3F for ; Mon, 10 Dec 2018 17:17:02 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 298E82AF42; Mon, 10 Dec 2018 17:17:02 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C4E212AF37 for ; Mon, 10 Dec 2018 17:17:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728884AbeLJRQj (ORCPT ); Mon, 10 Dec 2018 12:16:39 -0500 Received: from mx1.redhat.com ([209.132.183.28]:1903 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728398AbeLJRNg (ORCPT ); Mon, 10 Dec 2018 12:13:36 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A93A3307D867; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by 
smtp.corp.redhat.com (Postfix) with ESMTP id 69F6B604CE; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 9FA7B224273; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 39/52] Release file in process context Date: Mon, 10 Dec 2018 12:13:05 -0500 Message-Id: <20181210171318.16998-40-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.48]); Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP fuse_file_put(sync) can be called with sync=true/false. If sync=true, it waits for the release request response and then calls iput() in the caller's context. If sync=false, it does not wait for the release request response, frees the fuse_file struct immediately, and the req->end function does the iput(). iput() can be a problem with DAX if called in req->end context. If this is the last reference to the inode (VFS has already let go of its reference), then iput() will clean up DAX mappings as well, send REMOVEMAPPING requests and wait for completion, all in the context of the worker thread which is processing fuse replies from the daemon on the host. That means it blocks the worker thread, which stops processing further replies, and the system deadlocks. So for now, force sync release of file in case of DAX inodes. 
Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 6421c94cef46..d86f6e5c4daf 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -451,6 +451,7 @@ void fuse_release_common(struct file *file, int opcode) { struct fuse_file *ff = file->private_data; struct fuse_req *req = ff->reserved_req; + bool sync = false; fuse_prepare_release(ff, file->f_flags, opcode); @@ -471,8 +472,20 @@ void fuse_release_common(struct file *file, int opcode) * Make the release synchronous if this is a fuseblk mount, * synchronous RELEASE is allowed (and desirable) in this case * because the server can be trusted not to screw up. + * + * For DAX, fuse server is trusted. So it should be fine to + * do a sync file put. Doing async file put is creating + * problems right now because when request finish, iput() + * can lead to freeing of inode. That means it tears down + * mappings backing DAX memory and sends REMOVEMAPPING message + * to server and blocks for completion. Currently, waiting + * in req->end context deadlocks the system as same worker thread + * can't process REMOVEMAPPING reply it is waiting for. 
*/ - fuse_file_put(ff, ff->fc->destroy_req != NULL); + if (IS_DAX(req->misc.release.inode) || ff->fc->destroy_req != NULL) + sync = true; + + fuse_file_put(ff, sync); } static int fuse_open(struct inode *inode, struct file *file) From patchwork Mon Dec 10 17:13:06 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721891 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 49F0115A6 for ; Mon, 10 Dec 2018 17:17:28 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2E2D12ABC3 for ; Mon, 10 Dec 2018 17:17:28 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 22BBF2AF3C; Mon, 10 Dec 2018 17:17:28 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B71472ABC3 for ; Mon, 10 Dec 2018 17:17:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727700AbeLJRRL (ORCPT ); Mon, 10 Dec 2018 12:17:11 -0500 Received: from mx1.redhat.com ([209.132.183.28]:50522 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728396AbeLJRNg (ORCPT ); Mon, 10 Dec 2018 12:13:36 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A53CB30DDBD1; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: from horse.redhat.com (unknown 
[10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 67FDD6015E; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id A2DF1224274; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 40/52] fuse: Do not block on inode lock while freeing memory range Date: Mon, 10 Dec 2018 12:13:06 -0500 Message-Id: <20181210171318.16998-41-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.49]); Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Once we select a memory range to free, we currently block on the inode lock. Do not block; use trylock instead, and move on to the next memory range if trylock fails. The reason is that in the next few patches I want to enable waiting for memory ranges to become free in fuse_iomap_begin(). So instead of returning -EBUSY, a process will wait for a memory range to become free. We don't want to end up in a situation where a process is sleeping in iomap_begin() with the inode lock held and the worker is trying to free memory from the same inode, resulting in deadlock. To avoid deadlock, use trylock instead. 
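The trylock-and-skip idea described above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: `struct owner`, `owner_trylock()`, `free_one_range()` and `free_ranges_once()` are hypothetical names, and a C11 `atomic_flag` stands in for the inode lock.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode that owns reclaimable ranges. */
struct owner {
	atomic_flag busy;	/* plays the role of the inode lock */
	int nr_ranges;		/* ranges currently held by this owner */
};

static void owner_init(struct owner *o, int nr_ranges)
{
	atomic_flag_clear(&o->busy);
	o->nr_ranges = nr_ranges;
}

/* Non-blocking acquire: true on success, false if the lock is already held. */
static bool owner_trylock(struct owner *o)
{
	return !atomic_flag_test_and_set(&o->busy);
}

static void owner_unlock(struct owner *o)
{
	atomic_flag_clear(&o->busy);
}

/*
 * Free one range. Returns 0 on success, -EAGAIN if the lock is held
 * (instead of sleeping on it), or -ENOENT if there is nothing to free.
 */
static int free_one_range(struct owner *o)
{
	int ret = -ENOENT;

	if (!owner_trylock(o))
		return -EAGAIN;
	if (o->nr_ranges > 0) {
		o->nr_ranges--;
		ret = 0;
	}
	owner_unlock(o);
	return ret;
}

/* One pass over all owners; busy or empty owners are skipped, never waited for. */
static int free_ranges_once(struct owner *owners, size_t n)
{
	int nr_freed = 0;

	for (size_t i = 0; i < n; i++) {
		if (free_one_range(&owners[i]) != 0)
			continue;
		nr_freed++;
	}
	return nr_freed;
}
```

If a caller holds the lock of the first owner while `free_ranges_once()` runs, the pass simply moves on and frees a range from the second owner, which is the behaviour the patch wants from the worker.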
Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 36 ++++++++++++++++++++++++++++-------- 1 file changed, 28 insertions(+), 8 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index d86f6e5c4daf..dbe3410a94d7 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -3891,7 +3891,12 @@ int fuse_dax_free_one_mapping(struct fuse_conn *fc, struct inode *inode, int ret; struct fuse_inode *fi = get_fuse_inode(inode); - inode_lock(inode); + /* + * If process is blocked waiting for memory while holding inode + * lock, we will deadlock. So continue to free next range. + */ + if (!inode_trylock(inode)) + return -EAGAIN; down_write(&fi->i_mmap_sem); down_write(&fi->i_dmap_sem); ret = fuse_dax_free_one_mapping_locked(fc, inode, dmap_start); @@ -3903,19 +3908,22 @@ int fuse_dax_free_one_mapping(struct fuse_conn *fc, struct inode *inode, int fuse_dax_free_memory(struct fuse_conn *fc, unsigned long nr_to_free) { - struct fuse_dax_mapping *dmap, *pos; - int ret, i; + struct fuse_dax_mapping *dmap, *pos, *temp; + int ret, nr_freed = 0; u64 dmap_start = 0, window_offset = 0; struct inode *inode = NULL; /* Pick first busy range and free it for now*/ - for (i = 0; i < nr_to_free; i++) { + while(1) { + if (nr_freed >= nr_to_free) + break; + dmap = NULL; spin_lock(&fc->lock); - list_for_each_entry(pos, &fc->busy_ranges, busy_list) { - dmap = pos; - inode = igrab(dmap->inode); + list_for_each_entry_safe(pos, temp, &fc->busy_ranges, + busy_list) { + inode = igrab(pos->inode); /* * This inode is going away. That will free * up all the ranges anyway, continue to @@ -3923,6 +3931,13 @@ int fuse_dax_free_memory(struct fuse_conn *fc, unsigned long nr_to_free) */ if (!inode) continue; + /* + * Take this element off list and add it tail. 
If + * inode lock can't be obtained, this will help with + * selecting new element + */ + dmap = pos; + list_move_tail(&dmap->busy_list, &fc->busy_ranges); dmap_start = dmap->start; window_offset = dmap->window_offset; break; @@ -3933,11 +3948,16 @@ int fuse_dax_free_memory(struct fuse_conn *fc, unsigned long nr_to_free) ret = fuse_dax_free_one_mapping(fc, inode, dmap_start); iput(inode); - if (ret) { + if (ret && ret != -EAGAIN) { printk("%s(window_offset=0x%llx) failed. err=%d\n", __func__, window_offset, ret); return ret; } + + /* Could not get inode lock. Try next element */ + if (ret == -EAGAIN) + continue; + nr_freed++; } return 0; } From patchwork Mon Dec 10 17:13:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721831 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AB39F14E2 for ; Mon, 10 Dec 2018 17:15:49 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 910402AF37 for ; Mon, 10 Dec 2018 17:15:49 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8537C2AF3F; Mon, 10 Dec 2018 17:15:49 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3754F2AF37 for ; Mon, 10 Dec 2018 17:15:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728809AbeLJRPh (ORCPT ); Mon, 10 Dec 2018 12:15:37 -0500 Received: from mx1.redhat.com ([209.132.183.28]:55354 "EHLO mx1.redhat.com" 
rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728471AbeLJRNi (ORCPT ); Mon, 10 Dec 2018 12:13:38 -0500 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 64BE6804E3; Mon, 10 Dec 2018 17:13:38 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 18420101962A; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id A6FB7224275; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 41/52] fuse: Reschedule dax free work if too many EAGAIN attempts Date: Mon, 10 Dec 2018 12:13:07 -0500 Message-Id: <20181210171318.16998-42-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.27]); Mon, 10 Dec 2018 17:13:38 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP fuse_dax_free_memory() can be very CPU intensive in corner cases. For example, if one inode has consumed all the memory and a setupmapping request is pending, the inode lock is held by that request and the worker thread will not get the lock for a while. And given there is only one inode consuming all the dax ranges, all attempts to acquire the lock will fail. So if there are too many inode lock failures (more than 20 -EAGAIN returns), reschedule the worker with a 10ms delay. 
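The bounded-retry behaviour described above can be sketched in userspace C as follows. This is a rough illustration under stated assumptions rather than the kernel implementation: `reclaim_pass()`, `struct pass_result`, `try_free_fn` and the two example callbacks are hypothetical names, and the "reschedule" outcome is reported as a flag where the patch calls queue_delayed_work(). The threshold mirrors the `nr_eagain > 20` check in the diff.

```c
#include <errno.h>
#include <stdbool.h>

/* Threshold taken from the "nr_eagain > 20" check in the patch. */
#define MAX_EAGAIN 20

/* Hypothetical callback: tries to free one range, 0 on success or -errno. */
typedef int (*try_free_fn)(void *ctx);

struct pass_result {
	int nr_freed;		/* ranges freed in this pass */
	bool reschedule;	/* true if the pass gave up and should run again later */
	int err;		/* first hard error, or 0 */
};

/*
 * Try to free nr_to_free ranges. Every -EAGAIN is counted; once more than
 * MAX_EAGAIN have been seen, stop spinning and ask the caller to retry
 * after a delay.
 */
static struct pass_result reclaim_pass(try_free_fn try_free, void *ctx,
				       int nr_to_free)
{
	struct pass_result res = { 0, false, 0 };
	int nr_eagain = 0;

	while (res.nr_freed < nr_to_free) {
		if (nr_eagain > MAX_EAGAIN) {
			res.reschedule = true;
			break;
		}

		int ret = try_free(ctx);

		if (ret == -EAGAIN) {
			nr_eagain++;
			continue;
		}
		if (ret) {
			res.err = ret;
			break;
		}
		res.nr_freed++;
	}
	return res;
}

/* Example callback: lock is always contended; counts how often it was called. */
static int example_always_busy(void *ctx)
{
	int *calls = ctx;

	(*calls)++;
	return -EAGAIN;
}

/* Example callback: every attempt frees a range. */
static int example_always_ok(void *ctx)
{
	(void)ctx;
	return 0;
}
```

With `example_always_busy` the pass stops after a bounded number of attempts and sets `reschedule`, so a single contended inode cannot keep the worker spinning; with `example_always_ok` the pass frees the requested number of ranges and does not ask to be rescheduled.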
Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index dbe3410a94d7..709747458335 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -3909,7 +3909,7 @@ int fuse_dax_free_one_mapping(struct fuse_conn *fc, struct inode *inode, int fuse_dax_free_memory(struct fuse_conn *fc, unsigned long nr_to_free) { struct fuse_dax_mapping *dmap, *pos, *temp; - int ret, nr_freed = 0; + int ret, nr_freed = 0, nr_eagain = 0; u64 dmap_start = 0, window_offset = 0; struct inode *inode = NULL; @@ -3918,6 +3918,12 @@ int fuse_dax_free_memory(struct fuse_conn *fc, unsigned long nr_to_free) if (nr_freed >= nr_to_free) break; + if (nr_eagain > 20) { + queue_delayed_work(system_long_wq, &fc->dax_free_work, + msecs_to_jiffies(10)); + return 0; + } + dmap = NULL; spin_lock(&fc->lock); @@ -3955,8 +3961,10 @@ int fuse_dax_free_memory(struct fuse_conn *fc, unsigned long nr_to_free) } /* Could not get inode lock. 
Try next element */ - if (ret == -EAGAIN) + if (ret == -EAGAIN) { + nr_eagain++; continue; + } nr_freed++; } return 0; From patchwork Mon Dec 10 17:13:08 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721839 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 55FBB14E2 for ; Mon, 10 Dec 2018 17:15:59 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3ADA92AF3F for ; Mon, 10 Dec 2018 17:15:59 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2BC442AF37; Mon, 10 Dec 2018 17:15:59 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9183B2AF37 for ; Mon, 10 Dec 2018 17:15:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727909AbeLJRPw (ORCPT ); Mon, 10 Dec 2018 12:15:52 -0500 Received: from mx1.redhat.com ([209.132.183.28]:55360 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728475AbeLJRNi (ORCPT ); Mon, 10 Dec 2018 12:13:38 -0500 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 79227804E9; Mon, 10 Dec 2018 17:13:38 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 18920103BAB3; Mon, 10 Dec 2018 17:13:35 +0000 
(UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id AA19D224276; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 42/52] fuse: Wait for memory ranges to become free Date: Mon, 10 Dec 2018 12:13:08 -0500 Message-Id: <20181210171318.16998-43-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.27]); Mon, 10 Dec 2018 17:13:38 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Sometimes we run out of memory ranges. In that case, wait for memory ranges to become free instead of returning -EBUSY. The dax fault path holds fuse_inode->i_mmap_sem, and while that is held, memory reclaim can't be done. It's not safe to wait while holding fuse_inode->i_mmap_sem for two reasons. - Worker thread to free memory might block on fuse_inode->i_mmap_sem as well. - This inode is holding all the memory and more memory can't be freed. In both cases, deadlock will ensue. So return -ENOSPC from iomap_begin() in the fault path if memory can't be allocated. Drop fuse_inode->i_mmap_sem, wait for a free range to become available, and retry. The read/write path is a different story. We hold the inode lock, and lock ordering allows us to grab fuse_inode->i_mmap_sem if needed. That means we can do direct reclaim in that path. But if there is no memory allocated to this inode, then direct reclaim will not work and we need to wait for a memory range to become free. So try the following order. A. Try to get a free range. B. If not, try direct reclaim. 
C. If not, wait for a memory range to become free. Here sleeping with locks held should be fine because in step B, we made sure this inode is not holding any ranges. That means other inodes are holding ranges and somebody should be able to free memory. Also, the worker thread does a trylock() on the inode lock. That means the worker thread will not wait on this inode and will move on to the next memory range. Hence the above sequence should be deadlock free. Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 60 +++++++++++++++++++++++++++++++++++++++++++------------- fs/fuse/fuse_i.h | 3 +++ fs/fuse/inode.c | 1 + 3 files changed, 50 insertions(+), 14 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 709747458335..d0942ce0a6c3 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -220,6 +220,8 @@ static void __free_dax_mapping(struct fuse_conn *fc, { list_add_tail(&dmap->list, &fc->free_ranges); fc->nr_free_ranges++; + /* TODO: Wake up only when needed */ + wake_up(&fc->dax_range_waitq); } static void free_dax_mapping(struct fuse_conn *fc, @@ -1770,12 +1772,18 @@ static int fuse_iomap_begin(struct inode *inode, loff_t pos, loff_t length, goto iomap_hole; /* Can't do reclaim in fault path yet due to lock ordering */ - if (flags & IOMAP_FAULT) + if (flags & IOMAP_FAULT) { alloc_dmap = alloc_dax_mapping(fc); - else + if (!alloc_dmap) + return -ENOSPC; + } else { alloc_dmap = alloc_dax_mapping_reclaim(fc, inode); + if (IS_ERR(alloc_dmap)) + return PTR_ERR(alloc_dmap); + } - if (!alloc_dmap) + /* If we are here, we should have memory allocated */ + if (WARN_ON(!alloc_dmap)) return -EBUSY; /* @@ -2596,14 +2604,24 @@ static ssize_t fuse_file_splice_read(struct file *in, loff_t *ppos, static int __fuse_dax_fault(struct vm_fault *vmf, enum page_entry_size pe_size, bool write) { - int ret; + int ret, error = 0; struct inode *inode = file_inode(vmf->vma->vm_file); struct super_block *sb = inode->i_sb; pfn_t pfn; + struct fuse_conn *fc = get_fuse_conn(inode); + bool retry = false; if (write) 
sb_start_pagefault(sb); +retry: + if (retry && !(fc->nr_free_ranges > 0)) { + ret = -EINTR; + if (wait_event_killable_exclusive(fc->dax_range_waitq, + (fc->nr_free_ranges > 0))) + goto out; + } + /* * We need to serialize against not only truncate but also against * fuse dax memory range reclaim. While a range is being reclaimed, @@ -2611,13 +2629,20 @@ static int __fuse_dax_fault(struct vm_fault *vmf, enum page_entry_size pe_size, * to populate page cache or access memory we are trying to free. */ down_read(&get_fuse_inode(inode)->i_mmap_sem); - ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &fuse_iomap_ops); + ret = dax_iomap_fault(vmf, pe_size, &pfn, &error, &fuse_iomap_ops); + if ((ret & VM_FAULT_ERROR) && error == -ENOSPC) { + error = 0; + retry = true; + up_read(&get_fuse_inode(inode)->i_mmap_sem); + goto retry; + } if (ret & VM_FAULT_NEEDDSYNC) ret = dax_finish_sync_fault(vmf, pe_size, pfn); up_read(&get_fuse_inode(inode)->i_mmap_sem); +out: if (write) sb_end_pagefault(sb); @@ -3828,16 +3853,23 @@ static struct fuse_dax_mapping *alloc_dax_mapping_reclaim(struct fuse_conn *fc, struct fuse_dax_mapping *dmap; struct fuse_inode *fi = get_fuse_inode(inode); - dmap = alloc_dax_mapping(fc); - if (dmap) - return dmap; - - /* There are no mappings which can be reclaimed */ - if (!fi->nr_dmaps) - return NULL; + while(1) { + dmap = alloc_dax_mapping(fc); + if (dmap) + return dmap; - /* Try reclaim a fuse dax memory range */ - return fuse_dax_reclaim_first_mapping(fc, inode); + if (fi->nr_dmaps) + return fuse_dax_reclaim_first_mapping(fc, inode); + /* + * There are no mappings which can be reclaimed. + * Wait for one. 
+ */ + if (!(fc->nr_free_ranges > 0)) { + if (wait_event_killable_exclusive(fc->dax_range_waitq, + (fc->nr_free_ranges > 0))) + return ERR_PTR(-EINTR); + } + } } int fuse_dax_free_one_mapping_locked(struct fuse_conn *fc, struct inode *inode, diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index bbefa7c11078..7b2db87c6ead 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -886,6 +886,9 @@ struct fuse_conn { /* Worker to free up memory ranges */ struct delayed_work dax_free_work; + /* Wait queue for a dax range to become free */ + wait_queue_head_t dax_range_waitq; + /* * DAX Window Free Ranges. TODO: This might not be best place to store * this free list diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index d31acb97eede..178ac3171564 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -695,6 +695,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, atomic_set(&fc->dev_count, 1); init_waitqueue_head(&fc->blocked_waitq); init_waitqueue_head(&fc->reserved_req_waitq); + init_waitqueue_head(&fc->dax_range_waitq); fuse_iqueue_init(&fc->iq, fiq_ops, fiq_priv); INIT_LIST_HEAD(&fc->bg_queue); INIT_LIST_HEAD(&fc->entry); From patchwork Mon Dec 10 17:13:09 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721963 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B45403E9D for ; Mon, 10 Dec 2018 17:20:19 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 982AD2AEE4 for ; Mon, 10 Dec 2018 17:20:19 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 966C82AF61; Mon, 10 Dec 2018 17:20:19 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: 
No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 926A22AF60 for ; Mon, 10 Dec 2018 17:20:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729138AbeLJRTz (ORCPT ); Mon, 10 Dec 2018 12:19:55 -0500 Received: from mx1.redhat.com ([209.132.183.28]:45110 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728399AbeLJRNf (ORCPT ); Mon, 10 Dec 2018 12:13:35 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id C0E76368E7; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 78520600D6; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id AE534224277; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 43/52] fuse: Take inode lock for dax inode truncation Date: Mon, 10 Dec 2018 12:13:09 -0500 Message-Id: <20181210171318.16998-44-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When a file is 
opened with O_TRUNC, we need to make sure that no other DAX operation is in progress. DAX expects i_size to be stable. In fuse_iomap_begin() we check i_size at multiple places and we expect i_size to not change. Another problem is, if we set up a mapping in fuse_iomap_begin(), and the file gets truncated and a dax read/write happens, KVM currently hangs. It tries to fault in a page which does not exist on the host (file got truncated). It probably requires fixing in KVM. So for now, take the inode lock. Once KVM is fixed, we might have to have a look at it again. Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index d0942ce0a6c3..cb28cf26a6e7 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -406,7 +406,7 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir) int err; bool lock_inode = (file->f_flags & O_TRUNC) && fc->atomic_o_trunc && - fc->writeback_cache; + (fc->writeback_cache || IS_DAX(inode)); err = generic_file_open(inode, file); if (err) From patchwork Mon Dec 10 17:13:10 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721915 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8578514E2 for ; Mon, 10 Dec 2018 17:18:28 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 69BCC2AF3D for ; Mon, 10 Dec 2018 17:18:28 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5D82E2AF42; Mon, 10 Dec 2018 17:18:28 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI 
autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0309F2AF3D for ; Mon, 10 Dec 2018 17:18:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727912AbeLJRRu (ORCPT ); Mon, 10 Dec 2018 12:17:50 -0500 Received: from mx1.redhat.com ([209.132.183.28]:50526 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728401AbeLJRNf (ORCPT ); Mon, 10 Dec 2018 12:13:35 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id C0D1930001DF; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 787AA605C5; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id B170C224278; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 44/52] fuse: Clear setuid bit even in direct I/O path Date: Mon, 10 Dec 2018 12:13:10 -0500 Message-Id: <20181210171318.16998-45-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.49]); Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP With cache=never, we fall back to direct IO. 
pjdfstest chmod test 12.t was failing because if a file has the setuid bit set, it should be cleared when an unprivileged user opens it for write and writes to it. Call file_remove_privs() even for the direct I/O path. Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index cb28cf26a6e7..0be5a7380b3c 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -1679,13 +1679,25 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from) /* Don't allow parallel writes to the same file */ inode_lock(inode); res = generic_write_checks(iocb, from); - if (res > 0) - res = fuse_direct_io(&io, from, &iocb->ki_pos, FUSE_DIO_WRITE); + if (res < 0) + goto out_invalidate; + + res = file_remove_privs(iocb->ki_filp); + if (res) + goto out_invalidate; + + res = fuse_direct_io(&io, from, &iocb->ki_pos, FUSE_DIO_WRITE); + if (res < 0) + goto out_invalidate; + fuse_invalidate_attr(inode); - if (res > 0) - fuse_write_update_size(inode, iocb->ki_pos); + fuse_write_update_size(inode, iocb->ki_pos); inode_unlock(inode); + return res; +out_invalidate: + fuse_invalidate_attr(inode); + inode_unlock(inode); return res; } From patchwork Mon Dec 10 17:13:11 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721877 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 69B4514E2 for ; Mon, 10 Dec 2018 17:17:05 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4D6AC2AF37 for ; Mon, 10 Dec 2018 17:17:05 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 41FA92AF3F; Mon, 10 Dec 2018 17:17:05 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 
(2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DB8992AF37 for ; Mon, 10 Dec 2018 17:17:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728874AbeLJRQi (ORCPT ); Mon, 10 Dec 2018 12:16:38 -0500 Received: from mx1.redhat.com ([209.132.183.28]:52378 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728403AbeLJRNg (ORCPT ); Mon, 10 Dec 2018 12:13:36 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E94497AE95; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id A4632600D7; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id B56C3224279; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 45/52] virtio: Free fuse devices on umount Date: Mon, 10 Dec 2018 12:13:11 -0500 Message-Id: <20181210171318.16998-46-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Mon, 10 Dec 2018 17:13:36 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: 
kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: "Dr. David Alan Gilbert" When unmounting the fs close all the fuse devices. This includes making sure the daemon gets a FUSE_DESTROY to tell it. Signed-off-by: Dr. David Alan Gilbert --- fs/fuse/fuse_i.h | 1 + fs/fuse/inode.c | 3 ++- fs/fuse/virtio_fs.c | 13 ++++++++++++- 3 files changed, 15 insertions(+), 2 deletions(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 7b2db87c6ead..30c7b4b56200 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -85,6 +85,7 @@ struct fuse_mount_data { unsigned default_permissions:1; unsigned allow_other:1; unsigned dax:1; + unsigned destroy:1; unsigned max_read; unsigned blksize; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 178ac3171564..4d2d623e607f 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -1263,7 +1263,7 @@ int fuse_fill_super_common(struct super_block *sb, goto err_put_root; __set_bit(FR_BACKGROUND, &init_req->flags); - if (is_bdev) { + if (mount_data->destroy) { fc->destroy_req = fuse_request_alloc(0); if (!fc->destroy_req) goto err_free_init_req; @@ -1339,6 +1339,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) d.fiq_ops = &fuse_dev_fiq_ops; d.fiq_priv = NULL; d.fudptr = &file->private_data; + d.destroy = is_bdev; err = fuse_fill_super_common(sb, &d); err_fput: diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index f436f5b3f85c..c71bc47395b4 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -1128,6 +1128,7 @@ static int virtio_fs_fill_super(struct super_block *sb, void *data, d.fiq_ops = &virtio_fs_fiq_ops; d.fiq_priv = fs; d.fudptr = (void **)&fs->vqs[2].fud; + d.destroy = true; /* Send destroy request on unmount */ err = fuse_fill_super_common(sb, &d); if (err < 0) @@ -1160,6 +1161,16 @@ static int virtio_fs_fill_super(struct super_block *sb, void *data, return err; } +static void virtio_kill_sb(struct super_block *sb) +{ + struct fuse_conn *fc = get_fuse_conn_super(sb); + 
fuse_kill_sb_anon(sb); + if (fc) { + struct virtio_fs *vfs = fc->iq.priv; + virtio_fs_free_devs(vfs); + } +} + static struct dentry *virtio_fs_mount(struct file_system_type *fs_type, int flags, const char *dev_name, void *raw_data) @@ -1171,7 +1182,7 @@ static struct file_system_type virtio_fs_type = { .owner = THIS_MODULE, .name = KBUILD_MODNAME, .mount = virtio_fs_mount, - .kill_sb = fuse_kill_sb_anon, + .kill_sb = virtio_kill_sb, }; static int __init virtio_fs_init(void) From patchwork Mon Dec 10 17:13:12 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721785 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2A63713AF for ; Mon, 10 Dec 2018 17:13:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0B7A12A796 for ; Mon, 10 Dec 2018 17:13:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EF5982A7DC; Mon, 10 Dec 2018 17:13:39 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9781A2A796 for ; Mon, 10 Dec 2018 17:13:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728468AbeLJRNi (ORCPT ); Mon, 10 Dec 2018 12:13:38 -0500 Received: from mx1.redhat.com ([209.132.183.28]:40778 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728404AbeLJRNg (ORCPT ); Mon, 10 Dec 2018 12:13:36 -0500 Received: from smtp.corp.redhat.com 
(int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 07C8E13A9D; Mon, 10 Dec 2018 17:13:36 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id B5C5F605CB; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id B9C1622427A; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 46/52] virtio-fs: Retrieve shm capabilities for version table Date: Mon, 10 Dec 2018 12:13:12 -0500 Message-Id: <20181210171318.16998-47-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Mon, 10 Dec 2018 17:13:36 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: "Dr. David Alan Gilbert" Retrieve the capabilities needed to find the journal and version table. Signed-off-by: Dr. 
David Alan Gilbert --- fs/fuse/virtio_fs.c | 26 ++++++++++++++++++++++++-- include/uapi/linux/virtio_fs.h | 2 ++ 2 files changed, 26 insertions(+), 2 deletions(-) diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index c71bc47395b4..c18f406b61cd 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -589,8 +589,11 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs) phys_addr_t phys_addr; size_t bar_len; int ret; - u8 have_cache, cache_bar; - u64 cache_offset, cache_len; + u8 have_cache, have_journal, have_vertab; + u8 cache_bar, journal_bar, vertab_bar; + u64 cache_offset, cache_len; + u64 journal_offset, journal_len; + u64 vertab_offset, vertab_len; if (!IS_ENABLED(CONFIG_DAX_DRIVER)) return 0; @@ -619,6 +622,25 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs) cache_bar, cache_len, cache_offset); } + have_journal = virtio_pci_find_shm_cap(pci_dev, + VIRTIO_FS_PCI_SHMCAP_ID_JOURNAL, + &journal_bar, &journal_offset, + &journal_len); + if (have_journal) { + dev_notice(&vdev->dev, "Journal bar: %d len: 0x%llx @ 0x%llx\n", + journal_bar, journal_len, journal_offset); + } + + have_vertab = virtio_pci_find_shm_cap(pci_dev, + VIRTIO_FS_PCI_SHMCAP_ID_VERTAB, + &vertab_bar, &vertab_offset, + &vertab_len); + if (have_vertab) { + dev_notice(&vdev->dev, "Version table bar: %d len: 0x%llx @ 0x%llx\n", + vertab_bar, vertab_len, vertab_offset); + } + + /* TODO handle case where device doesn't expose BAR? 
*/ ret = pci_request_region(pci_dev, cache_bar, "virtio-fs-window"); if (ret < 0) { diff --git a/include/uapi/linux/virtio_fs.h b/include/uapi/linux/virtio_fs.h index 65a9d4a0dac0..e70741ab14a8 100644 --- a/include/uapi/linux/virtio_fs.h +++ b/include/uapi/linux/virtio_fs.h @@ -40,5 +40,7 @@ struct virtio_fs_config { /* For the id field in virtio_pci_shm_cap */ #define VIRTIO_FS_PCI_SHMCAP_ID_CACHE 0 +#define VIRTIO_FS_PCI_SHMCAP_ID_VERTAB 1 +#define VIRTIO_FS_PCI_SHMCAP_ID_JOURNAL 2 #endif /* _UAPI_LINUX_VIRTIO_FS_H */ From patchwork Mon Dec 10 17:13:13 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721857 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 612D714E2 for ; Mon, 10 Dec 2018 17:16:21 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 446622AAEF for ; Mon, 10 Dec 2018 17:16:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 38DE42AF3D; Mon, 10 Dec 2018 17:16:21 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C833B2AAEF for ; Mon, 10 Dec 2018 17:16:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728719AbeLJRQK (ORCPT ); Mon, 10 Dec 2018 12:16:10 -0500 Received: from mx1.redhat.com ([209.132.183.28]:33732 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728417AbeLJRNh (ORCPT ); Mon, 10 Dec 2018 12:13:37 -0500 Received: from 
smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 8AE8B308213A; Mon, 10 Dec 2018 17:13:36 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 4C0375D75F; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id C0F2222427B; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 47/52] virtio-fs: Map using the values from the capabilities Date: Mon, 10 Dec 2018 12:13:13 -0500 Message-Id: <20181210171318.16998-48-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.42]); Mon, 10 Dec 2018 17:13:36 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: "Dr. David Alan Gilbert" Instead of assuming we had the fixed bar for the cache, use the value from the capabilities. Use the other capabilities to map their memory. Signed-off-by: Dr. 
David Alan Gilbert --- fs/fuse/virtio_fs.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 95 insertions(+) diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index c18f406b61cd..7d5b23455639 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -53,6 +53,16 @@ struct virtio_fs { void *window_kaddr; phys_addr_t window_phys_addr; size_t window_len; + + /* Version table where version numbers can be read */ + void *vertab_kaddr; + phys_addr_t vertab_phys_addr; + size_t vertab_len; + + /* Journal */ + void *journal_kaddr; + phys_addr_t journal_phys_addr; + size_t journal_len; }; struct virtio_fs_forget { @@ -684,6 +694,17 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs) } phys_addr += cache_offset; + phys_addr = pci_resource_start(pci_dev, cache_bar); + bar_len = pci_resource_len(pci_dev, cache_bar); + + if (cache_offset + cache_len > bar_len) { + dev_err(&vdev->dev, + "%s: cache bar shorter than cap offset+len\n", + __func__); + return -EINVAL; + } + phys_addr += cache_offset; + /* Ideally we would directly use the PCI BAR resource but * devm_memremap_pages() wants its own copy in pgmap. 
So * initialize a struct resource from scratch (only the start @@ -710,6 +731,80 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs) dev_dbg(&vdev->dev, "%s: cache kaddr 0x%px phys_addr 0x%llx len %llx\n", __func__, fs->window_kaddr, phys_addr, cache_len); + /* + * The journal and version table should be easier since DAX doesn't + * need them + */ + if (have_journal) { + if (journal_bar != cache_bar) { + ret = pci_request_region(pci_dev, journal_bar, + "virtio-fs-journal"); + if (ret < 0) { + dev_err(&vdev->dev, + "%s: failed to request journal BAR\n", + __func__); + return ret; + } + } + + phys_addr = pci_resource_start(pci_dev, journal_bar); + bar_len = pci_resource_len(pci_dev, journal_bar); + + if (journal_offset + journal_len > bar_len) { + dev_err(&vdev->dev, + "%s: journal bar shorter than cap offset+len\n", + __func__); + return -EINVAL; + } + fs->journal_phys_addr = phys_addr + journal_offset; + fs->journal_len = journal_len; + + fs->journal_kaddr = devm_memremap(&pci_dev->dev, + fs->journal_phys_addr, + journal_len, MEMREMAP_WB); + if (!fs->journal_kaddr) { + dev_err(&vdev->dev, "%s: failed to remap journal\n", + __func__); + return -ENOMEM; + } + dev_notice(&vdev->dev, "%s: journal at %px\n", __func__, + fs->journal_kaddr); + } + + if (have_vertab) { + if (vertab_bar != cache_bar && + vertab_bar != journal_bar) { + ret = pci_request_region(pci_dev, vertab_bar, + "virtio-fs-vertab"); + if (ret < 0) { + dev_err(&vdev->dev, "%s: failed to request" + " vertab BAR\n", __func__); + return ret; + } + } + + phys_addr = pci_resource_start(pci_dev, vertab_bar); + bar_len = pci_resource_len(pci_dev, vertab_bar); + + if (vertab_offset + vertab_len > bar_len) { + dev_err(&vdev->dev, "%s: version tab bar shorter than" + " cap offset+len\n", __func__); + return -EINVAL; + } + fs->vertab_phys_addr = phys_addr + vertab_offset; + fs->vertab_len = vertab_len; + fs->vertab_kaddr = devm_memremap(&pci_dev->dev, + fs->vertab_phys_addr, + 
vertab_len, MEMREMAP_WB); + if (!fs->vertab_kaddr) { + dev_err(&vdev->dev, "%s: failed to remap version" + " table\n", __func__); + return -ENOMEM; + } + dev_notice(&vdev->dev, "%s: version table at %px\n", + __func__, fs->vertab_kaddr); + } + fs->dax_dev = alloc_dax(fs, NULL, &virtio_fs_dax_ops); if (!fs->dax_dev) return -ENOMEM; From patchwork Mon Dec 10 17:13:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721895 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0FF7714E2 for ; Mon, 10 Dec 2018 17:17:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E9E6B2ABC3 for ; Mon, 10 Dec 2018 17:17:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DE2A52AF3C; Mon, 10 Dec 2018 17:17:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8F6222ABC3 for ; Mon, 10 Dec 2018 17:17:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728424AbeLJRNg (ORCPT ); Mon, 10 Dec 2018 12:13:36 -0500 Received: from mx1.redhat.com ([209.132.183.28]:47786 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728408AbeLJRNg (ORCPT ); Mon, 10 Dec 2018 12:13:36 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com 
(Postfix) with ESMTPS id 1E5313082193; Mon, 10 Dec 2018 17:13:36 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id C74AE605C9; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id C7F5022427C; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 48/52] virtio-fs: pass version table pointer to fuse Date: Mon, 10 Dec 2018 12:13:14 -0500 Message-Id: <20181210171318.16998-49-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.47]); Mon, 10 Dec 2018 17:13:36 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Miklos Szeredi Signed-off-by: Miklos Szeredi --- fs/fuse/fuse_i.h | 12 ++++++++++++ fs/fuse/inode.c | 10 ++++++++++ fs/fuse/virtio_fs.c | 2 ++ 3 files changed, 24 insertions(+) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 30c7b4b56200..8a2604606d51 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -100,6 +100,12 @@ struct fuse_mount_data { /* fuse_dev pointer to fill in, should contain NULL on entry */ void **fudptr; + + /* version table length in bytes */ + size_t vertab_len; + + /* version table kernel address */ + void *vertab_kaddr; }; /* One forget request */ @@ -898,6 +904,12 @@ struct fuse_conn { struct list_head free_ranges; unsigned long nr_ranges; + + /** Size of version table */ + uint64_t version_table_size; + + /** Shared version entry for each active inode */ + s64 *version_table; }; 
static inline struct fuse_conn *get_fuse_conn_super(struct super_block *sb) diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 4d2d623e607f..1ab4df442390 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -52,6 +52,7 @@ MODULE_PARM_DESC(max_user_congthresh, "unprivileged user can set"); #define FUSE_SUPER_MAGIC 0x65735546 +#define VERSION_TABLE_MAGIC 0x7265566465726853 #define FUSE_DEFAULT_BLKSIZE 512 @@ -1215,6 +1216,15 @@ int fuse_fill_super_common(struct super_block *sb, fuse_conn_init(fc, sb->s_user_ns, mount_data->dax_dev, mount_data->fiq_ops, mount_data->fiq_priv); fc->release = fuse_free_conn; + fc->version_table_size = mount_data->vertab_len / sizeof(s64); + fc->version_table = mount_data->vertab_kaddr; + + if (fc->version_table[0] != VERSION_TABLE_MAGIC) { + pr_warn("bad version table magic: 0x%16llx\n", + fc->version_table[0]); + fc->version_table_size = 0; + fc->version_table = NULL; + } if (mount_data->dax_dev) { err = fuse_dax_mem_range_init(fc, mount_data->dax_dev); diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 7d5b23455639..88b00055589b 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -1246,6 +1246,8 @@ static int virtio_fs_fill_super(struct super_block *sb, void *data, d.fiq_priv = fs; d.fudptr = (void **)&fs->vqs[2].fud; d.destroy = true; /* Send destroy request on unmount */ + d.vertab_len = fs->vertab_len; + d.vertab_kaddr = fs->vertab_kaddr; err = fuse_fill_super_common(sb, &d); if (err < 0) From patchwork Mon Dec 10 17:13:15 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721847 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 57FC114E2 for ; Mon, 10 Dec 2018 17:16:05 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with 
ESMTP id 3E8262AF37 for ; Mon, 10 Dec 2018 17:16:05 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 32F4E2AF40; Mon, 10 Dec 2018 17:16:05 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E172F2AF37 for ; Mon, 10 Dec 2018 17:16:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728451AbeLJRNi (ORCPT ); Mon, 10 Dec 2018 12:13:38 -0500 Received: from mx1.redhat.com ([209.132.183.28]:47790 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728409AbeLJRNg (ORCPT ); Mon, 10 Dec 2018 12:13:36 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 305C830820D6; Mon, 10 Dec 2018 17:13:36 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id D6591600D6; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id CBCD122427D; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 49/52] fuse: don't crash if version table is NULL Date: Mon, 10 Dec 2018 12:13:15 -0500 Message-Id: <20181210171318.16998-50-vgoyal@redhat.com> In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com> References: <20181210171318.16998-1-vgoyal@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 
10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.47]); Mon, 10 Dec 2018 17:13:36 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Miklos Szeredi Version table can be NULL. Do not crash. Signed-off-by: Miklos Szeredi --- fs/fuse/inode.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 1ab4df442390..d44827bbfa3d 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -1219,7 +1219,8 @@ int fuse_fill_super_common(struct super_block *sb, fc->version_table_size = mount_data->vertab_len / sizeof(s64); fc->version_table = mount_data->vertab_kaddr; - if (fc->version_table[0] != VERSION_TABLE_MAGIC) { + if (fc->version_table && fc->version_table_size > 0 && + fc->version_table[0] != VERSION_TABLE_MAGIC) { pr_warn("bad version table magic: 0x%16llx\n", fc->version_table[0]); fc->version_table_size = 0; From patchwork Mon Dec 10 17:13:16 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10721861 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8AB8715A6 for ; Mon, 10 Dec 2018 17:16:24 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6A3AB2AAEF for ; Mon, 10 Dec 2018 17:16:24 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5E6672AF3D; Mon, 10 Dec 2018 17:16:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from 
	vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AE7312AAEF
	for ; Mon, 10 Dec 2018 17:16:22 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1728453AbeLJRQJ (ORCPT ); Mon, 10 Dec 2018 12:16:09 -0500
Received: from mx1.redhat.com ([209.132.183.28]:59302 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1728414AbeLJRNh (ORCPT ); Mon, 10 Dec 2018 12:13:37 -0500
Received: from smtp.corp.redhat.com
	(int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 3FF5D792B8;
	Mon, 10 Dec 2018 17:13:36 +0000 (UTC)
Received: from horse.redhat.com (unknown [10.18.25.234])
	by smtp.corp.redhat.com (Postfix) with ESMTP id DBCFD60158;
	Mon, 10 Dec 2018 17:13:35 +0000 (UTC)
Received: by horse.redhat.com (Postfix, from userid 10451)
	id D009622427E; Mon, 10 Dec 2018 12:13:30 -0500 (EST)
From: Vivek Goyal
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org
Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com,
	dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com
Subject: [PATCH 50/52] fuse: add shared version support (virtio-fs only)
Date: Mon, 10 Dec 2018 12:13:16 -0500
Message-Id: <20181210171318.16998-51-vgoyal@redhat.com>
In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com>
References: <20181210171318.16998-1-vgoyal@redhat.com>
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.39]); Mon, 10 Dec 2018 17:13:36 +0000 (UTC)
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

From: Miklos Szeredi

Metadata and dcache versioning support.

READDIRPLUS doesn't supply version information yet, so don't use.
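
For readers following the series, the validation scheme this patch adds can be
sketched in plain userspace C. This is only an illustration under stated
assumptions, not the kernel code: struct ver_table, struct cached_node and the
three helpers are hypothetical names, and the READ_ONCE()/WRITE_ONCE() accesses
and memory barriers used by the patch are left out. It shows a server-supplied
index being mapped to a slot in a shared table (index 0 or an out-of-range
index means "no shared version"), and a cached version being compared against
the current value of that slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared table; slot 0 is reserved. */
struct ver_table {
	const int64_t *slots;	/* NULL when no table was set up */
	uint64_t nslots;
};

/* Hypothetical stand-in for the per-inode/per-dentry cached state. */
struct cached_node {
	const int64_t *version_ptr;	/* NULL: no shared version */
	int64_t cached_version;
};

/* Map an index to a slot; 0 or out of range yields NULL. */
static const int64_t *ver_slot(const struct ver_table *t, uint64_t index)
{
	assert(t != NULL);
	if (!t->slots || index == 0 || index >= t->nslots)
		return NULL;
	return t->slots + index;
}

/* Record the slot and the version the cached data corresponds to. */
static void node_set_version(struct cached_node *n, const struct ver_table *t,
			     uint64_t index, int64_t initial_version)
{
	assert(n != NULL);
	n->version_ptr = ver_slot(t, index);
	n->cached_version = initial_version;
}

/*
 * Without a shared slot no mismatch can be detected, so report false and
 * leave invalidation to whatever other mechanism (e.g. a timeout) applies.
 */
static bool node_version_mismatch(const struct cached_node *n)
{
	assert(n != NULL);
	if (!n->version_ptr)
		return false;
	return *n->version_ptr != n->cached_version;
}
```

A caller would invoke node_set_version() when a lookup reply arrives and
node_version_mismatch() before trusting cached data; a mismatch means the
writer of the table bumped the slot after the reply was generated.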

Signed-off-by: Miklos Szeredi
---
 fs/fuse/dev.c             |   3 +-
 fs/fuse/dir.c             | 244 +++++++++++++++++++++++++++++++++++++++-------
 fs/fuse/file.c            |  53 ++++++----
 fs/fuse/fuse_i.h          |  25 +++--
 fs/fuse/inode.c           |  23 +++--
 fs/fuse/readdir.c         |  12 ++-
 include/uapi/linux/fuse.h |   5 +
 7 files changed, 284 insertions(+), 81 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index f35c4ab2dcbb..9ed326d716ee 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -640,8 +640,7 @@ ssize_t fuse_simple_request(struct fuse_conn *fc, struct fuse_args *args)
 		       args->out.numargs * sizeof(struct fuse_arg));
 	fuse_request_send(fc, req);
 	ret = req->out.h.error;
-	if (!ret && args->out.argvar) {
-		BUG_ON(args->out.numargs != 1);
+	if (!ret && args->out.argvar && args->out.numargs == 1) {
 		ret = req->out.args[0].size;
 	}
 	fuse_put_request(fc, req);
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 8aa4ff82ea7a..3aa214f9a28e 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -25,7 +25,11 @@ static void fuse_advise_use_readdirplus(struct inode *dir)
 }

 union fuse_dentry {
-	u64 time;
+	struct {
+		u64 time;
+		s64 version;
+		s64 parent_version;
+	};
 	struct rcu_head rcu;
 };

@@ -48,6 +52,18 @@ static void fuse_dentry_settime(struct dentry *dentry, u64 time)
 	((union fuse_dentry *) dentry->d_fsdata)->time = time;
 }

+static inline void fuse_dentry_setver(struct dentry *entry,
+				      struct fuse_entryver_out *outver,
+				      s64 pver)
+{
+	union fuse_dentry *fude = entry->d_fsdata;
+
+	smp_wmb();
+	/* FIXME: verify versions aren't going backwards */
+	WRITE_ONCE(fude->version, outver->initial_version);
+	WRITE_ONCE(fude->parent_version, pver);
+}
+
 static inline u64 fuse_dentry_time(const struct dentry *entry)
 {
 	return ((union fuse_dentry *) entry->d_fsdata)->time;
@@ -150,34 +166,118 @@ static void fuse_invalidate_entry(struct dentry *entry)

 static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
 			     u64 nodeid, const struct qstr *name,
-			     struct fuse_entry_out *outarg)
+			     struct fuse_entry_out *outarg,
+			     struct fuse_entryver_out *outver)
 {
 	memset(outarg, 0, sizeof(struct fuse_entry_out));
+	memset(outver, 0, sizeof(struct fuse_entryver_out));
 	args->in.h.opcode = FUSE_LOOKUP;
 	args->in.h.nodeid = nodeid;
 	args->in.numargs = 1;
 	args->in.args[0].size = name->len + 1;
 	args->in.args[0].value = name->name;
-	args->out.numargs = 1;
+	args->out.argvar = 1;
+	args->out.numargs = 2;
 	args->out.args[0].size = sizeof(struct fuse_entry_out);
 	args->out.args[0].value = outarg;
+	args->out.args[1].size = sizeof(struct fuse_entryver_out);
+	args->out.args[1].value = outver;
 }

-u64 fuse_get_attr_version(struct fuse_conn *fc)
+s64 fuse_get_attr_version(struct inode *inode)
 {
-	u64 curr_version;
+	struct fuse_inode *fi = get_fuse_inode(inode);
+	s64 curr_version;

-	/*
-	 * The spin lock isn't actually needed on 64bit archs, but we
-	 * don't yet care too much about such optimizations.
-	 */
-	spin_lock(&fc->lock);
-	curr_version = fc->attr_version;
-	spin_unlock(&fc->lock);
+	if (fi->version_ptr) {
+		curr_version = READ_ONCE(*fi->version_ptr);
+	} else {
+		struct fuse_conn *fc = get_fuse_conn(inode);
+
+		/*
+		 * The spin lock isn't actually needed on 64bit archs, but we
+		 * don't yet care too much about such optimizations.
+		 */
+		spin_lock(&fc->lock);
+		curr_version = fc->attr_ctr;
+		spin_unlock(&fc->lock);
+	}
+
+	return curr_version;
+}
+
+static s64 fuse_get_attr_version_shared(struct inode *inode)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+	s64 curr_version = 0;
+
+	if (fi->version_ptr)
+		curr_version = READ_ONCE(*fi->version_ptr);

 	return curr_version;
 }

+static bool fuse_version_mismatch(struct inode *inode, s64 version)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+	bool mismatch = false;
+
+	if (fi->version_ptr) {
+		s64 curr_version = READ_ONCE(*fi->version_ptr);
+
+		mismatch = curr_version != version;
+		smp_rmb();
+
+		if (mismatch) {
+			pr_info("mismatch: nodeid=%llu curr=%lli cache=%lli\n",
+				get_node_id(inode), curr_version, version);
+		}
+	}
+
+	return mismatch;
+}
+
+static bool fuse_dentry_version_mismatch(struct dentry *dentry)
+{
+	union fuse_dentry *fude = dentry->d_fsdata;
+	struct inode *dir = d_inode_rcu(dentry->d_parent);
+	struct inode *inode = d_inode_rcu(dentry);
+
+	if (!fuse_version_mismatch(dir, READ_ONCE(fude->parent_version)))
+		return false;
+
+	/* Can only validate negatives based on parent version */
+	if (!inode)
+		return true;
+
+	return fuse_version_mismatch(inode, READ_ONCE(fude->version));
+}
+
+static void fuse_set_version_ptr(struct inode *inode,
+				 struct fuse_entryver_out *outver)
+{
+	struct fuse_conn *fc = get_fuse_conn(inode);
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	if (!fc->version_table || !outver->version_index) {
+		fi->version_ptr = NULL;
+		return;
+	}
+	if (outver->version_index >= fc->version_table_size) {
+		pr_warn_ratelimited("version index too large (%llu >= %llu)\n",
+				    outver->version_index,
+				    fc->version_table_size);
+		fi->version_ptr = NULL;
+		return;
+	}
+
+	fi->version_ptr = fc->version_table + outver->version_index;
+
+	pr_info("fuse: version_ptr = %p\n", fi->version_ptr);
+	pr_info("fuse: version = %lli\n", fi->attr_version);
+	pr_info("fuse: current_version: %lli\n", *fi->version_ptr);
+}
+
 /*
  * Check whether the dentry is still valid
  *
@@ -198,12 +298,15 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 	inode = d_inode_rcu(entry);
 	if (inode && is_bad_inode(inode))
 		goto invalid;
-	else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) ||
+	else if (fuse_dentry_version_mismatch(entry) ||
+		 time_before64(fuse_dentry_time(entry), get_jiffies_64()) ||
 		 (flags & LOOKUP_REVAL)) {
 		struct fuse_entry_out outarg;
+		struct fuse_entryver_out outver;
 		FUSE_ARGS(args);
 		struct fuse_forget_link *forget;
-		u64 attr_version;
+		s64 attr_version;
+		s64 parent_version;

 		/* For negative dentries, always do a fresh lookup */
 		if (!inode)
@@ -220,11 +323,12 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 		if (!forget)
 			goto out;

-		attr_version = fuse_get_attr_version(fc);
+		attr_version = fuse_get_attr_version(inode);

 		parent = dget_parent(entry);
+		parent_version = fuse_get_attr_version_shared(d_inode(parent));
 		fuse_lookup_init(fc, &args, get_node_id(d_inode(parent)),
-				 &entry->d_name, &outarg);
+				 &entry->d_name, &outarg, &outver);
 		ret = fuse_simple_request(fc, &args);
 		dput(parent);
 		/* Zero nodeid is same as -ENOENT */
@@ -236,6 +340,9 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 				fuse_queue_forget(fc, forget, outarg.nodeid, 1);
 				goto invalid;
 			}
+			if (fi->version_ptr != fc->version_table + outver.version_index)
+				pr_warn("fuse_dentry_revalidate: version_ptr changed (%p -> %p)\n", fi->version_ptr, fc->version_table + outver.version_index);
+
 			spin_lock(&fc->lock);
 			fi->nlookup++;
 			spin_unlock(&fc->lock);
@@ -246,14 +353,26 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 		if (ret || (outarg.attr.mode ^ inode->i_mode) & S_IFMT)
 			goto invalid;

+		if (fi->version_ptr) {
+			if (outver.initial_version > attr_version)
+				attr_version = outver.initial_version;
+			else if (outver.initial_version < attr_version)
+				pr_warn("fuse_dentry_revalidate: backward going version (%lli -> %lli)\n", attr_version, outver.initial_version);
+		}
+
 		forget_all_cached_acls(inode);
 		fuse_change_attributes(inode, &outarg.attr,
 				       entry_attr_timeout(&outarg),
 				       attr_version);
 		fuse_change_entry_timeout(entry, &outarg);
+		fuse_dentry_setver(entry, &outver, parent_version);
 	} else if (inode) {
 		fi = get_fuse_inode(inode);
 		if (flags & LOOKUP_RCU) {
+			/*
+			 * FIXME: Don't leave rcu if FUSE_I_ADVISE_RDPLUS is
+			 * already set?
+			 */
 			if (test_bit(FUSE_I_INIT_RDPLUS, &fi->state))
 				return -ECHILD;
 		} else if (test_and_clear_bit(FUSE_I_INIT_RDPLUS, &fi->state)) {
@@ -307,13 +426,16 @@ int fuse_valid_type(int m)
 		S_ISBLK(m) || S_ISFIFO(m) || S_ISSOCK(m);
 }

-int fuse_lookup_name(struct super_block *sb, u64 nodeid, const struct qstr *name,
-		     struct fuse_entry_out *outarg, struct inode **inode)
+static int fuse_lookup_name_with_ver(struct super_block *sb, u64 nodeid,
+				     const struct qstr *name,
+				     struct fuse_entry_out *outarg,
+				     struct fuse_entryver_out *outver,
+				     struct inode **inode)
 {
 	struct fuse_conn *fc = get_fuse_conn_super(sb);
 	FUSE_ARGS(args);
 	struct fuse_forget_link *forget;
-	u64 attr_version;
+	s64 attr_version;
 	int err;

 	*inode = NULL;
@@ -327,9 +449,11 @@ int fuse_lookup_name(struct super_block *sb, u64 nodeid, const struct qstr *name
 	if (!forget)
 		goto out;

-	attr_version = fuse_get_attr_version(fc);
+	spin_lock(&fc->lock);
+	attr_version = fc->attr_ctr;
+	spin_unlock(&fc->lock);

-	fuse_lookup_init(fc, &args, nodeid, name, outarg);
+	fuse_lookup_init(fc, &args, nodeid, name, outarg, outver);
 	err = fuse_simple_request(fc, &args);
 	/* Zero nodeid is same as -ENOENT, but with valid timeout */
 	if (err || !outarg->nodeid)
@@ -357,19 +481,32 @@ int fuse_lookup_name(struct super_block *sb, u64 nodeid, const struct qstr *name
 	return err;
 }

+int fuse_lookup_name(struct super_block *sb, u64 nodeid,
+		     const struct qstr *name,
+		     struct fuse_entry_out *outarg, struct inode **inode)
+{
+	struct fuse_entryver_out outver;
+
+	return fuse_lookup_name_with_ver(sb, nodeid, name, outarg, &outver,
+					 inode);
+}
+
 static struct dentry *fuse_lookup(struct inode *dir, struct dentry *entry,
 				  unsigned int flags)
 {
 	int err;
 	struct fuse_entry_out outarg;
+	struct fuse_entryver_out outver;
 	struct inode *inode;
 	struct dentry *newent;
 	bool outarg_valid = true;
+	s64 parent_version = fuse_get_attr_version_shared(dir);
 	bool locked;

 	locked = fuse_lock_inode(dir);
-	err = fuse_lookup_name(dir->i_sb, get_node_id(dir), &entry->d_name,
-			       &outarg, &inode);
+	err = fuse_lookup_name_with_ver(dir->i_sb, get_node_id(dir),
+					&entry->d_name, &outarg, &outver,
+					&inode);
 	fuse_unlock_inode(dir, locked);
 	if (err == -ENOENT) {
 		outarg_valid = false;
@@ -382,16 +519,21 @@ static struct dentry *fuse_lookup(struct inode *dir, struct dentry *entry,
 	if (inode && get_node_id(inode) == FUSE_ROOT_ID)
 		goto out_iput;

+	if (inode)
+		fuse_set_version_ptr(inode, &outver);
+
 	newent = d_splice_alias(inode, entry);
 	err = PTR_ERR(newent);
 	if (IS_ERR(newent))
 		goto out_err;

 	entry = newent ? newent : entry;
-	if (outarg_valid)
+	if (outarg_valid) {
 		fuse_change_entry_timeout(entry, &outarg);
-	else
+		fuse_dentry_setver(entry, &outver, parent_version);
+	} else {
 		fuse_invalidate_entry_cache(entry);
+	}

 	fuse_advise_use_readdirplus(dir);
 	return newent;
@@ -420,7 +562,9 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 	struct fuse_create_in inarg;
 	struct fuse_open_out outopen;
 	struct fuse_entry_out outentry;
+	struct fuse_entryver_out outver;
 	struct fuse_file *ff;
+	s64 parent_version = fuse_get_attr_version_shared(dir);

 	/* Userspace expects S_IFREG in create mode */
 	BUG_ON((mode & S_IFMT) != S_IFREG);
@@ -451,11 +595,14 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 	args.in.args[0].value = &inarg;
 	args.in.args[1].size = entry->d_name.len + 1;
 	args.in.args[1].value = entry->d_name.name;
-	args.out.numargs = 2;
+	args.out.argvar = 1;
+	args.out.numargs = 3;
 	args.out.args[0].size = sizeof(outentry);
 	args.out.args[0].value = &outentry;
 	args.out.args[1].size = sizeof(outopen);
 	args.out.args[1].value = &outopen;
+	args.out.args[2].size = sizeof(outver);
+	args.out.args[2].value = &outver;
 	err = fuse_simple_request(fc, &args);
 	if (err)
 		goto out_free_ff;
@@ -478,7 +625,9 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 	}
 	kfree(forget);
 	d_instantiate(entry, inode);
+	fuse_set_version_ptr(inode, &outver);
 	fuse_change_entry_timeout(entry, &outentry);
+	fuse_dentry_setver(entry, &outver, parent_version);
 	fuse_dir_changed(dir);
 	err = finish_open(file, entry, generic_file_open);
 	if (err) {
@@ -549,10 +698,12 @@ static int create_new_entry(struct fuse_conn *fc, struct fuse_args *args,
 			    umode_t mode)
 {
 	struct fuse_entry_out outarg;
+	struct fuse_entryver_out outver;
 	struct inode *inode;
 	struct dentry *d;
 	int err;
 	struct fuse_forget_link *forget;
+	s64 parent_version = fuse_get_attr_version_shared(dir);

 	forget = fuse_alloc_forget();
 	if (!forget)
@@ -560,9 +711,12 @@ static int create_new_entry(struct fuse_conn *fc, struct fuse_args *args,

 	memset(&outarg, 0, sizeof(outarg));
 	args->in.h.nodeid = get_node_id(dir);
-	args->out.numargs = 1;
+	args->out.argvar = 1;
+	args->out.numargs = 2;
 	args->out.args[0].size = sizeof(outarg);
 	args->out.args[0].value = &outarg;
+	args->out.args[1].size = sizeof(outver);
+	args->out.args[1].value = &outver;
 	err = fuse_simple_request(fc, args);
 	if (err)
 		goto out_put_forget_req;
@@ -582,6 +736,8 @@ static int create_new_entry(struct fuse_conn *fc, struct fuse_args *args,
 	}
 	kfree(forget);

+	fuse_set_version_ptr(inode, &outver);
+
 	d_drop(entry);
 	d = d_splice_alias(inode, entry);
 	if (IS_ERR(d))
@@ -589,9 +745,11 @@ static int create_new_entry(struct fuse_conn *fc, struct fuse_args *args,

 	if (d) {
 		fuse_change_entry_timeout(d, &outarg);
+		fuse_dentry_setver(d, &outver, parent_version);
 		dput(d);
 	} else {
 		fuse_change_entry_timeout(entry, &outarg);
+		fuse_dentry_setver(entry, &outver, parent_version);
 	}
 	fuse_dir_changed(dir);
 	return 0;
@@ -689,10 +847,9 @@ static int fuse_unlink(struct inode *dir, struct dentry *entry)
 	err = fuse_simple_request(fc, &args);
 	if (!err) {
 		struct inode *inode = d_inode(entry);
-		struct fuse_inode *fi = get_fuse_inode(inode);

 		spin_lock(&fc->lock);
-		fi->attr_version = ++fc->attr_version;
+		fuse_update_attr_version_locked(inode);
 		/*
 		 * If i_nlink == 0 then unlink doesn't make sense, yet this can
 		 * happen if userspace filesystem is careless. It would be
@@ -843,10 +1000,8 @@ static int fuse_link(struct dentry *entry, struct inode *newdir,
 	   etc.)
 	*/
 	if (!err) {
-		struct fuse_inode *fi = get_fuse_inode(inode);
-
 		spin_lock(&fc->lock);
-		fi->attr_version = ++fc->attr_version;
+		fuse_update_attr_version_locked(inode);
 		inc_nlink(inode);
 		spin_unlock(&fc->lock);
 		fuse_invalidate_attr(inode);
@@ -904,9 +1059,9 @@ static int fuse_do_getattr(struct inode *inode, struct kstat *stat,
 	struct fuse_attr_out outarg;
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	FUSE_ARGS(args);
-	u64 attr_version;
+	s64 attr_version;

-	attr_version = fuse_get_attr_version(fc);
+	attr_version = fuse_get_attr_version(inode);

 	memset(&inarg, 0, sizeof(inarg));
 	memset(&outarg, 0, sizeof(outarg));
@@ -941,6 +1096,13 @@ static int fuse_do_getattr(struct inode *inode, struct kstat *stat,
 	return err;
 }

+static bool fuse_shared_version_mismatch(struct inode *inode)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	return fuse_version_mismatch(inode, READ_ONCE(fi->attr_version));
+}
+
 static int fuse_update_get_attr(struct inode *inode, struct file *file,
 				struct kstat *stat, u32 request_mask,
 				unsigned int flags)
@@ -956,7 +1118,8 @@ static int fuse_update_get_attr(struct inode *inode, struct file *file,
 	else if (request_mask & READ_ONCE(fi->inval_mask))
 		sync = true;
 	else
-		sync = time_before64(fi->i_time, get_jiffies_64());
+		sync = (fuse_shared_version_mismatch(inode) ||
+			time_before64(fi->i_time, get_jiffies_64()));

 	if (sync) {
 		forget_all_cached_acls(inode);
@@ -1150,7 +1313,9 @@ static int fuse_permission(struct inode *inode, int mask)
 	}

 	if (fc->default_permissions) {
-		err = generic_permission(inode, mask);
+		err = -EACCES;
+		if (!refreshed && !fuse_shared_version_mismatch(inode))
+			err = generic_permission(inode, mask);

 		/* If permission is denied, try to refresh file
 		   attributes. This is also needed, because the root
@@ -1459,6 +1624,7 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
 	loff_t oldsize;
 	int err;
 	bool trust_local_cmtime = is_wb && S_ISREG(inode->i_mode);
+	s64 attr_version = fuse_get_attr_version(inode);

 	if (!fc->default_permissions)
 		attr->ia_valid |= ATTR_FORCE;
@@ -1534,8 +1700,12 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
 		/* FIXME: clear I_DIRTY_SYNC? */
 	}

+	if (fi->version_ptr)
+		attr_version++;
+	else
+		attr_version = fuse_update_attr_version_locked(inode);
 	fuse_change_attributes_common(inode, &outarg.attr,
-				      attr_timeout(&outarg));
+				      attr_timeout(&outarg), attr_version);
 	oldsize = inode->i_size;
 	/* see the comment in fuse_change_attributes() */
 	if (!is_wb || is_truncate || !S_ISREG(inode->i_mode))
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 0be5a7380b3c..4cb8c8a8011c 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -376,6 +376,28 @@ void fuse_removemapping(struct inode *inode)
 	pr_debug("%s request succeeded\n", __func__);
 }

+s64 fuse_update_attr_version_locked(struct inode *inode)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+	s64 curr_version = 0;
+
+	if (!fi->version_ptr) {
+		struct fuse_conn *fc = get_fuse_conn(inode);
+
+		curr_version = fi->attr_version = fc->attr_ctr++;
+	}
+	return curr_version;
+}
+
+static void fuse_update_attr_version(struct inode *inode)
+{
+	struct fuse_conn *fc = get_fuse_conn(inode);
+
+	spin_lock(&fc->lock);
+	fuse_update_attr_version_locked(inode);
+	spin_unlock(&fc->lock);
+}
+
 void fuse_finish_open(struct inode *inode, struct file *file)
 {
 	struct fuse_file *ff = file->private_data;
@@ -386,12 +408,11 @@ void fuse_finish_open(struct inode *inode, struct file *file)
 	if (ff->open_flags & FOPEN_NONSEEKABLE)
 		nonseekable_open(inode, file);
 	if (fc->atomic_o_trunc && (file->f_flags & O_TRUNC)) {
-		struct fuse_inode *fi = get_fuse_inode(inode);
-
 		spin_lock(&fc->lock);
-		fi->attr_version = ++fc->attr_version;
+		fuse_update_attr_version_locked(inode);
 		i_size_write(inode, 0);
 		spin_unlock(&fc->lock);
+
 		fuse_invalidate_attr(inode);
 		if (fc->writeback_cache)
 			file_update_time(file);
@@ -806,15 +827,8 @@ static void fuse_aio_complete(struct fuse_io_priv *io, int err, ssize_t pos)
 	if (!left && !io->blocking) {
 		ssize_t res = fuse_get_res_by_io(io);

-		if (res >= 0) {
-			struct inode *inode = file_inode(io->iocb->ki_filp);
-			struct fuse_conn *fc = get_fuse_conn(inode);
-			struct fuse_inode *fi = get_fuse_inode(inode);
-
-			spin_lock(&fc->lock);
-			fi->attr_version = ++fc->attr_version;
-			spin_unlock(&fc->lock);
-		}
+		if (res >= 0)
+			fuse_update_attr_version(file_inode(io->iocb->ki_filp));

 		io->iocb->ki_complete(io->iocb, res, 0);
 	}
@@ -883,7 +897,7 @@ static size_t fuse_send_read(struct fuse_req *req, struct fuse_io_priv *io,
 }

 static void fuse_read_update_size(struct inode *inode, loff_t size,
-				  u64 attr_ver)
+				  s64 attr_ver)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	struct fuse_inode *fi = get_fuse_inode(inode);
@@ -891,14 +905,14 @@ static void fuse_read_update_size(struct inode *inode, loff_t size,
 	spin_lock(&fc->lock);
 	if (attr_ver == fi->attr_version && size < inode->i_size &&
 	    !test_bit(FUSE_I_SIZE_UNSTABLE, &fi->state)) {
-		fi->attr_version = ++fc->attr_version;
+		fuse_update_attr_version_locked(inode);
 		i_size_write(inode, size);
 	}
 	spin_unlock(&fc->lock);
 }

 static void fuse_short_read(struct fuse_req *req, struct inode *inode,
-			    u64 attr_ver)
+			    s64 attr_ver)
 {
 	size_t num_read = req->out.args[0].size;
 	struct fuse_conn *fc = get_fuse_conn(inode);
@@ -933,7 +947,7 @@ static int fuse_do_readpage(struct file *file, struct page *page)
 	size_t num_read;
 	loff_t pos = page_offset(page);
 	size_t count = PAGE_SIZE;
-	u64 attr_ver;
+	s64 attr_ver;
 	int err;

 	/*
@@ -947,7 +961,7 @@ static int fuse_do_readpage(struct file *file, struct page *page)
 	if (IS_ERR(req))
 		return PTR_ERR(req);

-	attr_ver = fuse_get_attr_version(fc);
+	attr_ver = fuse_get_attr_version(inode);

 	req->out.page_zeroing = 1;
 	req->out.argpages = 1;
@@ -1036,7 +1050,7 @@ static void fuse_send_readpages(struct fuse_req *req, struct file *file)
 	req->out.page_zeroing = 1;
 	req->out.page_replace = 1;
 	fuse_read_fill(req, file, pos, count, FUSE_READ);
-	req->misc.read.attr_ver = fuse_get_attr_version(fc);
+	req->misc.read.attr_ver = fuse_get_attr_version(file_inode(file));
 	if (fc->async_read) {
 		req->ff = fuse_file_get(ff);
 		req->end = fuse_readpages_end;
@@ -1218,11 +1232,10 @@ static size_t fuse_send_write(struct fuse_req *req, struct fuse_io_priv *io,
 bool fuse_write_update_size(struct inode *inode, loff_t pos)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
-	struct fuse_inode *fi = get_fuse_inode(inode);
 	bool ret = false;

 	spin_lock(&fc->lock);
-	fi->attr_version = ++fc->attr_version;
+	fuse_update_attr_version_locked(inode);
 	if (pos > inode->i_size) {
 		i_size_write(inode, pos);
 		ret = true;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 8a2604606d51..9ea5d0f760f4 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -172,7 +172,7 @@ struct fuse_inode {
 	u64 orig_ino;

 	/** Version of last attribute change */
-	u64 attr_version;
+	s64 attr_version;

 	union {
 		/* Write related fields (regular file only) */
@@ -223,7 +223,7 @@ struct fuse_inode {
 	/** Miscellaneous bits describing inode state */
 	unsigned long state;

-	/** Lock for serializing lookup and readdir for back compatibility*/
+	/** Lock for serializing lookup and readdir for back compatibility */
 	struct mutex mutex;

 	/*
@@ -241,6 +241,9 @@ struct fuse_inode {
 	/** Sorted rb tree of struct fuse_dax_mapping elements */
 	struct rb_root_cached dmap_tree;
 	unsigned long nr_dmaps;
+
+	/** Pointer to shared version */
+	s64 *version_ptr;
 };

 /** FUSE inode state bits */
@@ -364,7 +367,7 @@ struct fuse_out {
 	unsigned numargs;

 	/** Array of arguments */
-	struct fuse_arg args[2];
+	struct fuse_arg args[3];
 };

 /** FUSE page descriptor */
@@ -386,7 +389,7 @@ struct fuse_args {
 	struct {
 		unsigned argvar:1;
 		unsigned numargs;
-		struct fuse_arg args[2];
+		struct fuse_arg args[3];
 	} out;
 };

@@ -486,7 +489,7 @@ struct fuse_req {
 		struct cuse_init_in cuse_init_in;
 		struct {
 			struct fuse_read_in in;
-			u64 attr_ver;
+			s64 attr_ver;
 		} read;
 		struct {
 			struct fuse_write_in in;
@@ -869,7 +872,7 @@ struct fuse_conn {
 	struct fuse_req *destroy_req;

 	/** Version counter for attribute changes */
-	u64 attr_version;
+	s64 attr_ctr;

 	/** Called on final put */
 	void (*release)(struct fuse_conn *);
@@ -953,7 +956,7 @@ int fuse_inode_eq(struct inode *inode, void *_nodeidp);
  */
 struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
 			int generation, struct fuse_attr *attr,
-			u64 attr_valid, u64 attr_version);
+			u64 attr_valid, s64 attr_version);

 int fuse_lookup_name(struct super_block *sb, u64 nodeid, const struct qstr *name,
 		     struct fuse_entry_out *outarg, struct inode **inode);
@@ -1027,10 +1030,10 @@ void fuse_init_symlink(struct inode *inode);
  * Change attributes of an inode
  */
 void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
-			    u64 attr_valid, u64 attr_version);
+			    u64 attr_valid, s64 attr_version);

 void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
-				   u64 attr_valid);
+				   u64 attr_valid, s64 attr_version);

 /**
  * Initialize the client device
@@ -1195,7 +1198,7 @@ void fuse_flush_writepages(struct inode *inode);
 void fuse_set_nowrite(struct inode *inode);
 void fuse_release_nowrite(struct inode *inode);

-u64 fuse_get_attr_version(struct fuse_conn *fc);
+s64 fuse_get_attr_version(struct inode *inode);

 /**
  * File-system tells the kernel to invalidate cache for the given node id.
@@ -1281,4 +1284,6 @@ u64 fuse_get_unique(struct fuse_iqueue *fiq);
 void fuse_dax_free_mem_worker(struct work_struct *work);
 void fuse_removemapping(struct inode *inode);

+s64 fuse_update_attr_version_locked(struct inode *inode);
+
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index d44827bbfa3d..ea2be153a322 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -82,6 +82,8 @@ static struct inode *fuse_alloc_inode(struct super_block *sb)
 	fi->nodeid = 0;
 	fi->nlookup = 0;
 	fi->attr_version = 0;
+	fi->state = 0;
+	fi->version_ptr = NULL;
 	fi->orig_ino = 0;
 	fi->state = 0;
 	fi->nr_dmaps = 0;
@@ -153,12 +155,11 @@ static ino_t fuse_squash_ino(u64 ino64)
 }

 void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
-				   u64 attr_valid)
+				   u64 attr_valid, s64 attr_version)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	struct fuse_inode *fi = get_fuse_inode(inode);

-	fi->attr_version = ++fc->attr_version;
 	fi->i_time = attr_valid;
 	WRITE_ONCE(fi->inval_mask, 0);

@@ -193,10 +194,13 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
 		inode->i_mode &= ~S_ISVTX;

 	fi->orig_ino = attr->ino;
+	smp_wmb();
+	WRITE_ONCE(fi->attr_version, attr_version);
+
 }

 void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
-			    u64 attr_valid, u64 attr_version)
+			    u64 attr_valid, s64 attr_version)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	struct fuse_inode *fi = get_fuse_inode(inode);
@@ -205,14 +209,17 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
 	struct timespec64 old_mtime;

 	spin_lock(&fc->lock);
-	if ((attr_version != 0 && fi->attr_version > attr_version) ||
-	    test_bit(FUSE_I_SIZE_UNSTABLE, &fi->state)) {
+	if (test_bit(FUSE_I_SIZE_UNSTABLE, &fi->state)) {
+		spin_unlock(&fc->lock);
+		return;
+	}
+	if (attr_version != 0 && fi->attr_version > attr_version) {
 		spin_unlock(&fc->lock);
 		return;
 	}

 	old_mtime = inode->i_mtime;
-	fuse_change_attributes_common(inode, attr, attr_valid);
+	fuse_change_attributes_common(inode, attr, attr_valid, attr_version);

 	oldsize = inode->i_size;
 	/*
@@ -291,7 +298,7 @@ static int fuse_inode_set(struct inode *inode, void *_nodeidp)

 struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
 			int generation, struct fuse_attr *attr,
-			u64 attr_valid, u64 attr_version)
+			u64 attr_valid, s64 attr_version)
 {
 	struct inode *inode;
 	struct fuse_inode *fi;
@@ -709,7 +716,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns,
 	fc->blocked = 0;
 	fc->initialized = 0;
 	fc->connected = 1;
-	fc->attr_version = 1;
+	fc->attr_ctr = 1;
 	get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key));
 	fc->pid_ns = get_pid_ns(task_active_pid_ns(current));
 	fc->dax_dev = dax_dev;
diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index ab18b78f4755..e3ecc56013b8 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -147,7 +147,7 @@ static int parse_dirfile(char *buf, size_t nbytes, struct file *file,

 static int fuse_direntplus_link(struct file *file,
 				struct fuse_direntplus *direntplus,
-				u64 attr_version)
+				s64 attr_version)
 {
 	struct fuse_entry_out *o = &direntplus->entry_out;
 	struct fuse_dirent *dirent = &direntplus->dirent;
@@ -212,6 +212,9 @@ static int fuse_direntplus_link(struct file *file,
 			return -EIO;
 		}

+		/* FIXME: translate version_ptr on reading from device... */
+		/* fuse_set_version_ptr(inode, o); */
+
 		fi = get_fuse_inode(inode);
 		spin_lock(&fc->lock);
 		fi->nlookup++;
@@ -231,6 +234,7 @@ static int fuse_direntplus_link(struct file *file,
 				  attr_version);
 		if (!inode)
 			inode = ERR_PTR(-ENOMEM);
+		/* else fuse_set_version_ptr(inode, o); */

 		alias = d_splice_alias(inode, dentry);
 		d_lookup_done(dentry);
@@ -250,7 +254,7 @@ static int fuse_direntplus_link(struct file *file,
 }

 static int parse_dirplusfile(char *buf, size_t nbytes, struct file *file,
-			     struct dir_context *ctx, u64 attr_version)
+			     struct dir_context *ctx, s64 attr_version)
 {
 	struct fuse_direntplus *direntplus;
 	struct fuse_dirent *dirent;
@@ -301,7 +305,7 @@ static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx)
 	struct inode *inode = file_inode(file);
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	struct fuse_req *req;
-	u64 attr_version = 0;
+	s64 attr_version = 0;
 	bool locked;

 	req = fuse_get_req(fc, 1);
@@ -320,7 +324,7 @@ static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx)
 	req->pages[0] = page;
 	req->page_descs[0].length = PAGE_SIZE;
 	if (plus) {
-		attr_version = fuse_get_attr_version(fc);
+		attr_version = fuse_get_attr_version(inode);
 		fuse_read_fill(req, file, ctx->pos, PAGE_SIZE,
 			       FUSE_READDIRPLUS);
 	} else {
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 1657253cb7d6..301c3c23228f 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -427,6 +427,11 @@ struct fuse_entry_out {
 	struct fuse_attr attr;
 };

+struct fuse_entryver_out {
+	uint64_t	version_index;
+	int64_t		initial_version;
+};
+
 struct fuse_forget_in {
 	uint64_t	nlookup;
 };

From patchwork Mon Dec 10 17:13:17 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vivek Goyal
X-Patchwork-Id: 10721823
Return-Path:
Received: from mail.wl.linuxfoundation.org
	(pdx-wl-mail.web.codeaurora.org [172.30.200.125])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix)
with ESMTP id 00C7C14E2 for ; Mon, 10 Dec 2018 17:15:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D9C6E2AF37 for ; Mon, 10 Dec 2018 17:15:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CDDD72AF3F; Mon, 10 Dec 2018 17:15:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5B3872AF37 for ; Mon, 10 Dec 2018 17:15:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728509AbeLJRNj (ORCPT ); Mon, 10 Dec 2018 12:13:39 -0500 Received: from mx1.redhat.com ([209.132.183.28]:45114 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728410AbeLJRNg (ORCPT ); Mon, 10 Dec 2018 12:13:36 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 2ED6C2D7E8; Mon, 10 Dec 2018 17:13:36 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id DBAC46012B; Mon, 10 Dec 2018 17:13:35 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id D358222427F; Mon, 10 Dec 2018 12:13:30 -0500 (EST) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com Subject: [PATCH 51/52] fuse: shared version cleanups Date: Mon, 10 Dec 2018 12:13:17 -0500 Message-Id: 
 <20181210171318.16998-52-vgoyal@redhat.com>
In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com>
References: <20181210171318.16998-1-vgoyal@redhat.com>

From: Miklos Szeredi

Signed-off-by: Miklos Szeredi
---
 fs/fuse/dir.c | 40 +++++++++++++++++++++++++---------------
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 3aa214f9a28e..f9a91e782cf0 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -253,29 +253,36 @@ static bool fuse_dentry_version_mismatch(struct dentry *dentry)
 	return fuse_version_mismatch(inode, READ_ONCE(fude->version));
 }
 
-static void fuse_set_version_ptr(struct inode *inode,
-				 struct fuse_entryver_out *outver)
+static s64 *fuse_version_ptr(struct inode *inode,
+			     struct fuse_entryver_out *outver)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
-	struct fuse_inode *fi = get_fuse_inode(inode);
 
-	if (!fc->version_table || !outver->version_index) {
-		fi->version_ptr = NULL;
-		return;
-	}
+	if (!fc->version_table || !outver->version_index)
+		return NULL;
+
 	if (outver->version_index >= fc->version_table_size) {
 		pr_warn_ratelimited("version index too large (%llu >= %llu)\n",
 				    outver->version_index,
 				    fc->version_table_size);
-		fi->version_ptr = NULL;
-		return;
+		return NULL;
 	}
 
-	fi->version_ptr = fc->version_table + outver->version_index;
+	return fc->version_table + outver->version_index;
+}
 
-	pr_info("fuse: version_ptr = %p\n", fi->version_ptr);
-	pr_info("fuse: version = %lli\n", fi->attr_version);
-	pr_info("fuse: current_version: %lli\n", *fi->version_ptr);
+static void fuse_set_version_ptr(struct inode *inode,
+				 struct fuse_entryver_out *outver)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	fi->version_ptr = fuse_version_ptr(inode, outver);
+
+	if (fi->version_ptr) {
+		pr_info("fuse: version_ptr = %p\n", fi->version_ptr);
+		pr_info("fuse: version = %lli\n", fi->attr_version);
+		pr_info("fuse: current_version: %lli\n", *fi->version_ptr);
+	}
 }
 
 /*
@@ -335,13 +342,16 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 		if (!ret && !outarg.nodeid)
 			ret = -ENOENT;
 		if (!ret) {
+			s64 *new_version_ptr = fuse_version_ptr(inode, &outver);
+
 			fi = get_fuse_inode(inode);
 			if (outarg.nodeid != get_node_id(inode)) {
 				fuse_queue_forget(fc, forget, outarg.nodeid, 1);
 				goto invalid;
 			}
-			if (fi->version_ptr != fc->version_table + outver.version_index)
-				pr_warn("fuse_dentry_revalidate: version_ptr changed (%p -> %p)\n", fi->version_ptr, fc->version_table + outver.version_index);
+			if (fi->version_ptr != new_version_ptr) {
+				pr_warn("fuse_dentry_revalidate: version_ptr changed (%p -> %p)\n", fi->version_ptr, new_version_ptr);
+			}
 
 			spin_lock(&fc->lock);
 			fi->nlookup++;

From patchwork Mon Dec 10 17:13:18 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vivek Goyal
X-Patchwork-Id: 10721881
From: Vivek Goyal
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com
Subject: [PATCH 52/52] fuse: fix fuse_permission() for the default_permissions case
Date: Mon, 10 Dec 2018 12:13:18 -0500
Message-Id: <20181210171318.16998-53-vgoyal@redhat.com>
In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com>
References: <20181210171318.16998-1-vgoyal@redhat.com>

From: Miklos Szeredi

Fixes: f064cab7f6ee ("fuse: add shared version support (virtio-fs only)")
Signed-off-by: Miklos Szeredi
---
 fs/fuse/dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index f9a91e782cf0..f1da787796e8 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1324,7 +1324,7 @@ static int fuse_permission(struct inode *inode, int mask)
 
 	if (fc->default_permissions) {
 		err = -EACCES;
-		if (!refreshed && !fuse_shared_version_mismatch(inode))
+		if (refreshed || !fuse_shared_version_mismatch(inode))
 			err = generic_permission(inode, mask);
 
 		/* If permission is denied, try to refresh file