From patchwork Wed Sep 16 16:17:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 11780941 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 663C592C for ; Wed, 16 Sep 2020 20:56:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3EEDB21D7D for ; Wed, 16 Sep 2020 20:56:33 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Lxp2T0bD" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726899AbgIPU4c (ORCPT ); Wed, 16 Sep 2020 16:56:32 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:34917 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726553AbgIPQwi (ORCPT ); Wed, 16 Sep 2020 12:52:38 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1600275139; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=f1O55K6rmGWiDD2mmxn/ros/bBH3lRJCLyuGLJc5wtM=; b=Lxp2T0bDzIktB8zJBKWW6swpnQSUYjGy35FTBOgT2NrxmSqFfcjbrVVUJLnt3zkZhP8ySA Ii5m4mHvQfxvVCbJxrFUIorbjX2sm0LyxZacBoXXE+r5G+h5BNJs+hkz3aqidiuv7hxGQI jSFK6ijGaRYG22mVBFgL/ikKpugNlk8= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-230-Wu2SsEQpMROGuwHByGorUg-1; Wed, 16 Sep 2020 12:17:59 -0400 X-MC-Unique: Wu2SsEQpMROGuwHByGorUg-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id CE35A104D3E5; Wed, 16 Sep 2020 16:17:58 +0000 (UTC) Received: from horse.redhat.com (ovpn-116-139.rdu2.redhat.com [10.10.116.139]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1D5AC75135; Wed, 16 Sep 2020 16:17:55 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 8B653223D06; Wed, 16 Sep 2020 12:17:54 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, miklos@szeredi.hu Cc: vgoyal@redhat.com, virtio-fs@redhat.com Subject: [PATCH v2 1/6] fuse: Introduce the notion of FUSE_HANDLE_KILLPRIV_V2 Date: Wed, 16 Sep 2020 12:17:32 -0400 Message-Id: <20200916161737.38028-2-vgoyal@redhat.com> In-Reply-To: <20200916161737.38028-1-vgoyal@redhat.com> References: <20200916161737.38028-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org We already have FUSE_HANDLE_KILLPRIV flag that says that file server will remove suid/sgid/caps on truncate/chown/write. But that's little different from what Linux VFS implements. To be consistent with Linux VFS behavior what we want is. - caps are always cleared on chown/write/truncate - suid is always cleared on chown, while for truncate/write it is cleared only if caller does not have CAP_FSETID. - sgid is always cleared on chown, while for truncate/write it is cleared only if caller does not have CAP_FSETID as well as file has group execute permission. As previous flag did not provide above semantics. Implement a V2 of the protocol with above said constraints. Server does not know if caller has CAP_FSETID or not. So for the case of write()/truncate(), client will send information in special flag to indicate whether to kill priviliges or not. These changes are in subsequent patches. FUSE_HANDLE_KILLPRIV_V2 relies on WRITE being sent to server to clear suid/sgid/security.capability. But with ->writeback_cache, WRITES are cached in guest. So it is not recommended to use FUSE_HANDLE_KILLPRIV_V2 and writeback_cache together. Though it probably might be good enough for lot of use cases. Signed-off-by: Vivek Goyal --- fs/fuse/fuse_i.h | 6 ++++++ fs/fuse/inode.c | 5 ++++- include/uapi/linux/fuse.h | 7 +++++++ 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index dbaae2f6c73e..3dd1578be405 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -631,6 +631,12 @@ struct fuse_conn { /* show legacy mount options */ unsigned int legacy_opts_show:1; + /** fs kills suid/sgid/cap on write/chown/trunc. suid is + killed on write/trunc only if caller did not have CAP_FSETID. + sgid is killed on write/truncate only if caller did not have + CAP_FSETID as well as file has group execute permission. */ + unsigned handle_killpriv_v2:1; + /* * The following bitfields are only for optimization purposes * and hence races in setting them will not cause malfunction diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index d252237219bf..20740b61f12b 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -993,6 +993,8 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_args *args, !fuse_dax_check_alignment(fc, arg->map_alignment)) { ok = false; } + if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) + fc->handle_killpriv_v2 = 1; } else { ra_pages = fc->max_read / PAGE_SIZE; fc->no_lock = 1; @@ -1035,7 +1037,8 @@ void fuse_send_init(struct fuse_conn *fc) FUSE_WRITEBACK_CACHE | FUSE_NO_OPEN_SUPPORT | FUSE_PARALLEL_DIROPS | FUSE_HANDLE_KILLPRIV | FUSE_POSIX_ACL | FUSE_ABORT_ERROR | FUSE_MAX_PAGES | FUSE_CACHE_SYMLINKS | - FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA; + FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA | + FUSE_HANDLE_KILLPRIV_V2; #ifdef CONFIG_FUSE_DAX if (fc->dax) ia->in.flags |= FUSE_MAP_ALIGNMENT; diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h index 8899e4862309..3ae3f222a0ed 100644 --- a/include/uapi/linux/fuse.h +++ b/include/uapi/linux/fuse.h @@ -172,6 +172,7 @@ * - add FUSE_WRITE_KILL_PRIV flag * - add FUSE_SETUPMAPPING and FUSE_REMOVEMAPPING * - add map_alignment to fuse_init_out, add FUSE_MAP_ALIGNMENT flag + * - add FUSE_HANDLE_KILLPRIV_V2 */ #ifndef _LINUX_FUSE_H @@ -316,6 +317,11 @@ struct fuse_file_lock { * FUSE_MAP_ALIGNMENT: init_out.map_alignment contains log2(byte alignment) for * foffset and moffset fields in struct * fuse_setupmapping_out and fuse_removemapping_one. + * FUSE_HANDLE_KILLPRIV_V2: fs kills suid/sgid/cap on write/chown/trunc. + * Upon write/truncate suid/sgid is only killed if caller + * does not have CAP_FSETID. Additionally upon + * write/truncate sgid is killed only if file has group + * execute permission. (Same as Linux VFS behavior). */ #define FUSE_ASYNC_READ (1 << 0) #define FUSE_POSIX_LOCKS (1 << 1) @@ -344,6 +350,7 @@ struct fuse_file_lock { #define FUSE_NO_OPENDIR_SUPPORT (1 << 24) #define FUSE_EXPLICIT_INVAL_DATA (1 << 25) #define FUSE_MAP_ALIGNMENT (1 << 26) +#define FUSE_HANDLE_KILLPRIV_V2 (1 << 27) /** * CUSE INIT request/reply flags From patchwork Wed Sep 16 16:17:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 11780905 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 06E9259D for ; Wed, 16 Sep 2020 20:50:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D142021D24 for ; Wed, 16 Sep 2020 20:50:39 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Lad5W2Gy" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728536AbgIPUui (ORCPT ); Wed, 16 Sep 2020 16:50:38 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:37979 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726634AbgIPQzM (ORCPT ); Wed, 16 Sep 2020 12:55:12 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1600275310; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=kerhCoVWsEyNRAW3z9PF5+F+5l2ckGp7jqAJsTLRVXQ=; b=Lad5W2GysJwljsT95sBaUmDkVrSlG5vg6GP7JLnLx7HhpNWn9ZsgEKaAQKYHo7n7hjdSLf /9TiRh3gdVuVYMp22jivI/D8xTO12YNVoPOWlIbhk9Xd5y0vfTN0nVOcRsO3Y1PUPIsn70 3E3ZJAe0IaUSI8yK/+vPZOvQf0VzT6I= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-452-G8CNpshqOfGTRsCVBgw5jw-1; Wed, 16 Sep 2020 12:17:59 -0400 X-MC-Unique: G8CNpshqOfGTRsCVBgw5jw-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 9A6BF1891E80; Wed, 16 Sep 2020 16:17:58 +0000 (UTC) Received: from horse.redhat.com (ovpn-116-139.rdu2.redhat.com [10.10.116.139]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0A09976E16; Wed, 16 Sep 2020 16:17:55 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 8DBAF223D08; Wed, 16 Sep 2020 12:17:54 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, miklos@szeredi.hu Cc: vgoyal@redhat.com, virtio-fs@redhat.com Subject: [PATCH v2 2/6] fuse: Set FUSE_WRITE_KILL_PRIV in cached write path Date: Wed, 16 Sep 2020 12:17:33 -0400 Message-Id: <20200916161737.38028-3-vgoyal@redhat.com> In-Reply-To: <20200916161737.38028-1-vgoyal@redhat.com> References: <20200916161737.38028-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org With HANDLE_KILLPRIV_V2, server will need to kill suid/sgid if caller does not have CAP_FSETID. We already have a flag FUSE_WRITE_KILL_PRIV in WRITE request and we already set it in direct I/O path. To make it work in cached write path also, start setting FUSE_WRITE_KILL_PRIV in this path too. Set it only if fc->handle_killpriv_v2 is set. Otherwise client is responsible for kill suid/sgid. In case of direct I/O we set FUSE_WRITE_KILL_PRIV unconditionally because we do't call file_remove_privs() in that path (with cache=none option). Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 172a0b1aa634..e40428f3d0f1 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -1095,6 +1095,8 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia, fuse_write_args_fill(ia, ff, pos, count); ia->write.in.flags = fuse_write_flags(iocb); + if (fc->handle_killpriv_v2 && !capable(CAP_FSETID)) + ia->write.in.write_flags |= FUSE_WRITE_KILL_PRIV; err = fuse_simple_request(fc, &ap->args); if (!err && ia->write.out.size > count) From patchwork Wed Sep 16 16:17:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 11780907 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 84B5592C for ; Wed, 16 Sep 2020 20:50:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 57101221EF for ; Wed, 16 Sep 2020 20:50:56 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="g2eBIZVK" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728529AbgIPUug (ORCPT ); Wed, 16 Sep 2020 16:50:36 -0400 Received: from us-smtp-1.mimecast.com ([207.211.31.81]:25221 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726635AbgIPQzM (ORCPT ); Wed, 16 Sep 2020 12:55:12 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1600275310; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Gc3HE0beuXKXmjgEJUYd7yqRfe8GH+88a3rPpHc5o9g=; b=g2eBIZVKep33mcrgI+wa0yZZUw/i1IPYkerxYRqmWqEshOezLnGLs2hdvuvIVFuvYZdF8t EVDKQ2sJ2115uKFBlM0JZ5wJuR2atBOjw29+clG2oOxG41w//3VFQpNfWKJn+FreU5CrRN i8koWVbgWy82MHCFi0FVmyV/gVvvykE= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-220-wXXaQ4q2MRqVBLrDCfbOBA-1; Wed, 16 Sep 2020 12:17:59 -0400 X-MC-Unique: wXXaQ4q2MRqVBLrDCfbOBA-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id B43F88010C7; Wed, 16 Sep 2020 16:17:58 +0000 (UTC) Received: from horse.redhat.com (ovpn-116-139.rdu2.redhat.com [10.10.116.139]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1A984610F3; Wed, 16 Sep 2020 16:17:55 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 92FBC223D09; Wed, 16 Sep 2020 12:17:54 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, miklos@szeredi.hu Cc: vgoyal@redhat.com, virtio-fs@redhat.com Subject: [PATCH v2 3/6] fuse: setattr should set FATTR_KILL_PRIV upon size change Date: Wed, 16 Sep 2020 12:17:34 -0400 Message-Id: <20200916161737.38028-4-vgoyal@redhat.com> In-Reply-To: <20200916161737.38028-1-vgoyal@redhat.com> References: <20200916161737.38028-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org If fc->handle_killpriv_v2 is enabled, we expect file server to clear suid/sgid/security.capbility upon chown/truncate/write as appropriate. Upon truncate (ATTR_SIZE), suid/sgid is cleared only if caller does not have CAP_FSETID. File server does not know whether caller has CAP_FSETID or not. Hence set FATTR_KILL_PRIV upon truncate to let file server know that caller does not have CAP_FSETID and it should kill suid/sgid as appropriate. We don't have to send this information for chown (ATTR_UID/ATTR_GID) as that always clears suid/sgid irrespective of capabilities of calling process. Signed-off-by: Vivek Goyal --- fs/fuse/dir.c | 2 ++ include/uapi/linux/fuse.h | 1 + 2 files changed, 3 insertions(+) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index c4a01290aec6..ecdb7895c156 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -1575,6 +1575,8 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr, /* For mandatory locking in truncate */ inarg.valid |= FATTR_LOCKOWNER; inarg.lock_owner = fuse_lock_owner_id(fc, current->files); + if (fc->handle_killpriv_v2 && !capable(CAP_FSETID)) + inarg.valid |= FATTR_KILL_PRIV; } fuse_setattr_fill(fc, &args, inode, &inarg, &outarg); err = fuse_simple_request(fc, &args); diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h index 3ae3f222a0ed..7b8da0a2de0d 100644 --- a/include/uapi/linux/fuse.h +++ b/include/uapi/linux/fuse.h @@ -269,6 +269,7 @@ struct fuse_file_lock { #define FATTR_MTIME_NOW (1 << 8) #define FATTR_LOCKOWNER (1 << 9) #define FATTR_CTIME (1 << 10) +#define FATTR_KILL_PRIV (1 << 14) /* Matches ATTR_KILL_PRIV */ /** * Flags returned by the OPEN request From patchwork Wed Sep 16 16:17:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 11780953 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 57A3D92C for ; Wed, 16 Sep 2020 20:57:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 315D8221EF for ; Wed, 16 Sep 2020 20:57:41 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="dJnPTwhH" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726557AbgIPU5B (ORCPT ); Wed, 16 Sep 2020 16:57:01 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:27560 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726551AbgIPQwi (ORCPT ); Wed, 16 Sep 2020 12:52:38 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1600275136; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Sp0/SuqJJA+dNQKs9AmCAo8k+mDHaLEZwtyll25OTOU=; b=dJnPTwhHLEdKXV7G+Wfo9BO9BpSK/pd8F9hTMMLHRMqlBfLH8w+ypvkqWW2+UpL71QssTE RnMKmirnL/s3feMIiVGISp6ODOUhd/a6/2bB/WnmZ9o7s3rcuKKfs4hzqSI0I9y4x+gi+o 33vnwLtFAKj5Re8fbuoX5Ytqck1oJmo= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-500-Znft36czMiC3Kc4wYMf-Ew-1; Wed, 16 Sep 2020 12:17:59 -0400 X-MC-Unique: Znft36czMiC3Kc4wYMf-Ew-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id B19491891E83; Wed, 16 Sep 2020 16:17:58 +0000 (UTC) Received: from horse.redhat.com (ovpn-116-139.rdu2.redhat.com [10.10.116.139]) by smtp.corp.redhat.com (Postfix) with ESMTP id 139A67880E; Wed, 16 Sep 2020 16:17:55 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 9651A223D0A; Wed, 16 Sep 2020 12:17:54 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, miklos@szeredi.hu Cc: vgoyal@redhat.com, virtio-fs@redhat.com Subject: [PATCH v2 4/6] fuse: Kill suid/sgid using ATTR_MODE if it is not truncate Date: Wed, 16 Sep 2020 12:17:35 -0400 Message-Id: <20200916161737.38028-5-vgoyal@redhat.com> In-Reply-To: <20200916161737.38028-1-vgoyal@redhat.com> References: <20200916161737.38028-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org If a truncate is happening with ->handle_killpriv_v2 is enabled, then we don't have to send ATTR_MODE to kill suid/sgid as server will kill it as part of the protocol. But if this is non-truncate setattr then server will not kill suid/sgid. So continue to send ATTR_MODE to kill suid/sgid for non-truncate setattr, even if ->handle_killpriv_v2 is enabled. This path is taken when client does a write on a file which has suid/ sgid is set. VFS will first kill suid/sgid and then proceed with WRITE. One can argue that why not simply ignore ATTR_MODE because a WRITE will follow and ->handle_killpriv_v2 will kill suid/sgid that time. I feel this is a safer approach for following reasons. - With ->writeback_cache enabled, WRITE will not go to server. I feel that for this reason ->writeback_cache mode is not fully compatible with ->handle_killpriv_v2. But if we kill suid/sgid now, this will solve this particular issue for ->writeback_cache mode too. Again, I will not solve all the issues around ->writeback_cache but makes things better. - If we rely on WRITE killing suid/sgid, then after cache becomes out of sync w.r.t host. Client will still have suid/sgid set but subsequent WRITE will clear suid/sgid. Well WRITE will also invalidate client cache so further access to inode->i_mode should result in a ->getattr. Hmm..., for the case of ->writeback_cache, I am kind of inclined to send ATTR_MODE. - We are sending setattr(ATTR_FORCE) anyway (even if we clear ATTR_MODE). So if we are not saving on setattr(), why not kill suid/sgid now. Signed-off-by: Vivek Goyal --- fs/fuse/dir.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index ecdb7895c156..4b0fe0828e36 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -1655,6 +1655,21 @@ static int fuse_setattr(struct dentry *entry, struct iattr *attr) return -EACCES; if (attr->ia_valid & (ATTR_KILL_SUID | ATTR_KILL_SGID)) { + bool kill_sugid = true; + bool is_truncate = !!(attr->ia_valid & ATTR_SIZE); + + if (fc->handle_killpriv || + (fc->handle_killpriv_v2 && is_truncate)) { + /* + * If this is truncate and ->handle_killpriv_v2 is + * enabled, we don't have to send ATTR_MODE to + * kill suid/sgid as server will do it anyway as + * part of truncate. But if this is not truncate + * then kill suid/sgid by sending ATTR_MODE. + */ + kill_sugid = false; + } + attr->ia_valid &= ~(ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_MODE); @@ -1664,7 +1679,7 @@ static int fuse_setattr(struct dentry *entry, struct iattr *attr) * * This should be done on write(), truncate() and chown(). */ - if (!fc->handle_killpriv) { + if (kill_sugid) { /* * ia_mode calculation may have used stale i_mode. * Refresh and recalculate. From patchwork Wed Sep 16 16:17:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 11780903 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B994159D for ; Wed, 16 Sep 2020 20:50:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 91E26221E5 for ; Wed, 16 Sep 2020 20:50:36 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="BT34tau5" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727351AbgIPUue (ORCPT ); Wed, 16 Sep 2020 16:50:34 -0400 Received: from us-smtp-1.mimecast.com ([207.211.31.81]:40442 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726636AbgIPQzM (ORCPT ); Wed, 16 Sep 2020 12:55:12 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1600275310; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PPVMGnXUbus53BUYRzcYfDom5uZA6HXgKCZ6zjAHQgU=; b=BT34tau5XkxnEDQjK4z7lRoUWVXFyMCuNfH8FanT/3HMOabzclB7yRWTY/0nPzA4qF2FPU b/I0U/6za89NH6JXNc1UsusboGxJzdUPQLMepqxzwXpvEmgBG4NPgc02lGGzYoslQKEY7i gJgNgq7JagEZTzbhs9JnPsTzLpUwNhQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-250-_7xOA8OyMOWh8otskA0xPA-1; Wed, 16 Sep 2020 12:18:01 -0400 X-MC-Unique: _7xOA8OyMOWh8otskA0xPA-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 3CB0D802B6C; Wed, 16 Sep 2020 16:17:59 +0000 (UTC) Received: from horse.redhat.com (ovpn-116-139.rdu2.redhat.com [10.10.116.139]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1EEF0614F5; Wed, 16 Sep 2020 16:17:59 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 9A15E223D0B; Wed, 16 Sep 2020 12:17:54 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, miklos@szeredi.hu Cc: vgoyal@redhat.com, virtio-fs@redhat.com Subject: [PATCH v2 5/6] fuse: Add a flag FUSE_OPEN_KILL_PRIV for open() request Date: Wed, 16 Sep 2020 12:17:36 -0400 Message-Id: <20200916161737.38028-6-vgoyal@redhat.com> In-Reply-To: <20200916161737.38028-1-vgoyal@redhat.com> References: <20200916161737.38028-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org With FUSE_HANDLE_KILLPRIV_V2 support, server will need to kill suid/sgid/security.capability on open(O_TRUNC), if server supports FUSE_ATOMIC_O_TRUNC. But server needs to kill suid/sgid only if caller does not have CAP_FSETID. Given server does not have this information, client needs to send this info to server. So add a flag FUSE_OPEN_KILL_PRIV to fuse_open_in request which tells server to kill suid/sgid(only if group execute is set). Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 5 +++++ include/uapi/linux/fuse.h | 10 +++++++++- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index e40428f3d0f1..2853f55fd8f7 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -42,6 +42,11 @@ static int fuse_send_open(struct fuse_conn *fc, u64 nodeid, struct file *file, inarg.flags = file->f_flags & ~(O_CREAT | O_EXCL | O_NOCTTY); if (!fc->atomic_o_trunc) inarg.flags &= ~O_TRUNC; + + if (fc->handle_killpriv_v2 && (inarg.flags & O_TRUNC) && + !capable(CAP_FSETID)) + inarg.open_flags |= FUSE_OPEN_KILL_PRIV; + args.opcode = opcode; args.nodeid = nodeid; args.in_numargs = 1; diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h index 7b8da0a2de0d..e20b3ee9d292 100644 --- a/include/uapi/linux/fuse.h +++ b/include/uapi/linux/fuse.h @@ -173,6 +173,7 @@ * - add FUSE_SETUPMAPPING and FUSE_REMOVEMAPPING * - add map_alignment to fuse_init_out, add FUSE_MAP_ALIGNMENT flag * - add FUSE_HANDLE_KILLPRIV_V2 + * - add FUSE_OPEN_KILL_PRIV */ #ifndef _LINUX_FUSE_H @@ -427,6 +428,13 @@ struct fuse_file_lock { */ #define FUSE_FSYNC_FDATASYNC (1 << 0) +/** + * Open flags + * FUSE_OPEN_KILL_PRIV: Kill suid/sgid/security.capability. sgid is cleared + * only if file has group execute permission. + */ +#define FUSE_OPEN_KILL_PRIV (1 << 0) + enum fuse_opcode { FUSE_LOOKUP = 1, FUSE_FORGET = 2, /* no reply */ @@ -588,7 +596,7 @@ struct fuse_setattr_in { struct fuse_open_in { uint32_t flags; - uint32_t unused; + uint32_t open_flags; }; struct fuse_create_in { From patchwork Wed Sep 16 16:17:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 11780091 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C335392C for ; Wed, 16 Sep 2020 16:22:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9EAA822403 for ; Wed, 16 Sep 2020 16:22:26 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="FmfR2z52" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726265AbgIPQT4 (ORCPT ); Wed, 16 Sep 2020 12:19:56 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:33404 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726564AbgIPQTH (ORCPT ); Wed, 16 Sep 2020 12:19:07 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1600273089; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LcoJJWm6cQJGZeJL7rVr9Mya2yS3vWJSHHyzVQL03Hs=; b=FmfR2z52SCvMk2JJ6O1ya0lJHOFqJP9MTUrdcnSnoatTXrk7NTVa/JDsDJJ8FTkGdf1uMx jlwymutFRqkVTgjsHoVOwMbFA85pw/nXrkzs2DI6z/V0B8my9UVX+ka6Ti7xOEQAo+WHCL oxmEzdviIMxvpNWLGEUz9zM3iyn6mZ8= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-585-baOYyzOjNueWzgN8pw5D7Q-1; Wed, 16 Sep 2020 12:18:03 -0400 X-MC-Unique: baOYyzOjNueWzgN8pw5D7Q-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 939DA802B6C; Wed, 16 Sep 2020 16:18:02 +0000 (UTC) Received: from horse.redhat.com (ovpn-116-139.rdu2.redhat.com [10.10.116.139]) by smtp.corp.redhat.com (Postfix) with ESMTP id 240CF68D60; Wed, 16 Sep 2020 16:17:59 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 9D7DF223D0D; Wed, 16 Sep 2020 12:17:54 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, miklos@szeredi.hu Cc: vgoyal@redhat.com, virtio-fs@redhat.com Subject: [PATCH v2 6/6] virtiofs: Support SB_NOSEC flag to improve direct write performance Date: Wed, 16 Sep 2020 12:17:37 -0400 Message-Id: <20200916161737.38028-7-vgoyal@redhat.com> In-Reply-To: <20200916161737.38028-1-vgoyal@redhat.com> References: <20200916161737.38028-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org virtiofs can be slow with small writes if xattr are enabled and we are doing cached writes (No direct I/). Ganesh Mahalingam noticed this here. https://github.com/kata-containers/runtime/issues/2815 Some debugging showed that that file_remove_privs() is called in cached write path on every write. And everytime it calls security_inode_need_killpriv() which results in call to __vfs_getxattr(XATTR_NAME_CAPS). And this goes to file server to fetch xattr. This extra round trip for every write slows down writes tremendously. Normally to avoid paying this penalty on every write, vfs has the notion of caching this information in inode (S_NOSEC). So vfs sets S_NOSEC, if filesystem opted for it using super block flag SB_NOSEC. And S_NOSEC is cleared when setuid/setgid bit is set or when security xattr is set on inode so that next time a write happens, we check inode again for clearing setuid/setgid bits as well clear any security.capability xattr. This seems to work well for local file systems but for remote file systems it is possible that VFS does not have full picture and a different client sets setuid/setgid bit or security.capability xattr on file and that means VFS information about S_NOSEC on another client will be stale. So for remote filesystems SB_NOSEC was disabled by default. commit 9e1f1de02c2275d7172e18dc4e7c2065777611bf Author: Al Viro Date: Fri Jun 3 18:24:58 2011 -0400 more conservative S_NOSEC handling That commit mentioned that these filesystems can still make use of SB_NOSEC as long as they clear S_NOSEC when they are refreshing inode attriutes from server. So this patch tries to enable SB_NOSEC on fuse (regular fuse as well as virtiofs). And clear SB_NOSEC when we are refreshing inode attributes. This is enabled only if server supports FUSE_HANDLE_KILLPRIV_V2. This says that server will clear setuid/setgid/security.capability on chown/truncate/write as apporpriate. This should provide tighter coherency because now suid/sgid/security.capability will be cleared even if fuse client cache has not seen these attrs. Basic idea is that fuse client will trigger suid/sgid/security.capability clearing based on its attr cache. But even if cache has gone stale, it is fine because FUSE_HANDLE_KILLPRIV_V2 will make sure WRITE clear suid/sgid/security.capability. We make this change only if server supports FUSE_HANDLE_KILLPRIV_V2. This should make sure that existing filesystems which might be relying on seucurity.capability always being queried from server are not impacted. This tighter coherency relies on WRITE showing up on server (and not being cached in guest). So writeback_cache mode will not provide that tight coherency and it is not recommended to use two together. Having said that it might work reasonably well for lot of use cases. This change improves random write performance very significantly. I am running virtiofsd with cache=auto and following fio command. fio --ioengine=libaio --direct=1 --name=test --filename=/mnt/virtiofs/random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randwrite Before this patch I get around 50MB/s and after the patch I get around 250MB/s bandwidth. So improvement is very significant. Reported-by: "Mahalingam, Ganesh" Signed-off-by: Vivek Goyal --- fs/fuse/inode.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 20740b61f12b..4b7a043f21ee 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -201,6 +201,16 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr, inode->i_mode &= ~S_ISVTX; fi->orig_ino = attr->ino; + + /* + * We are refreshing inode data and it is possible that another + * client set suid/sgid or security.capability xattr. So clear + * S_NOSEC. Ideally, we could have cleared it only if suid/sgid + * was set or if security.capability xattr was set. But we don't + * know if security.capability has been set or not. So clear it + * anyway. Its less efficient but should is safe. + */ + inode->i_flags &= ~S_NOSEC; } void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr, @@ -993,8 +1003,10 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_args *args, !fuse_dax_check_alignment(fc, arg->map_alignment)) { ok = false; } - if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) + if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) { fc->handle_killpriv_v2 = 1; + fc->sb->s_flags |= SB_NOSEC; + } } else { ra_pages = fc->max_read / PAGE_SIZE; fc->no_lock = 1;