From patchwork Thu Mar 14 10:46:07 2019
X-Patchwork-Submitter: Kirill Smelkov
X-Patchwork-Id: 10852621
From: Kirill Smelkov
Subject: [PATCH 1/2] fuse: retrieve: cap requested size to negotiated max_write
To: Miklos Szeredi
Cc: Kirill Smelkov, Han-Wen Nienhuys, Jakob Unterwurzacher
Message-Id: <12f7d0d98555ee0d174d04bb47644f65c07f035a.1552558717.git.kirr@nexedi.com>
Date: Thu, 14 Mar 2019 10:46:07 +0000
X-Mailer: git-send-email 2.21.0.225.g810b269d1a
X-Mailing-List: linux-fsdevel@vger.kernel.org

During the initialization phase, the FUSE filesystem server and the kernel
client negotiate the maximum write size the client will ever issue.
Correspondingly, the filesystem server then queues sys_read calls with
buffer capacity large enough to carry the request header plus max_write
bytes. A filesystem server is free to set its max_write anywhere in the
range [1·page, fc->max_pages·page]. In particular, go-fuse[2] sets
max_write to 64K by default, whereas the default fc->max_pages corresponds
to 128K. Libfuse also allows users to configure max_write, but by default
presets it to the maximum possible value.

If max_write < fc->max_pages·page and the NOTIFY_RETRIEVE handler allows
retrieving more than max_write bytes, the prepared NOTIFY_REPLY will be
thrown away by fuse_dev_do_read: the filesystem server, in full accordance
with the server/client contract, only queues sys_read calls with ~max_write
buffer capacity, and fuse_dev_do_read throws away requests that cannot fit
into the server's request buffer. In turn, the filesystem server can get
stuck waiting indefinitely for NOTIFY_REPLY, because the NOTIFY_RETRIEVE
handler returned OK, which its caller understands to mean that NOTIFY_REPLY
was queued and will be sent back.

-> Cap the requested size to the negotiated max_write to avoid the problem.
This aligns with the way the NOTIFY_RETRIEVE handler already works: it
unconditionally caps the requested retrieve size to fuse_conn->max_pages.
Returning less data than was originally requested should therefore not hurt
NOTIFY_RETRIEVE semantics.

Please see [1] for the context in which the stuck-filesystem problem was hit
for real, how the situation was traced, and for a more involved patch that
did not make it into the tree.

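The size computation described above can be sketched in userspace C. This is only a rough model of the arithmetic in fuse_retrieve(), not kernel code: the function name retrieve_num and its flat parameter list are hypothetical, standing in for outarg->size, outarg->offset, fc->max_write and the inode size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of how many bytes a retrieve returns:
 * first cap the request to the negotiated max_write, then clip it
 * against the end of the file.
 */
static uint32_t retrieve_num(uint64_t offset, uint32_t size,
			     uint32_t max_write, uint64_t file_size)
{
	/* The cap this patch adds: never prepare more than max_write. */
	uint32_t num = size < max_write ? size : max_write;

	if (offset > file_size)
		num = 0;			/* request starts past EOF */
	else if (offset + num > file_size)
		num = (uint32_t)(file_size - offset);	/* clip at EOF */

	/* Post-condition: the reply never exceeds either limit. */
	assert(num <= size && num <= max_write);
	return num;
}
```

With a go-fuse style setup (max_write = 64K) and a 128K retrieve request at offset 0 of a 1M file, retrieve_num(0, 128 * 1024, 64 * 1024, 1 << 20) yields 64K, so the resulting NOTIFY_REPLY fits a buffer sized for max_write.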
[1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2
[2] https://github.com/hanwen/go-fuse

Signed-off-by: Kirill Smelkov
Cc: Han-Wen Nienhuys
Cc: Jakob Unterwurzacher
Cc: # v2.6.36+
---
 fs/fuse/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 8a63e52785e9..38e94bc43053 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1749,7 +1749,7 @@ static int fuse_retrieve(struct fuse_conn *fc, struct inode *inode,
 	offset = outarg->offset & ~PAGE_MASK;
 	file_size = i_size_read(inode);
 
-	num = outarg->size;
+	num = min(outarg->size, fc->max_write);
 	if (outarg->offset > file_size)
 		num = 0;
 	else if (outarg->offset + num > file_size)

From patchwork Thu Mar 14 10:46:13 2019
X-Patchwork-Submitter: Kirill Smelkov
X-Patchwork-Id: 10852623
From: Kirill Smelkov
Subject: [PATCH 2/2] fuse: require /dev/fuse reads to have enough buffer capacity as negotiated
To: Miklos Szeredi
Cc: Kirill Smelkov, Han-Wen Nienhuys, Jakob Unterwurzacher
Date: Thu, 14 Mar 2019 10:46:13 +0000
X-Mailer: git-send-email 2.21.0.225.g810b269d1a
X-Mailing-List: linux-fsdevel@vger.kernel.org

A FUSE filesystem server queues /dev/fuse sys_read calls to get filesystem
requests to handle. It does not know in advance what the request will be,
as it can be anything the client issues: LOOKUP, READ, WRITE, ... Many
requests are short and retrieve data from the filesystem. However, WRITE
and NOTIFY_REPLY write data into the filesystem.

Before entering the operation phase, the FUSE filesystem server and the
kernel client negotiate the maximum write size the client will ever issue.
After negotiation, the contract between server and client is that the
filesystem server should queue /dev/fuse sys_read calls with enough buffer
capacity to receive any client request, WRITE in particular, while the FUSE
client should not, in particular, send WRITE requests with a payload larger
than the negotiated max_write. The FUSE client in the kernel and libfuse
historically reserve 4K for the request header. The contract is thus that
the filesystem server should queue sys_reads with a 4K+max_write buffer.

If the filesystem server does not follow this contract, fuse_dev_do_read
will see that the request size is > buffer size, return EIO to the client
that issued the request, and give the filesystem server no indication that
there is a problem. This can be hard to diagnose because for some requests,
e.g. NOTIFY_REPLY, which mimics WRITE, there is no client thread waiting
for request completion, so the EIO goes nowhere, while on the filesystem
server side it looks as if the kernel is not replying after a successful
NOTIFY_RETRIEVE request made by the server.

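The buffer-size contract described above can be modelled roughly as follows. This is a sketch under stated assumptions, not libfuse or kernel code: HEADER_RESERVE, server_read_bufsize and request_delivered are hypothetical names, and the second function only models the pre-patch outcome in which an oversized request never reaches the server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The 4K historically reserved for the request header (hypothetical name). */
#define HEADER_RESERVE ((size_t)4096)

/* Buffer size a server should pass to each read on /dev/fuse. */
static size_t server_read_bufsize(size_t max_write)
{
	/* Guard against size_t overflow in the addition below. */
	assert(max_write <= SIZE_MAX - HEADER_RESERVE);
	return HEADER_RESERVE + max_write;
}

/*
 * Model of delivery: a request reaches the server only if it fits the
 * buffer of the queued read. Otherwise its issuer gets EIO and the
 * server's read is told nothing.
 */
static bool request_delivered(size_t reqsize, size_t bufsize)
{
	return reqsize <= bufsize;
}
```

For max_write = 64K the server's buffer is 68K; a full-size WRITE (4K + 64K) fits, while a NOTIFY_REPLY carrying 128K of data does not and is silently lost from the server's point of view.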
-> We can make the problem easy to diagnose by returning an error to the
filesystem server when it violates the contract. This should not cause
problems in practice: if a filesystem server uses a shorter buffer, writes
to it were already very likely to cause EIO, and if the filesystem is
read-only it should already be following the 8K minimum buffer size
(= either FUSE_MIN_READ_BUFFER, see 1d3d752b47, or 4K + min(max_write)=4K,
which process_init_reply takes care to ensure).

Please see [1] for the context in which the stuck-filesystem problem was hit
for real (because the kernel client was incorrectly sending more than
max_write data with NOTIFY_REPLY; see also the previous patch), how the
situation was traced, and for a more involved patch that did not make it
into the tree.

[1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2

Signed-off-by: Kirill Smelkov
Cc: Han-Wen Nienhuys
Cc: Jakob Unterwurzacher
---
 fs/fuse/dev.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 38e94bc43053..8fdfbafed037 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1317,6 +1317,16 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 	unsigned reqsize;
 	unsigned int hash;
 
+	/*
+	 * Require sane minimum read buffer - that has capacity for fixed part
+	 * of any request header + negotiated max_write room for data. If the
+	 * requirement is not satisfied return EINVAL to the filesystem server
+	 * to indicate that it is not following FUSE server/client contract.
+	 * Don't dequeue / abort any request.
+	 */
+	if (nbytes < (fc->conn_init ? 4096 + fc->max_write : FUSE_MIN_READ_BUFFER))
+		return -EINVAL;
+
  restart:
 	spin_lock(&fiq->waitq.lock);
 	err = -EAGAIN;
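To make the new check easier to reason about, here is a minimal userspace sketch of the same condition. It is an illustration only: check_read_buffer and its parameters are hypothetical stand-ins for nbytes, fc->conn_init and fc->max_write, and FUSE_MIN_READ_BUFFER is redefined locally with the 8K value mentioned in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* 8K minimum read buffer, redefined here for the sketch. */
#define FUSE_MIN_READ_BUFFER 8192

/*
 * Returns 0 if a read buffer of nbytes satisfies the contract, or
 * -EINVAL if it is too small. Before INIT completes only the fixed
 * minimum applies; afterwards the buffer must hold 4K of header
 * plus the negotiated max_write.
 */
static int check_read_buffer(size_t nbytes, bool conn_init, size_t max_write)
{
	size_t need;

	/* Per the commit message, negotiated max_write is at least 4K. */
	assert(!conn_init || max_write >= 4096);

	need = conn_init ? 4096 + max_write : FUSE_MIN_READ_BUFFER;
	if (nbytes < need)
		return -EINVAL;
	return 0;
}
```

For example, with max_write = 64K a 64K buffer is rejected with -EINVAL, while a 68K buffer passes. Because the check runs before any request is dequeued, a too-small read fails without consuming or aborting a pending request.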