Date: Thu, 15 Sep 2016 15:17:41 +0200
From: Wouter Verhelst
To: Alex Bligh
Cc: Christoph Hellwig, nbd-general@lists.sourceforge.net, Josef Bacik,
    linux-kernel@vger.kernel.org,
    linux-block@vger.kernel.org, mpa@pengutronix.de, kernel-team@fb.com
Subject: Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements
Message-ID: <20160915131741.cth6kilmcgnobbuu@grep.be>

On Thu, Sep 15, 2016 at 01:44:29PM +0100, Alex Bligh wrote:
> 
> > On 15 Sep 2016, at 13:41, Christoph Hellwig wrote:
> > 
> > On Thu, Sep 15, 2016 at 01:39:11PM +0100, Alex Bligh wrote:
> >> That's probably right in the case of file-based back ends that
> >> are running on a Linux OS. But gonbdserver for instance supports
> >> (e.g.) Ceph based backends, where each connection might be talking
> >> to a completely separate ceph node, and there may be no cache
> >> consistency between connections.
> > 
> > Yes, if you don't have a cache coherent backend you are generally
> > screwed with a multiqueue protocol.
> 
> I wonder if the ability to support multiqueue should be visible
> in the negotiation stage. That would allow the client to refuse
> to select multiqueue where it isn't safe.

The server can always refuse to allow multiple connections.

I was thinking of changing the spec as in the patch at the end of this
message.

The latter bit (the client-side requirement in its final paragraph) is
there because even if your backend has no cache coherency issues, TCP
does not guarantee ordering between multiple connections.
I don't know if the above is in line with what blk-mq does, but consider
the following scenario:

- A client sends two writes to the server, followed (immediately) by a
  flush, where at least the second write and the flush are not sent over
  the same connection.
- The first write is a small one, and it is handled almost immediately.
- The second write takes a little longer, so the flush is handled earlier
  than the second write.
- The network packet containing the flush reply gets lost for whatever
  reason, so the client doesn't get it, and we fall into TCP retransmits.
- The second write finishes, and its reply header does not get lost.
- After the second write reply reaches the client, the TCP retransmits
  for the flush reply are handled.

In the above scenario, the flush reply arrives on the client side after a
write reply which it did not cover; so the client will (incorrectly)
assume that the write has reached permanent storage when in fact it may
not have done so yet.

If the kernel does not care about the ordering of the two writes versus
the flush, then there is no problem. I don't know how blk-mq works in
that context, but if the above is a likely scenario, we may have to
reconsider adding blk-mq to nbd.

diff --git a/doc/proto.md b/doc/proto.md
index 217f57e..cb099e2 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -308,6 +308,23 @@ specification, the
 [kernel documentation](https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt)
 may be useful.
 
+For performance reasons, clients MAY open multiple connections to the
+same server. To support such clients, servers SHOULD ensure that at
+least one of the following conditions holds:
+
+* Flush commands are processed for ALL connections. That is, when an
+  `NBD_CMD_WRITE` is processed on one connection, and then an
+  `NBD_CMD_FLUSH` is processed on another connection, the data of the
+  `NBD_CMD_WRITE` on the first connection MUST reach permanent storage
+  before the reply of the `NBD_CMD_FLUSH` is sent.
+* The server allows `NBD_CMD_WRITE` and `NBD_CMD_FLUSH` on at most one
+  connection.
+* Multiple connections are not allowed.
+
+In addition, clients using multiple connections SHOULD NOT send
+`NBD_CMD_FLUSH` if an `NBD_CMD_WRITE` for which they care in relation to
+the flush has not been replied to yet.
+
 #### Request message
 
 The request message, sent by the client, looks as follows: