From patchwork Tue Apr 19 12:07:30 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Cody X-Patchwork-Id: 8879201 Return-Path: X-Original-To: patchwork-qemu-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 90B49BF29F for ; Tue, 19 Apr 2016 12:08:14 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id F27862026F for ; Tue, 19 Apr 2016 12:08:08 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 21A7C2026D for ; Tue, 19 Apr 2016 12:08:08 +0000 (UTC) Received: from localhost ([::1]:56764 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1asURr-0003zZ-D8 for patchwork-qemu-devel@patchwork.kernel.org; Tue, 19 Apr 2016 08:08:07 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:35163) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1asURW-0003tX-8t for qemu-devel@nongnu.org; Tue, 19 Apr 2016 08:07:47 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1asURS-0003B7-Ri for qemu-devel@nongnu.org; Tue, 19 Apr 2016 08:07:46 -0400 Received: from mx1.redhat.com ([209.132.183.28]:42022) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1asURQ-00036m-7c; Tue, 19 Apr 2016 08:07:40 -0400 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id EE7EB6035; Tue, 19 Apr 2016 12:07:39 +0000 (UTC) Received: from localhost (ovpn-112-74.phx2.redhat.com [10.3.112.74]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u3JC7c9W021719 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA256 bits=256 verify=NO); Tue, 19 Apr 2016 08:07:39 -0400 From: Jeff Cody To: qemu-block@nongnu.org Date: Tue, 19 Apr 2016 08:07:30 -0400 Message-Id: <3d0edbc7aafd7731496db768c55a8ff3d4ac1537.1461067248.git.jcody@redhat.com> In-Reply-To: References: In-Reply-To: References: X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Tue, 19 Apr 2016 12:07:39 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH for-2.6 v2 3/3] block/gluster: prevent data loss after i/o error X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, rgowdapp@redhat.com, pkarampu@redhat.com, qemu-devel@nongnu.org, rwheeler@redhat.com, ndevos@redhat.com Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Upon receiving an I/O error after an fsync, by default gluster will dump its cache. However, QEMU will retry the fsync, which is especially useful when encountering errors such as ENOSPC when using the werror=stop option. When using caching with gluster, however, the last written data will be lost upon encountering ENOSPC. Using the write-behind-cache xlator option of 'resync-failed-syncs-after-fsync' should cause gluster to retain the cached data after a failed fsync, so that ENOSPC and other transient errors are recoverable. Unfortunately, we have no way of knowing if the 'resync-failed-syncs-after-fsync' xlator option is supported, so for now close the fd and set the BDS driver to NULL upon fsync error. Signed-off-by: Jeff Cody --- block/gluster.c | 42 ++++++++++++++++++++++++++++++++++++++++++ configure | 8 ++++++++ 2 files changed, 50 insertions(+) diff --git a/block/gluster.c b/block/gluster.c index d9aace6..ba33488 100644 --- a/block/gluster.c +++ b/block/gluster.c @@ -314,6 +314,23 @@ static int qemu_gluster_open(BlockDriverState *bs, QDict *options, goto out; } +#ifdef CONFIG_GLUSTERFS_XLATOR_OPT + /* Without this, if fsync fails for a recoverable reason (for instance, + * ENOSPC), gluster will dump its cache, preventing retries. This means + * almost certain data loss. Not all gluster versions support the + * 'resync-failed-syncs-after-fsync' key value, but there is no way to + * discover during runtime if it is supported (this api returns success for + * unknown key/value pairs) */ + ret = glfs_set_xlator_option(s->glfs, "*-write-behind", + "resync-failed-syncs-after-fsync", + "on"); + if (ret < 0) { + error_setg_errno(errp, errno, "Unable to set xlator key/value pair"); + ret = -errno; + goto out; + } +#endif + qemu_gluster_parse_flags(bdrv_flags, &open_flags); s->fd = glfs_open(s->glfs, gconf->image, open_flags); @@ -366,6 +383,16 @@ static int qemu_gluster_reopen_prepare(BDRVReopenState *state, goto exit; } +#ifdef CONFIG_GLUSTERFS_XLATOR_OPT + ret = glfs_set_xlator_option(reop_s->glfs, "*-write-behind", + "resync-failed-syncs-after-fsync", "on"); + if (ret < 0) { + error_setg_errno(errp, errno, "Unable to set xlator key/value pair"); + ret = -errno; + goto exit; + } +#endif + reop_s->fd = glfs_open(reop_s->glfs, gconf->image, open_flags); if (reop_s->fd == NULL) { /* reops->glfs will be cleaned up in _abort */ @@ -613,6 +640,21 @@ static coroutine_fn int qemu_gluster_co_flush_to_disk(BlockDriverState *bs) ret = glfs_fsync_async(s->fd, gluster_finish_aiocb, &acb); if (ret < 0) { + /* Some versions of Gluster (3.5.6 -> 3.5.8?) will not retain its + * cache after a fsync failure, so we have no way of allowing the guest + * to safely continue. Gluster versions prior to 3.5.6 don't retain + * the cache either, but will invalidate the fd on error, so this is + * again our only option. + * + * The 'resync-failed-syncs-after-fsync' xlator option for the + * write-behind cache will cause later gluster versions to retain + * its cache after error, so long as the fd remains open. However, + * we currently have no way of knowing if this option is supported. + * + * TODO: Once gluster provides a way for us to determine if the option + * is supported, bypass the closure and setting drv to NULL. */ + qemu_gluster_close(bs); + bs->drv = NULL; return -errno; } diff --git a/configure b/configure index f1c307b..ab54f3c 100755 --- a/configure +++ b/configure @@ -298,6 +298,7 @@ coroutine="" coroutine_pool="" seccomp="" glusterfs="" +glusterfs_xlator_opt="no" glusterfs_discard="no" glusterfs_zerofill="no" archipelago="no" @@ -3400,6 +3401,9 @@ if test "$glusterfs" != "no" ; then glusterfs="yes" glusterfs_cflags=`$pkg_config --cflags glusterfs-api` glusterfs_libs=`$pkg_config --libs glusterfs-api` + if $pkg_config --atleast-version=4 glusterfs-api; then + glusterfs_xlator_opt="yes" + fi if $pkg_config --atleast-version=5 glusterfs-api; then glusterfs_discard="yes" fi @@ -5342,6 +5346,10 @@ if test "$glusterfs" = "yes" ; then echo "GLUSTERFS_LIBS=$glusterfs_libs" >> $config_host_mak fi +if test "$glusterfs_xlator_opt" = "yes" ; then + echo "CONFIG_GLUSTERFS_XLATOR_OPT=y" >> $config_host_mak +fi + if test "$glusterfs_discard" = "yes" ; then echo "CONFIG_GLUSTERFS_DISCARD=y" >> $config_host_mak fi