From patchwork Wed Mar 9 12:30:14 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Niels de Vos X-Patchwork-Id: 8545961 Return-Path: X-Original-To: patchwork-qemu-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork2.web.kernel.org (Postfix) with ESMTP id D4D04C0553 for ; Wed, 9 Mar 2016 12:31:24 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id D8B7E20295 for ; Wed, 9 Mar 2016 12:31:23 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B28AB2027D for ; Wed, 9 Mar 2016 12:31:22 +0000 (UTC) Received: from localhost ([::1]:40993 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1addGs-0006u3-56 for patchwork-qemu-devel@patchwork.kernel.org; Wed, 09 Mar 2016 07:31:22 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:34286) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1addGd-0006nS-Hn for qemu-devel@nongnu.org; Wed, 09 Mar 2016 07:31:09 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1addGY-0003e6-Da for qemu-devel@nongnu.org; Wed, 09 Mar 2016 07:31:07 -0500 Received: from mx1.redhat.com ([209.132.183.28]:60450) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1addGN-0003cA-6Z; Wed, 09 Mar 2016 07:30:51 -0500 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (Postfix) with ESMTPS id 6B45564D0A; Wed, 9 Mar 2016 12:30:50 +0000 (UTC) Received: from localhost (unused [10.70.7.60] (may be forged)) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u29CUkrr021612 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 9 Mar 2016 07:30:48 -0500 From: Niels de Vos To: qemu-devel@nongnu.org, qemu-block@nongnu.org Date: Wed, 9 Mar 2016 13:30:14 +0100 Message-Id: <1457526614-15596-1-git-send-email-ndevos@redhat.com> In-Reply-To: <20160308131428.GE18666@ndevos-x240.usersys.redhat.com> References: <20160308131428.GE18666@ndevos-x240.usersys.redhat.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.24 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Wed, 09 Mar 2016 12:30:50 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Cc: Kevin Wolf , Jeff Cody , Prasanna Kumar Kalever , Niels de Vos Subject: [Qemu-devel] [PATCH v2] block/gluster: add support for SEEK_DATA/SEEK_HOLE X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes it possible to detect sparse areas in files. Signed-off-by: Niels de Vos --- Tested by compiling and running "qemu-img map gluster://..." with a build of the current master branch of glusterfs. Using a Fedora cloud image (in raw format) shows many SEEK procudure calls going back and forth over the network. The output of "qemu map" matches the output when run against the image on the local filesystem. v2 based on feedback from Jeff Cody: - Replace compile time detection by runtime detection - Update return pointer (new argument) for .bdrv_co_get_block_status --- block/gluster.c | 186 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 186 insertions(+) diff --git a/block/gluster.c b/block/gluster.c index 65077a0..b01ab52 100644 --- a/block/gluster.c +++ b/block/gluster.c @@ -23,6 +23,7 @@ typedef struct GlusterAIOCB { typedef struct BDRVGlusterState { struct glfs *glfs; struct glfs_fd *fd; + bool supports_seek_data; } BDRVGlusterState; typedef struct GlusterConf { @@ -286,6 +287,33 @@ static void qemu_gluster_parse_flags(int bdrv_flags, int *open_flags) } } +/* + * Do SEEK_DATA/HOLE to detect if it is functional. In order to be usable, it + * should return different values for the start of data and the start of a + * hole. There are three different cases to handle: + * + * - the same position is returned for data/hole (indicates broken gfapi) + * - an error is returned: + * - ENXIO only gets returned if there is valid support on client+server + * - EINVAL is returned when gfapi or the server does not support it + */ +static bool qemu_gluster_test_seek(struct glfs_fd *fd) +{ + off_t start_data, start_hole; + bool supports_seek_data = false; + + start_data = glfs_lseek(fd, 0, SEEK_DATA); + if (start_data != -1) { + start_hole = glfs_lseek(fd, 0, SEEK_HOLE); + if (start_hole != -1) + supports_seek_data = !(start_data == start_hole); + } else if (errno == ENXIO) { + supports_seek_data = true; + } + + return supports_seek_data; +} + static int qemu_gluster_open(BlockDriverState *bs, QDict *options, int bdrv_flags, Error **errp) { @@ -320,6 +348,8 @@ static int qemu_gluster_open(BlockDriverState *bs, QDict *options, ret = -errno; } + s->supports_seek_data = qemu_gluster_test_seek(s->fd); + out: qemu_opts_del(opts); qemu_gluster_gconf_free(gconf); @@ -677,6 +707,158 @@ static int qemu_gluster_has_zero_init(BlockDriverState *bs) return 0; } +/* + * Find allocation range in @bs around offset @start. + * May change underlying file descriptor's file offset. + * If @start is not in a hole, store @start in @data, and the + * beginning of the next hole in @hole, and return 0. + * If @start is in a non-trailing hole, store @start in @hole and the + * beginning of the next non-hole in @data, and return 0. + * If @start is in a trailing hole or beyond EOF, return -ENXIO. + * If we can't find out, return a negative errno other than -ENXIO. + * + * (Shamefully copied from raw-posix.c, only miniscule adaptions.) + */ +static int find_allocation(BlockDriverState *bs, off_t start, + off_t *data, off_t *hole) +{ + BDRVGlusterState *s = bs->opaque; + off_t offs; + + if (!s->supports_seek_data) + return -EINVAL; + + /* + * SEEK_DATA cases: + * D1. offs == start: start is in data + * D2. offs > start: start is in a hole, next data at offs + * D3. offs < 0, errno = ENXIO: either start is in a trailing hole + * or start is beyond EOF + * If the latter happens, the file has been truncated behind + * our back since we opened it. All bets are off then. + * Treating like a trailing hole is simplest. + * D4. offs < 0, errno != ENXIO: we learned nothing + */ + offs = glfs_lseek(s->fd, start, SEEK_DATA); + if (offs < 0) { + return -errno; /* D3 or D4 */ + } + assert(offs >= start); + + if (offs > start) { + /* D2: in hole, next data at offs */ + *hole = start; + *data = offs; + return 0; + } + + /* D1: in data, end not yet known */ + + /* + * SEEK_HOLE cases: + * H1. offs == start: start is in a hole + * If this happens here, a hole has been dug behind our back + * since the previous lseek(). + * H2. offs > start: either start is in data, next hole at offs, + * or start is in trailing hole, EOF at offs + * Linux treats trailing holes like any other hole: offs == + * start. Solaris seeks to EOF instead: offs > start (blech). + * If that happens here, a hole has been dug behind our back + * since the previous lseek(). + * H3. offs < 0, errno = ENXIO: start is beyond EOF + * If this happens, the file has been truncated behind our + * back since we opened it. Treat it like a trailing hole. + * H4. offs < 0, errno != ENXIO: we learned nothing + * Pretend we know nothing at all, i.e. "forget" about D1. + */ + offs = glfs_lseek(s->fd, start, SEEK_HOLE); + if (offs < 0) { + return -errno; /* D1 and (H3 or H4) */ + } + assert(offs >= start); + + if (offs > start) { + /* + * D1 and H2: either in data, next hole at offs, or it was in + * data but is now in a trailing hole. In the latter case, + * all bets are off. Treating it as if it there was data all + * the way to EOF is safe, so simply do that. + */ + *data = start; + *hole = offs; + return 0; + } + + /* D1 and H1 */ + return -EBUSY; +} + +/* + * Returns the allocation status of the specified sectors. + * + * If 'sector_num' is beyond the end of the disk image the return value is 0 + * and 'pnum' is set to 0. + * + * 'pnum' is set to the number of sectors (including and immediately following + * the specified sector) that are known to be in the same + * allocated/unallocated state. + * + * 'nb_sectors' is the max value 'pnum' should be set to. If nb_sectors goes + * beyond the end of the disk image it will be clamped. + * + * (Based on raw_co_get_block_status() from raw-posix.c.) + */ +static int64_t coroutine_fn qemu_gluster_co_get_block_status( + BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum, + BlockDriverState **file) +{ + BDRVGlusterState *s = bs->opaque; + off_t start, data = 0, hole = 0; + int64_t total_size; + int ret = -EINVAL; + + if (!s->fd) { + return ret; + } + + start = sector_num * BDRV_SECTOR_SIZE; + total_size = bdrv_getlength(bs); + if (total_size < 0) { + return total_size; + } else if (start >= total_size) { + *pnum = 0; + return 0; + } else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) { + nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE); + } + + ret = find_allocation(bs, start, &data, &hole); + if (ret == -ENXIO) { + /* Trailing hole */ + *pnum = nb_sectors; + ret = BDRV_BLOCK_ZERO; + } else if (ret < 0) { + /* No info available, so pretend there are no holes */ + *pnum = nb_sectors; + ret = BDRV_BLOCK_DATA; + } else if (data == start) { + /* On a data extent, compute sectors to the end of the extent, + * possibly including a partial sector at EOF. */ + *pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE)); + ret = BDRV_BLOCK_DATA; + } else { + /* On a hole, compute sectors to the beginning of the next extent. */ + assert(hole == start); + *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE); + ret = BDRV_BLOCK_ZERO; + } + + *file = bs; + + return ret | BDRV_BLOCK_OFFSET_VALID | start; +} + + static QemuOptsList qemu_gluster_create_opts = { .name = "qemu-gluster-create-opts", .head = QTAILQ_HEAD_INITIALIZER(qemu_gluster_create_opts.head), @@ -719,6 +901,7 @@ static BlockDriver bdrv_gluster = { #ifdef CONFIG_GLUSTERFS_ZEROFILL .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes, #endif + .bdrv_co_get_block_status = qemu_gluster_co_get_block_status, .create_opts = &qemu_gluster_create_opts, }; @@ -746,6 +929,7 @@ static BlockDriver bdrv_gluster_tcp = { #ifdef CONFIG_GLUSTERFS_ZEROFILL .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes, #endif + .bdrv_co_get_block_status = qemu_gluster_co_get_block_status, .create_opts = &qemu_gluster_create_opts, }; @@ -773,6 +957,7 @@ static BlockDriver bdrv_gluster_unix = { #ifdef CONFIG_GLUSTERFS_ZEROFILL .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes, #endif + .bdrv_co_get_block_status = qemu_gluster_co_get_block_status, .create_opts = &qemu_gluster_create_opts, }; @@ -800,6 +985,7 @@ static BlockDriver bdrv_gluster_rdma = { #ifdef CONFIG_GLUSTERFS_ZEROFILL .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes, #endif + .bdrv_co_get_block_status = qemu_gluster_co_get_block_status, .create_opts = &qemu_gluster_create_opts, };