From patchwork Wed Apr 13 19:01:58 2011
X-Patchwork-Submitter: Prasad Joshi
X-Patchwork-Id: 705561
From: Prasad Joshi
To: prasadjoshi124@gmail.com
Cc: mingo@elte.hu, kvm@vger.kernel.org, 
penberg@kernel.org, asias.hejun@gmail.com, gorcunov@gmail.com, levinsasha928@gmail.com, kwolf@redhat.com, stefanha@linux.vnet.ibm.com
Subject: [PATCH] kvm tool: add QCOW version 1 read/write support
Date: Wed, 13 Apr 2011 20:01:58 +0100
Message-Id: <1302721318-29904-1-git-send-email-prasadjoshi124@gmail.com>

This patch implements only basic read/write support for QCOW version 1
images. Many QCOW features are not implemented, for example:
 - image creation
 - snapshots
 - copy-on-write
 - encryption

Renamed the file CREDITS-Git to CREDITS and added QEMU credits to the
CREDITS file.

Signed-off-by: Prasad Joshi
---
 tools/kvm/CREDITS                   |   46 +++++
 tools/kvm/Makefile                  |    2 +
 tools/kvm/disk-image.c              |    7 +
 tools/kvm/include/kvm/qcow.h        |   55 ++++++
 tools/kvm/include/linux/byteorder.h |    7 +
 tools/kvm/include/linux/types.h     |   19 ++
 tools/kvm/qcow.c                    |  123 +++++++++++++
 tools/kvm/qcow1.c                   |  325 +++++++++++++++++++++++++++++++++++
 8 files changed, 584 insertions(+), 0 deletions(-)
 create mode 100644 tools/kvm/CREDITS
 create mode 100644 tools/kvm/include/kvm/qcow.h
 create mode 100644 tools/kvm/include/linux/byteorder.h
 create mode 100644 tools/kvm/qcow.c
 create mode 100644 tools/kvm/qcow1.c

diff --git a/tools/kvm/CREDITS b/tools/kvm/CREDITS
new file mode 100644
index 0000000..3e6cf55
--- /dev/null
+++ b/tools/kvm/CREDITS
@@ -0,0 +1,46 @@
+Perf/Git:
+Most of the infrastructure that 'perf' uses here has been reused
+from the Git project, as of version:
+
+  66996ec: Sync with 1.6.2.4
+
+Here is an (incomplete!) 
list of main contributors to those files
+in util/* and elsewhere:
+
+  Alex Riesen
+  Christian Couder
+  Dmitry Potapov
+  Jeff King
+  Johannes Schindelin
+  Johannes Sixt
+  Junio C Hamano
+  Linus Torvalds
+  Matthias Kestenholz
+  Michal Ostrowski
+  Miklos Vajna
+  Petr Baudis
+  Pierre Habouzit
+  René Scharfe
+  Samuel Tardieu
+  Shawn O. Pearce
+  Steffen Prohaska
+  Steve Haslam
+
+Thanks guys!
+
+The full history of the files can be found in the upstream Git commits.
+
+
+QEMU
+The QEMU source code was used as a reference while developing QCOW support
+for the kvm tool. The relevant QEMU commits were:
+
+66f82ce block: Open the underlying image file in generic code
+ea2384d new disk image layer
+
+Here is a possibly incomplete list of main contributors:
+  Kevin Wolf
+  Fabrice Bellard
+  Stefan Hajnoczi
+
+Thanks a lot all!
diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 6895113..098b328 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -34,6 +34,8 @@ OBJS += util/strbuf.o
 OBJS += kvm-help.o
 OBJS += kvm-cmd.o
 OBJS += kvm-run.o
+OBJS += qcow.o
+OBJS += qcow1.o

 DEPS := $(patsubst %.o,%.d,$(OBJS))

diff --git a/tools/kvm/disk-image.c b/tools/kvm/disk-image.c
index 908a744..ff3c076 100644
--- a/tools/kvm/disk-image.c
+++ b/tools/kvm/disk-image.c
@@ -13,6 +13,9 @@
 #include
 #include

+#include
+#include
+
 struct disk_image *disk_image__new(int fd, uint64_t size, struct disk_image_operations *ops)
 {
 	struct disk_image *self;
@@ -124,6 +127,10 @@ struct disk_image *disk_image__open(const char *filename, bool readonly)
 	if (fd < 0)
 		return NULL;

+	self = qcow_probe(fd);
+	if (self)
+		return self;
+
 	self = raw_image__probe(fd, readonly);
 	if (self)
 		return self;
diff --git a/tools/kvm/include/kvm/qcow.h b/tools/kvm/include/kvm/qcow.h
new file mode 100644
index 0000000..96f7ad5
--- /dev/null
+++ b/tools/kvm/include/kvm/qcow.h
@@ -0,0 +1,55 @@
+#ifndef __QEMU_H__
+
+#define __QEMU_H__
+
+#define QCOW_MAGIC (('Q' << 24) | ('F' << 16) | ('I' << 8) | 0xfb)
+#define 
QCOW1_VERSION 1
+#define QCOW2_VERSION 2
+
+#define QCOW_OFLAG_COMPRESSED (1ULL << 63)
+
+struct qcow_table {
+	uint32_t table_size;
+	u64 *l1_table;
+};
+
+struct qcow {
+	struct qcow_table *table;
+	void *header;
+	int fd;
+};
+
+/* common qcow header */
+struct qcow_common_header {
+	uint32_t magic;
+	uint32_t version;
+};
+
+/* qcow version 1 header format */
+struct qcow1_header {
+	uint32_t magic;
+	uint32_t version;
+
+	u64 backing_file_offset;
+	uint32_t backing_file_size;
+	uint32_t mtime;
+
+	u64 size; /* in bytes */
+
+	uint8_t cluster_bits;
+	uint8_t l2_bits;
+	uint32_t crypt_method;
+
+	u64 l1_table_offset;
+};
+
+/* qcow common operations */
+struct disk_image *qcow_probe(int fd);
+int qcow_read_l1_table(struct qcow *q);
+int qcow_pwrite_with_sync(int fd, void *buf, size_t count, off_t offset);
+
+/* qcow1 global variables and operations */
+extern struct disk_image_operations qcow1_disk_ops;
+uint32_t qcow1_get_table_size(struct qcow *q);
+struct disk_image *qcow1_probe(int fd);
+#endif
diff --git a/tools/kvm/include/linux/byteorder.h b/tools/kvm/include/linux/byteorder.h
new file mode 100644
index 0000000..c490de8
--- /dev/null
+++ b/tools/kvm/include/linux/byteorder.h
@@ -0,0 +1,7 @@
+#ifndef __BYTE_ORDER_H__
+#define __BYTE_ORDER_H__
+
+#include
+#include
+
+#endif
diff --git a/tools/kvm/include/linux/types.h b/tools/kvm/include/linux/types.h
index 8b608e7..efd8519 100644
--- a/tools/kvm/include/linux/types.h
+++ b/tools/kvm/include/linux/types.h
@@ -27,4 +27,23 @@ typedef __s16 s16;
 typedef __u8 u8;
 typedef __s8 s8;

+#ifdef __CHECKER__
+#define __bitwise__ __attribute__((bitwise))
+#else
+#define __bitwise__
+#endif
+#ifdef __CHECK_ENDIAN__
+#define __bitwise __bitwise__
+#else
+#define __bitwise
+#endif
+
+
+typedef __u16 __bitwise __le16;
+typedef __u16 __bitwise __be16;
+typedef __u32 __bitwise __le32;
+typedef __u32 __bitwise __be32;
+typedef __u64 __bitwise __le64;
+typedef __u64 __bitwise __be64;
+
 #endif /* LINUX_TYPES_H */
diff --git 
a/tools/kvm/qcow.c b/tools/kvm/qcow.c
new file mode 100644
index 0000000..1bd8da6
--- /dev/null
+++ b/tools/kvm/qcow.c
@@ -0,0 +1,123 @@
+/*
+ * This file contains code copied from QEMU source code
+ *
+ * Copyright (c) 2004-2006 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+static inline int qcow_check_image(int fd)
+{
+	struct qcow_common_header header;
+
+	if (pread_in_full(fd, &header, sizeof(struct qcow_common_header), 0) < 0)
+		return -1;
+
+	be32_to_cpus(&header.magic);
+	be32_to_cpus(&header.version);
+
+	if (header.magic != QCOW_MAGIC)
+		return -1;
+
+	if (header.version == QCOW1_VERSION || header.version == QCOW2_VERSION)
+		return header.version;
+	return -1;
+}
+
+int qcow_pwrite_with_sync(int fd, void *buf, size_t count, off_t offset)
+{
+	size_t rc;
+
+	rc = pwrite_in_full(fd, buf, count, offset);
+	if (rc != count)
+		return -1;
+
+	if (fsync(fd) < 0)
+		return -1;
+	return 0;
+}
+
+int qcow_read_l1_table(struct qcow *q)
+{
+	struct qcow1_header *h = q->header;
+	struct qcow_table *table;
+	u64 table_offset;
+	u64 map_offset;
+	const long page_size = sysconf(_SC_PAGESIZE);
+	long page_offset;
+	u32 l1_i;
+
+	q->table = table = calloc(1, sizeof(struct qcow_table));
+	if (!table)
+		return -1;
+
+	table->table_size = qcow1_get_table_size(q);
+	table_offset = h->l1_table_offset;
+
+	map_offset = table_offset & page_size;
+	page_offset = table_offset & (~page_size);
+
+	table->l1_table = calloc(table->table_size, sizeof(u64));
+	if (!table->l1_table)
+		goto error;
+
+	if (pread_in_full(q->fd, table->l1_table, table->table_size *
+				sizeof(u64), table_offset) < 0)
+		goto error;
+
+	/* change to cpu specific byte-order */
+	for (l1_i = 0; l1_i < table->table_size; l1_i++)
+		be64_to_cpus(&table->l1_table[l1_i]);
+	return 0;
+error:
+	free(table->l1_table);
+	table->l1_table = NULL;	/* q->table is freed by the caller; freeing it here too would be a double free */
+	return -1;
+}
+
+struct disk_image *qcow_probe(int fd)
+{
+	int version;
+
+	version = qcow_check_image(fd);
+	if (version < 0)
+		return NULL;
+
+	if (version != QCOW1_VERSION)
+		die("Format qcow%d is not supported.\n", version);
+
+	return qcow1_probe(fd);
+}
diff --git a/tools/kvm/qcow1.c b/tools/kvm/qcow1.c
new file 
mode 100644
index 0000000..1376582
--- /dev/null
+++ b/tools/kvm/qcow1.c
@@ -0,0 +1,325 @@
+/*
+ * This file contains code copied from QEMU source code
+ *
+ * Copyright (c) 2004-2006 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+#include
+#include
+
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+static void *qcow1_get_header(int fd)
+{
+	struct qcow1_header *header = malloc(sizeof(struct qcow1_header));
+
+	if (!header)
+		return NULL;
+
+	if (pread_in_full(fd, header, sizeof(struct qcow1_header), 0) < 0) {
+		free(header);
+		return NULL;
+	}
+
+	/* change to cpu byte-order */
+	be32_to_cpus(&header->magic);
+	be32_to_cpus(&header->version);
+	be64_to_cpus(&header->backing_file_offset);
+	be32_to_cpus(&header->backing_file_size);
+	be32_to_cpus(&header->mtime);
+	be64_to_cpus(&header->size);
+	be32_to_cpus(&header->crypt_method);
+	be64_to_cpus(&header->l1_table_offset);
+
+	return header;
+}
+
+uint32_t qcow1_get_table_size(struct qcow *q)
+{
+	struct qcow1_header *h = q->header;
+	int shift = h->cluster_bits + h->l2_bits;
+	int l1_size;
+
+	l1_size = (h->size + (1ULL << shift) - 1) >> shift;
+
+	return l1_size;
+}
+
+struct disk_image *qcow1_probe(int fd)
+{
+	struct qcow *q;
+	struct qcow1_header *h;
+	struct disk_image *disk_image;
+
+	q = calloc(1, sizeof(struct qcow));
+	if (!q)
+		goto error;
+
+	q->fd = fd;
+	/* allocates memory for header */
+	h = q->header = qcow1_get_header(fd);
+	if (!h)
+		goto error;
+
+	if (qcow_read_l1_table(q) < 0)
+		goto error;
+
+	disk_image = disk_image__new(fd, h->size, &qcow1_disk_ops);
+	if (!disk_image)
+		goto error;
+	disk_image->priv = q;
+
+	return disk_image;
+error:
+	if (!q)
+		return NULL;
+
+	free(q->table);
+	free(q->header);
+	free(q);
+	return NULL;
+}
+
+static inline u64 get_file_length(int fd)
+{
+	struct stat buf;
+	if (fstat(fd, &buf) < 0)
+		die("Unable to get the disk image's file status.");
+	return buf.st_size;
+}
+
+static u64 get_cluster_offset(struct qcow *q, uint64_t offset, int allocate)
+{
+	struct qcow1_header *h = q->header;
+
+	struct qcow_table *l1_tab = q->table;
+	int l1_index;
+
+	const int l2_size = 1 << h->l2_bits;
+	bool 
new_table = false;
+	int l2_index;
+	u64 l2_offset;
+	u64 *l2_table;
+
+	int cluster_size = 1 << h->cluster_bits;
+	u64 cluster_offset;
+
+	l1_index = offset >> (h->l2_bits + h->cluster_bits);
+	l2_offset = l1_tab->l1_table[l1_index];
+	if (!l2_offset) {
+		u64 tmp;
+		if (!allocate)
+			return 0;
+		/* need to allocate a new l2 entry at the end of the file */
+
+		/* align l2_offset to the next cluster boundary */
+		l2_offset = get_file_length(q->fd);
+		l2_offset = (l2_offset + cluster_size - 1) & ~(cluster_size - 1);
+
+		/* update the entry in the in-core table */
+		l1_tab->l1_table[l1_index] = l2_offset;
+
+		/* update the file entry in big-endian byte order */
+		tmp = cpu_to_be64(l2_offset);
+		if (qcow_pwrite_with_sync(q->fd, &tmp, sizeof(tmp),
+					h->l1_table_offset + l1_index * sizeof(tmp)) < 0)
+			return 0;
+		new_table = true;
+	}
+
+	/* TODO: cache L2 tables instead of reading one on every call */
+	l2_table = malloc(l2_size * sizeof(u64));
+	if (!l2_table)
+		return 0;
+	if (new_table == false) {
+		int l2_i;
+		/* read the table from the file */
+		if (pread_in_full(q->fd, l2_table, l2_size * sizeof(u64),
+					l2_offset) < 0)
+			goto error;
+		/* change to cpu specific byte-order */
+		for (l2_i = 0; l2_i < l2_size; l2_i++)
+			be64_to_cpus(&l2_table[l2_i]);
+	} else {
+		/* new l2 entry allocated, write 0's in l2 table */
+		memset(l2_table, 0, l2_size * sizeof(u64));
+		if (qcow_pwrite_with_sync(q->fd, l2_table, l2_size *
+					sizeof(u64), l2_offset) < 0)
+			goto error;
+	}
+
+	l2_index = (offset >> h->cluster_bits) & (l2_size - 1);
+	cluster_offset = l2_table[l2_index];
+	if (!cluster_offset && allocate) {
+		u64 tmp;
+		/* need to allocate a new cluster */
+		/* align cluster_offset to the start of the next cluster */
+		cluster_offset = get_file_length(q->fd);
+		cluster_offset = (cluster_offset + cluster_size - 1) &
+			~(cluster_size - 1);
+
+		/* update the in-cache table */
+		l2_table[l2_index] = cluster_offset;
+
+		/* update the file entry in big-endian byte order */
+		tmp = 
cpu_to_be64(cluster_offset);
+		if (qcow_pwrite_with_sync(q->fd, &tmp, sizeof(tmp), l2_offset +
+					l2_index * sizeof(tmp)) < 0)
+			goto error;
+	}
+	free(l2_table);
+	/* returning cluster_offset in the cpu byte-order */
+	return cluster_offset;
+error:
+	free(l2_table);
+	return 0;
+}
+
+static int qcow1_read_sector(struct disk_image *self, uint64_t sector,
+		void *dst, uint32_t dst_len)
+{
+	struct qcow *q = self->priv;
+	char *buf = dst;
+	uint64_t cluster_offset;
+
+	struct qcow1_header *h = q->header;
+	int cluster_sectors = 1 << (h->cluster_bits - SECTOR_SHIFT);
+
+	int nb_sectors = dst_len / SECTOR_SIZE;
+	int index_in_cluster;
+
+	int n;
+
+	while (nb_sectors) {
+		cluster_offset = get_cluster_offset(q, sector << SECTOR_SHIFT, 0);
+
+		/* which sector to read from the cluster */
+		index_in_cluster = sector & (cluster_sectors - 1);
+
+		/* find the number of sectors to read */
+		n = cluster_sectors - index_in_cluster;
+		if (n > nb_sectors)
+			n = nb_sectors;
+
+		if (!cluster_offset) {
+			memset(buf, 0, SECTOR_SIZE * n);
+		} else {
+			/*
+			 * read data beginning at
+			 * cluster_offset + (index_in_cluster * 512)
+			 * size of data = n * 512
+			 */
+			if (pread_in_full(q->fd, buf, n * SECTOR_SIZE,
+						cluster_offset + index_in_cluster *
+						SECTOR_SIZE) < 0)
+				return -1;
+		}
+
+		nb_sectors -= n;
+		sector += n;
+		buf += (n * SECTOR_SIZE);
+	}
+
+	return 0;
+}
+
+static int qcow1_write_sector(struct disk_image *self, uint64_t sector,
+		void *src, uint32_t src_len)
+{
+	struct qcow *q = self->priv;
+	char *buf = src;
+	uint64_t cluster_offset;
+
+	struct qcow1_header *h = q->header;
+	int cluster_sectors = 1 << (h->cluster_bits - SECTOR_SHIFT);
+
+	int nb_sectors = src_len / SECTOR_SIZE;
+	int index_in_cluster;
+
+	int rc;
+	int n;
+
+	while (nb_sectors) {
+		cluster_offset = get_cluster_offset(q, sector << SECTOR_SHIFT, 1);
+		if (!cluster_offset)
+			return -1;
+
+		/* which sector within the cluster to write */
+		index_in_cluster = sector & (cluster_sectors - 1);
+
+		/* find the number of 
sectors to write */
+		n = cluster_sectors - index_in_cluster;
+		if (n > nb_sectors)
+			n = nb_sectors;
+
+		/*
+		 * write data at
+		 * cluster_offset + (index_in_cluster * 512)
+		 * size of data = n * 512
+		 */
+		rc = qcow_pwrite_with_sync(q->fd, buf, n * SECTOR_SIZE,
+				cluster_offset + index_in_cluster *
+				SECTOR_SIZE);
+		if (rc < 0)
+			return -1;
+
+		nb_sectors -= n;
+		sector += n;
+		buf += (n * SECTOR_SIZE);
+	}
+
+	return 0;
+}
+
+static void qcow1_disk_close(struct disk_image *self)
+{
+	struct qcow *q;
+
+	if (!self)
+		return;
+	q = self->priv;
+	free(q->table->l1_table);
+	free(q->table);
+	free(q->header);
+	free(q);
+}
+
+struct disk_image_operations qcow1_disk_ops = {
+	.read_sector	= qcow1_read_sector,
+	.write_sector	= qcow1_write_sector,
+	.close		= qcow1_disk_close
+};
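
Note for reviewers: the two-level lookup in get_cluster_offset() and the
round-up in qcow1_get_table_size() come down to a few shifts and masks on
cluster_bits and l2_bits. The standalone sketch below is not part of the
patch; struct qcow1_geometry and the qcow1_* helper names in it are
hypothetical and exist only to illustrate the arithmetic, assuming
cluster_bits + l2_bits is smaller than 64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: only the two geometry fields of the qcow1 header. */
struct qcow1_geometry {
	uint8_t cluster_bits;	/* log2 of the cluster size in bytes */
	uint8_t l2_bits;	/* log2 of the number of entries in one L2 table */
};

/* L1 index: the offset bits above the L2 index and the in-cluster offset. */
static uint64_t qcow1_l1_index(const struct qcow1_geometry *g, uint64_t offset)
{
	assert(g->cluster_bits + g->l2_bits < 64);
	return offset >> (g->l2_bits + g->cluster_bits);
}

/* L2 index: the l2_bits bits directly above the in-cluster offset. */
static uint64_t qcow1_l2_index(const struct qcow1_geometry *g, uint64_t offset)
{
	return (offset >> g->cluster_bits) & ((1ULL << g->l2_bits) - 1);
}

/* Byte offset inside the cluster: the low cluster_bits bits. */
static uint64_t qcow1_offset_in_cluster(const struct qcow1_geometry *g,
					uint64_t offset)
{
	return offset & ((1ULL << g->cluster_bits) - 1);
}

/* Number of L1 entries needed to map `size` bytes, rounded up. */
static uint64_t qcow1_l1_entries(const struct qcow1_geometry *g, uint64_t size)
{
	unsigned int shift = g->cluster_bits + g->l2_bits;

	assert(shift < 64);
	return (size + (1ULL << shift) - 1) >> shift;
}
```

With cluster_bits = 12 and l2_bits = 9, each L1 entry covers 2 MiB of guest
disk, so a 1 GiB image needs 512 L1 entries and one extra byte pushes that
to 513.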