From patchwork Wed Apr 13 19:26:02 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Prasad Joshi
X-Patchwork-Id: 705591
From: Prasad Joshi
To: prasadjoshi124@gmail.com
Cc: mingo@elte.hu, kvm@vger.kernel.org,
	penberg@kernel.org, asias.hejun@gmail.com, gorcunov@gmail.com,
	levinsasha928@gmail.com, kwolf@redhat.com, stefanha@linux.vnet.ibm.com
Subject: [PATCH v2] kvm tool: add QCOW version 1 read/write support
Date: Wed, 13 Apr 2011 20:26:02 +0100
Message-Id: <1302722762-30517-1-git-send-email-prasadjoshi124@gmail.com>
X-Mailer: git-send-email 1.7.1
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org
X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by
	milter-greylist-4.2.6 (demeter1.kernel.org [140.211.167.41]);
	Wed, 13 Apr 2011 19:26:06 +0000 (UTC)

The patch implements only basic read/write support for QCOW version 1
images. Many of the QCOW features are not implemented, for example
 - image creation
 - snapshot
 - copy-on-write
 - encryption

Renamed the file CREDITS-Git to CREDITS and added QEMU credits to the
CREDITS file.

Signed-off-by: Prasad Joshi
---
 tools/kvm/CREDITS                   |   46 ++++++
 tools/kvm/Makefile                  |    2 +
 tools/kvm/disk-image.c              |    7 +
 tools/kvm/include/kvm/qcow.h        |   55 +++++++
 tools/kvm/include/linux/byteorder.h |    7 +
 tools/kvm/include/linux/types.h     |   19 +++
 tools/kvm/qcow.c                    |   99 ++++++++++++
 tools/kvm/qcow1.c                   |  301 +++++++++++++++++++++++++++++++++++
 8 files changed, 536 insertions(+), 0 deletions(-)
 create mode 100644 tools/kvm/CREDITS
 create mode 100644 tools/kvm/include/kvm/qcow.h
 create mode 100644 tools/kvm/include/linux/byteorder.h
 create mode 100644 tools/kvm/qcow.c
 create mode 100644 tools/kvm/qcow1.c

diff --git a/tools/kvm/CREDITS b/tools/kvm/CREDITS
new file mode 100644
index 0000000..96cc8d5
--- /dev/null
+++ b/tools/kvm/CREDITS
@@ -0,0 +1,46 @@
+Perf/Git:
+Most of the infrastructure that 'perf' uses here has been reused
+from the Git project, as of version:
+
+ 66996ec: Sync with 1.6.2.4
+
+Here is an (incomplete!) list of main contributors to those files
+in util/* and elsewhere:
+
+ Alex Riesen
+ Christian Couder
+ Dmitry Potapov
+ Jeff King
+ Johannes Schindelin
+ Johannes Sixt
+ Junio C Hamano
+ Linus Torvalds
+ Matthias Kestenholz
+ Michal Ostrowski
+ Miklos Vajna
+ Petr Baudis
+ Pierre Habouzit
+ René Scharfe
+ Samuel Tardieu
+ Shawn O. Pearce
+ Steffen Prohaska
+ Steve Haslam
+
+Thanks guys!
+
+The full history of the files can be found in the upstream Git commits.
+
+
+QEMU
+The source code of the QEMU was referenced while developing the QCOW support
+for the kvm tool. The relevant QEMU commits were
+
+66f82ce block: Open the underlying image file in generic code
+ea2384d new disk image layer
+
+Here is a possibly incomplete list of main contributors
+ Kevin Wolf
+ Fabrice Bellard
+ Stefan Hajnoczi
+
+Thanks a lot all!
diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 6895113..098b328 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -34,6 +34,8 @@ OBJS += util/strbuf.o
 OBJS += kvm-help.o
 OBJS += kvm-cmd.o
 OBJS += kvm-run.o
+OBJS += qcow.o
+OBJS += qcow1.o
 
 DEPS := $(patsubst %.o,%.d,$(OBJS))
 
diff --git a/tools/kvm/disk-image.c b/tools/kvm/disk-image.c
index 5d0f342..05b58b3 100644
--- a/tools/kvm/disk-image.c
+++ b/tools/kvm/disk-image.c
@@ -13,6 +13,9 @@
 #include
 #include
 
+#include
+#include
+
 struct disk_image *disk_image__new(int fd, uint64_t size, struct disk_image_operations *ops)
 {
 	struct disk_image *self;
@@ -131,6 +134,10 @@ struct disk_image *disk_image__open(const char *filename, bool readonly)
 	if (fd < 0)
 		return NULL;
 
+	self = qcow_probe(fd);
+	if (self)
+		return self;
+
 	self = raw_image__probe(fd, readonly);
 	if (self)
 		return self;
diff --git a/tools/kvm/include/kvm/qcow.h b/tools/kvm/include/kvm/qcow.h
new file mode 100644
index 0000000..96f7ad5
--- /dev/null
+++ b/tools/kvm/include/kvm/qcow.h
@@ -0,0 +1,55 @@
+#ifndef __QEMU_H__
+
+#define __QEMU_H__
+
+#define QCOW_MAGIC (('Q' << 24) | ('F' << 16) | ('I' << 8) | 0xfb)
+#define QCOW1_VERSION 1
+#define QCOW2_VERSION 2
+
+#define QCOW_OFLAG_COMPRESSED (1ULL << 63)
+
+struct qcow_table {
+	uint32_t table_size;
+	u64 *l1_table;
+};
+
+struct qcow {
+	struct qcow_table *table;
+	void *header;
+	int fd;
+};
+
+/* common qcow header */
+struct qcow_common_header {
+	uint32_t magic;
+	uint32_t version;
+};
+
+/* qcow version 1 header format */
+struct qcow1_header {
+	uint32_t magic;
+	uint32_t version;
+
+	u64 backing_file_offset;
+	uint32_t backing_file_size;
+	uint32_t mtime;
+
+	u64 size; /* in bytes */
+
+	uint8_t cluster_bits;
+	uint8_t l2_bits;
+	uint32_t crypt_method;
+
+	u64 l1_table_offset;
+};
+
+/* qcow common operations */
+struct disk_image *qcow_probe(int fd);
+int qcow_read_l1_table(struct qcow *q);
+int qcow_pwrite_with_sync(int fd, void *buf, size_t count, off_t offset);
+
+/* qcow1 global variables and operations */
+extern struct disk_image_operations qcow1_disk_ops;
+uint32_t qcow1_get_table_size(struct qcow *q);
+struct disk_image *qcow1_probe(int fd);
+#endif
diff --git a/tools/kvm/include/linux/byteorder.h b/tools/kvm/include/linux/byteorder.h
new file mode 100644
index 0000000..c490de8
--- /dev/null
+++ b/tools/kvm/include/linux/byteorder.h
@@ -0,0 +1,7 @@
+#ifndef __BYTE_ORDER_H__
+#define __BYTE_ORDER_H__
+
+#include
+#include
+
+#endif
diff --git a/tools/kvm/include/linux/types.h b/tools/kvm/include/linux/types.h
index 8b608e7..efd8519 100644
--- a/tools/kvm/include/linux/types.h
+++ b/tools/kvm/include/linux/types.h
@@ -27,4 +27,23 @@ typedef __s16 s16;
 typedef __u8 u8;
 typedef __s8 s8;
 
+#ifdef __CHECKER__
+#define __bitwise__ __attribute__((bitwise))
+#else
+#define __bitwise__
+#endif
+#ifdef __CHECK_ENDIAN__
+#define __bitwise __bitwise__
+#else
+#define __bitwise
+#endif
+
+
+typedef __u16 __bitwise __le16;
+typedef __u16 __bitwise __be16;
+typedef __u32 __bitwise __le32;
+typedef __u32 __bitwise __be32;
+typedef __u64 __bitwise __le64;
+typedef __u64 __bitwise __be64;
+
 #endif /* LINUX_TYPES_H */
diff --git a/tools/kvm/qcow.c b/tools/kvm/qcow.c
new file mode 100644
index 0000000..1764975
--- /dev/null
+++ b/tools/kvm/qcow.c
@@ -0,0 +1,99 @@
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+static inline int qcow_check_image(int fd)
+{
+	struct qcow_common_header header;
+
+	if (pread_in_full(fd, &header, sizeof(struct qcow_common_header), 0) < 0)
+		return -1;
+
+	be32_to_cpus(&header.magic);
+	be32_to_cpus(&header.version);
+
+	if (header.magic != QCOW_MAGIC)
+		return -1;
+
+	if (header.version == QCOW1_VERSION || header.version == QCOW2_VERSION)
+		return header.version;
+	return -1;
+}
+
+int qcow_pwrite_with_sync(int fd, void *buf, size_t count, off_t offset)
+{
+	size_t rc;
+
+	rc = pwrite_in_full(fd, buf, count, offset);
+	if (rc != count)
+		return -1;
+
+	if (fsync(fd) < 0)
+		return -1;
+	return 0;
+}
+
+int qcow_read_l1_table(struct qcow *q)
+{
+	struct qcow1_header *h = q->header;
+	struct qcow_table *table;
+	u64 table_offset;
+	u64 map_offset;
+	const long page_size = sysconf(_SC_PAGESIZE);
+	long page_offset;
+	u32 l1_i;
+
+	q->table = table = calloc(1, sizeof(struct qcow_table));
+	if (!table)
+		return -1;
+
+	table->table_size = qcow1_get_table_size(q);
+	table_offset = h->l1_table_offset;
+
+	map_offset = table_offset & page_size;
+	page_offset = table_offset & (~page_size);
+
+	table->l1_table = calloc(table->table_size, sizeof(u64));
+	if (!table->l1_table)
+		goto error;
+
+	if (pread_in_full(q->fd, table->l1_table, table->table_size *
+				sizeof(u64), table_offset) < 0)
+		goto error;
+
+	/* change to cpu specific byte-order */
+	for (l1_i = 0; l1_i < table->table_size; l1_i++)
+		be64_to_cpus(&table->l1_table[l1_i]);
+	return 0;
+error:
+	free(table->l1_table);
+	free(table);
+	return -1;
+}
+
+struct disk_image *qcow_probe(int fd)
+{
+	int version;
+
+	version = qcow_check_image(fd);
+	if (version < 0)
+		return NULL;
+
+	if (version != QCOW1_VERSION)
+		die("Format qcow%d is not supported.\n", version);
+
+	return qcow1_probe(fd);
+}
diff --git a/tools/kvm/qcow1.c b/tools/kvm/qcow1.c
new file mode 100644
index 0000000..7947543
--- /dev/null
+++ b/tools/kvm/qcow1.c
@@ -0,0 +1,301 @@
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+#include
+#include
+
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+static void *qcow1_get_header(int fd)
+{
+	struct qcow1_header *header = malloc(sizeof(struct qcow1_header));
+
+	if (!header)
+		return NULL;
+
+	if (pread_in_full(fd, header, sizeof(struct qcow1_header), 0) < 0)
+		return NULL;
+
+	/* change to cpu byte-order */
+	be32_to_cpus(&header->magic);
+	be32_to_cpus(&header->version);
+	be64_to_cpus(&header->backing_file_offset);
+	be32_to_cpus(&header->backing_file_size);
+	be32_to_cpus(&header->mtime);
+	be64_to_cpus(&header->size);
+	be32_to_cpus(&header->crypt_method);
+	be64_to_cpus(&header->l1_table_offset);
+
+	return header;
+}
+
+uint32_t qcow1_get_table_size(struct qcow *q)
+{
+	struct qcow1_header *h = q->header;
+	int l1_size;
+	int shift;
+
+	shift = h->cluster_bits + h->l2_bits;
+	l1_size = (h->size + (1ULL << shift) - 1) >> shift;
+
+	return l1_size;
+
+}
+
+struct disk_image *qcow1_probe(int fd)
+{
+	struct qcow *q;
+	struct qcow1_header *h;
+	struct disk_image *disk_image;
+
+	q = calloc(1, sizeof(struct qcow));
+	if (!q)
+		goto error;
+
+	q->fd = fd;
+	/* allocates memory for header */
+	h = q->header = qcow1_get_header(fd);
+	if (!h)
+		goto error;
+
+	if (qcow_read_l1_table(q) < 0)
+		goto error;
+
+	disk_image = disk_image__new(fd, h->size, &qcow1_disk_ops);
+	if (!disk_image)
+		goto error;
+	disk_image->priv = q;
+
+	return disk_image;
+error:
+	if (!q)
+		return NULL;
+
+	free(q->table);
+	free(q->header);
+	free(q);
+	return NULL;
+}
+
+static inline u64 get_file_length(int fd)
+{
+	struct stat buf;
+	if (fstat(fd, &buf) < 0)
+		die("Unable to get the disk image's file status.");
+	return buf.st_size;
+}
+
+static u64 get_cluster_offset(struct qcow *q, uint64_t offset, int allocate)
+{
+	struct qcow1_header *h = q->header;
+
+	struct qcow_table *l1_tab = q->table;
+	int l1_index;
+
+	const int l2_size = 1 << h->l2_bits;
+	bool new_table = false;
+	int l2_index;
+	u64 l2_offset;
+	u64 *l2_table;
+
+	int cluster_size = 1 << h->cluster_bits;
+	u64 cluster_offset;
+
+	l1_index = offset >> (h->l2_bits + h->cluster_bits);
+	l2_offset = l1_tab->l1_table[l1_index];
+	if (!l2_offset) {
+		u64 tmp;
+		if (!allocate)
+			return 0;
+		/* need to allocate a new l2 entry at the end of the file */
+
+		/* align the l2_offset to the next cluster */
+		l2_offset = get_file_length(q->fd);
+		l2_offset = (l2_offset + cluster_size - 1) & ~(cluster_size - 1);
+
+		/* update the entry in the in-core table */
+		l1_tab->l1_table[l1_index] = l2_offset;
+
+		/* update the file entry in big-endian byte order */
+		tmp = cpu_to_be64(l2_offset);
+		if (qcow_pwrite_with_sync(q->fd, &tmp, sizeof(tmp),
+				h->l1_table_offset + l1_index * sizeof(tmp)) < 0)
+			return 0;
+		new_table = true;
+	}
+
+	/* TODO
+	 * add caching to avoid reading l2 every time the function is invoked.
+	 */
+	l2_table = malloc(l2_size * sizeof(u64));
+	if (new_table == false) {
+		int l2_i;
+		/* read the table from the file */
+		if (pread_in_full(q->fd, l2_table, l2_size * sizeof(u64),
+					l2_offset) < 0)
+			goto error;
+		/* change to cpu specific byte-order */
+		for (l2_i = 0; l2_i < l2_size; l2_i++)
+			be64_to_cpus(&l2_table[l2_i]);
+	} else {
+		/* new l2 entry allocated, write 0's in l2 table */
+		memset(l2_table, 0, l2_size * sizeof(u64));
+		if (qcow_pwrite_with_sync(q->fd, l2_table, l2_size *
+					sizeof(u64), l2_offset) < 0)
+			goto error;
+	}
+
+	l2_index = (offset >> h->cluster_bits) & (l2_size - 1);
+	cluster_offset = l2_table[l2_index];
+	if (!cluster_offset && allocate) {
+		u64 tmp;
+		/* need to allocate a new cluster */
+		/* align the cluster_offset to the start of the next cluster */
+		cluster_offset = get_file_length(q->fd);
+		cluster_offset = (cluster_offset + cluster_size - 1) &
+			~(cluster_size - 1);
+
+		/* update the in-cache table */
+		l2_table[l2_index] = cluster_offset;
+
+		/* update the file entry in big-endian byte order */
+		tmp = cpu_to_be64(cluster_offset);
+		if (qcow_pwrite_with_sync(q->fd, &tmp, sizeof(tmp), l2_offset +
+					l2_index * sizeof(tmp)) < 0)
+			goto error;
+	}
+	free(l2_table);
+	/* returning cluster_offset in the cpu byte-order */
+	return cluster_offset;
+error:
+	free(l2_table);
+	return 0;
+}
+
+static int qcow1_read_sector(struct disk_image *self, uint64_t sector,
+		void *dst, uint32_t dst_len)
+{
+	struct qcow *q = self->priv;
+	char *buf = dst;
+	uint64_t cluster_offset;
+
+	struct qcow1_header *h = q->header;
+	int cluster_sectors = 1 << (h->cluster_bits - SECTOR_SHIFT);
+
+	int nb_sectors = dst_len / SECTOR_SIZE;
+	int index_in_cluster;
+
+	int n;
+
+	while (nb_sectors) {
+		cluster_offset = get_cluster_offset(q, sector << SECTOR_SHIFT, 0);
+
+		/* which sector to read from the cluster */
+		index_in_cluster = sector & (cluster_sectors - 1);
+
+		/* find the number of sectors to read */
+		n = cluster_sectors - index_in_cluster;
+		if (n > nb_sectors)
+			n = nb_sectors;
+
+		if (!cluster_offset) {
+			memset(buf, 0, SECTOR_SIZE * n);
+		} else {
+			/*
+			 * read data beginning at
+			 * cluster_offset + (index_in_cluster * 512)
+			 * size of data = n * 512
+			 */
+			if (pread_in_full(q->fd, buf, n * SECTOR_SIZE,
+						cluster_offset + index_in_cluster *
+						SECTOR_SIZE) < 0)
+				return -1;
+		}
+
+		nb_sectors -= n;
+		sector += n;
+		buf += (n * SECTOR_SIZE);
+	}
+
+	return 0;
+}
+
+static int qcow1_write_sector(struct disk_image *self, uint64_t sector,
+		void *src, uint32_t src_len)
+{
+	struct qcow *q = self->priv;
+	char *buf = src;
+	uint64_t cluster_offset;
+
+	struct qcow1_header *h = q->header;
+	int cluster_sectors = 1 << (h->cluster_bits - SECTOR_SHIFT);
+
+	int nb_sectors = src_len / SECTOR_SIZE;
+	int index_in_cluster;
+
+	int rc;
+	int n;
+
+	while (nb_sectors) {
+		cluster_offset = get_cluster_offset(q, sector << SECTOR_SHIFT, 1);
+		if (!cluster_offset)
+			return -1;
+
+		/* which sector to write to in the cluster */
+		index_in_cluster = sector & (cluster_sectors - 1);
+
+		/* find the number of sectors to write */
+		n = cluster_sectors - index_in_cluster;
+		if (n > nb_sectors)
+			n = nb_sectors;
+
+		/*
+		 * write data at
+		 * cluster_offset + (index_in_cluster * 512)
+		 * size of data = n * 512
+		 */
+		rc = qcow_pwrite_with_sync(q->fd, buf, n * SECTOR_SIZE,
+				cluster_offset + index_in_cluster *
+				SECTOR_SIZE);
+		if (rc < 0)
+			return -1;
+
+		nb_sectors -= n;
+		sector += n;
+		buf += (n * SECTOR_SIZE);
+	}
+
+	return 0;
+}
+
+static void qcow1_disk_close(struct disk_image *self)
+{
+	struct qcow *q;
+
+	if (!self)
+		return;
+
+	q = self->priv;
+	free(q->table);
+	free(q->header);
+	free(q);
+}
+
+struct disk_image_operations qcow1_disk_ops = {
+	.read_sector = qcow1_read_sector,
+	.write_sector = qcow1_write_sector,
+	.close = qcow1_disk_close
+};
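
Note for readers of get_cluster_offset() and qcow1_get_table_size(): the two-level lookup in the patch is shift-and-mask arithmetic on the guest byte offset. What follows is a minimal standalone sketch of that arithmetic, not code from the patch. The names struct qcow1_indexes, qcow1_split_offset() and qcow1_l1_entries() are hypothetical and introduced only for illustration. It assumes 512-byte sectors (SECTOR_SHIFT of 9), as the comments in the patch suggest.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* assumption: 512-byte sectors */

/* Hypothetical helper type, not part of the patch. */
struct qcow1_indexes {
	uint64_t l1_index;         /* slot in the L1 table */
	uint64_t l2_index;         /* slot in the selected L2 table */
	uint64_t index_in_cluster; /* sector number inside the data cluster */
};

/* Split a guest byte offset the same way get_cluster_offset() does. */
static struct qcow1_indexes qcow1_split_offset(uint64_t offset,
					       unsigned cluster_bits,
					       unsigned l2_bits)
{
	struct qcow1_indexes idx;
	uint64_t l2_size = 1ULL << l2_bits;
	uint64_t cluster_sectors;

	/* a cluster must hold at least one sector, and shifts must stay below 64 */
	assert(cluster_bits >= SECTOR_SHIFT);
	assert(cluster_bits + l2_bits < 64);
	cluster_sectors = 1ULL << (cluster_bits - SECTOR_SHIFT);

	idx.l1_index = offset >> (l2_bits + cluster_bits);
	idx.l2_index = (offset >> cluster_bits) & (l2_size - 1);
	idx.index_in_cluster = (offset >> SECTOR_SHIFT) & (cluster_sectors - 1);
	return idx;
}

/* One L1 entry covers 2^(cluster_bits + l2_bits) bytes; round the count up. */
static uint64_t qcow1_l1_entries(uint64_t size, unsigned cluster_bits,
				 unsigned l2_bits)
{
	unsigned shift = cluster_bits + l2_bits;

	return (size + (1ULL << shift) - 1) >> shift;
}
```

With cluster_bits = 12 and l2_bits = 9, each L2 table maps 512 clusters of 4 KiB, so one L1 entry covers 2 MiB of guest address space. The sketch uses 64-bit unsigned values throughout, whereas the patch uses int for the indexes.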