From patchwork Thu Aug 28 22:16:50 2014
X-Patchwork-Submitter: Vasily Tarasov
X-Patchwork-Id: 4808231
Message-ID: <53ffb65d.144d320a.0fce.12eb@mx.google.com>
From: Vasily Tarasov
Date: Thu, 28 Aug 2014 18:16:50 -0400
To: dm-devel@redhat.com
Cc: Joe Thornber, Mike Snitzer, Christoph Hellwig, Philip Shilane,
 Sonam Mandal, Erez Zadok
Subject: [dm-devel] [PATCH RFCv2 10/10] dm-dedup: documentation

Description of the dm-dedup target in Documentation.

Signed-off-by: Vasily Tarasov
---
 Documentation/device-mapper/dedup.txt |  205 +++++++++++++++++++++++++++++++++
 1 files changed, 205 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/device-mapper/dedup.txt

diff --git a/Documentation/device-mapper/dedup.txt b/Documentation/device-mapper/dedup.txt
new file mode 100644
index 0000000..b55dfc5
--- /dev/null
+++ b/Documentation/device-mapper/dedup.txt
@@ -0,0 +1,205 @@
+dm-dedup
+========
+
+Device-mapper's dedup target provides transparent data deduplication of block
+devices. Every write coming to a dm-dedup instance is deduplicated against
+previously written data. For datasets that contain many duplicates scattered
+across the disk (e.g., virtual machine disk image collections, backups, home
+directory servers) deduplication provides a significant amount of space
+savings.
+
+Construction Parameters
+=======================
+	<meta_dev> <data_dev> <block_size>
+	<hash_algo> <backend> <flushrq>
+
+<meta_dev>
+	This is the device where dm-dedup's metadata resides. Metadata
+	typically includes hash index, block mapping, and reference counters.
+	It should be specified as a path, like "/dev/sdaX".
+
+<data_dev>
+	This is the device where the actual data blocks are stored.
+	It should be specified as a path, like "/dev/sdaX".
+
+<block_size>
+	This is the size of a single block on the data device in bytes.
+	A block is both the unit of deduplication and the unit of storage.
+	Supported values are between 4096 and 1048576 (1MB) and should be
+	a power of two.
+
+<hash_algo>
+	This specifies which hashing algorithm dm-dedup will use for detecting
+	identical blocks, e.g., "md5" or "sha256". Any hash algorithm
+	supported by the running kernel can be used (see "/proc/crypto").
+
+<backend>
+	This is the backend that dm-dedup will use to store metadata.
+	Currently supported values are "cowbtree" and "inram".
+	The cowbtree backend uses persistent Copy-on-Write (COW) B-trees to
+	store metadata. The inram backend stores all metadata in RAM, which
+	is lost after a system reboot.
+	Consequently, the inram backend should typically be used only for
+	experiments. Note that although the inram backend does not use the
+	metadata device, <meta_dev> must still be given on the command line.
+
+<flushrq>
+	This parameter specifies how many writes to the target should occur
+	before dm-dedup flushes its buffered metadata to the metadata device.
+	In other words, in the event of power failure, one can lose up to this
+	number of most recent writes. Note that dm-dedup also flushes its
+	metadata when it sees REQ_FLUSH or REQ_FUA flags in the I/O requests.
+	In particular, file systems set these flags at the appropriate
+	points in time to ensure file system consistency.
+
+During construction, dm-dedup checks if the first 4096 bytes of the metadata
+device are equal to zero. If they are, then a completely new dm-dedup instance
+is initialized with the metadata and data devices considered "empty". If,
+however, the first 4096 bytes are not zero, dm-dedup will try to reconstruct
+the target based on the current information on the metadata and data devices.
+
+Theory of Operation
+===================
+
+We provide an overview of the dm-dedup design in this section. Detailed design
+and performance evaluation can be found in the following paper:
+
+V. Tarasov, D. Jain, G. Kuenning, S. Mandal, K. Palanisami, P. Shilane, and
+S. Trehan. Dmdedup: Device Mapper Target for Data Deduplication.
+Ottawa Linux Symposium, 2014.
+http://www.fsl.cs.stonybrook.edu/docs/ols-dmdedup/dmdedup-ols14.pdf
+
+To quickly identify duplicates, dm-dedup maintains an index of hashes for all
+written blocks. A block is a user-configurable unit of deduplication and
+storage. The dm-dedup index, along with other deduplication metadata, resides
+on a separate block device, which we refer to as the metadata device. Blocks
+themselves are stored on the data device.
+Although the metadata device can be any block device, e.g., an HDD or its
+partition, for higher performance we recommend using an SSD to store metadata.
+
+For every block that is written to a target, dm-dedup computes its hash using
+the <hash_algo> algorithm. It then looks up the resulting hash in the hash
+index. If a match is found, the write is considered to be a duplicate.
+
+Dm-dedup's hash index is essentially a mapping between the hash and the
+physical address of a block on the data device. In addition, dm-dedup
+maintains a mapping between logical block addresses on the target and physical
+block addresses on the data device (LBN-PBN mapping). When a duplicate is
+detected, there is no need to write actual data to the disk and only the
+LBN-PBN mapping is updated.
+
+When non-duplicate data is written, a new physical block on the data device is
+allocated, written, and a corresponding hash is added to the index.
+
+On read, the LBN-PBN mapping is used to quickly locate the required block on
+the data device. If an LBN was never written, a zero block is returned.
+
+Target Size
+-----------
+
+When using device-mapper one needs to specify the target size in advance. To
+benefit from deduplication, the target should be larger than the data device
+(otherwise one could just use the data device directly). Because the dataset's
+deduplication ratio is not known in advance, one has to use an estimate.
+
+Usually, a deduplication ratio of up to 1.5 is a safe assumption for a primary
+dataset. For backup datasets, however, the deduplication ratio can be as high
+as 100.
+
+Estimating the deduplication ratio of an existing dataset with the fs-hasher
+package from http://tracer.filesystems.org/ can give a good starting point for
+a specific dataset.
+
+If one overestimates the deduplication ratio, the data device can run out of
+free space, which can be monitored with dmsetup status (described below).
+Once the data device is full, dm-dedup stops accepting writes until free space
+becomes available on the data device again.
+
+Backends
+--------
+
+Dm-dedup's core logic treats the index and the LBN-PBN mapping as plain
+key-value stores with an extended API described in
+
+drivers/md/dm-dedup-backend.h
+
+Different backends can provide the key-value store API. We implemented a
+cowbtree backend that uses device-mapper's persistent metadata framework to
+consistently store metadata. Details on this framework and its on-disk layout
+can be found here:
+
+Documentation/device-mapper/persistent-data.txt
+
+By using persistent COW B-trees, the cowbtree backend guarantees consistency
+in the event of power failure.
+
+In addition, we provide an inram backend that stores all metadata in RAM.
+Hash tables with linear probing are used for storing the index and LBN-PBN
+mapping. The inram backend does not store metadata persistently and should
+usually be used only for experiments.
+
+Dmsetup Status
+==============
+
+Dm-dedup exports various statistics via the dmsetup status command. The line
+returned by dmsetup status contains the following values, in order:
+
+<name> <start> <end> <type> \
+<dtotal> <dfree> <dused> <dactual> <dblock> <ddisk> <mddisk> \
+<writes> <uniqwrites> <dupwrites> <readonwrites> <overwrites> <newwrites>
+
+<name>, <start>, <end>, and <type> are generic fields printed by the dmsetup
+tool for any target.
+
+<dtotal>       - total number of blocks on the data device
+<dfree>        - number of free (unallocated) blocks on the data device
+<dused>        - number of used (allocated) blocks on the data device
+<dactual>      - number of allocated logical blocks (written at least once)
+<dblock>       - block size in bytes
+<ddisk>        - data disk's major:minor
+<mddisk>       - metadata disk's major:minor
+<writes>       - total number of writes to the target
+<uniqwrites>   - number of writes that were not duplicates (were unique)
+<dupwrites>    - number of writes that were duplicates
+<readonwrites> - number of times dm-dedup had to read data from the data
+                 device because a write was misaligned (read-on-write effect)
+<overwrites>   - number of writes to a logical block that had been written
+                 at least once before
+<newwrites>    - number of writes to a logical address that had never been
+                 written before
+
+To compute the deduplication ratio, divide <dactual> by <dused>.
+
+Example
+=======
+
+Decide on metadata and data devices:
+	# META_DEV=/dev/sdX
+	# DATA_DEV=/dev/sdY
+
+Compute target size assuming 1.5 dedup ratio:
+	# DATA_DEV_SIZE=`blockdev --getsz $DATA_DEV`
+	# TARGET_SIZE=`expr $DATA_DEV_SIZE \* 15 / 10`
+
+Reset metadata device:
+	# dd if=/dev/zero of=$META_DEV bs=4096 count=1
+
+Set up a target:
+	# echo "0 $TARGET_SIZE dedup $META_DEV $DATA_DEV 4096 md5 cowbtree 100" |\
+		dmsetup create mydedup
+
+Authors
+=======
+
+dm-dedup was developed in the File system and Storage Lab (FSL) at Stony
+Brook University Computer Science Department, in collaboration with Harvey
+Mudd College and EMC.
+
+Key people involved in the project were Vasily Tarasov, Geoff Kuenning,
+Sonam Mandal, Karthikeyani Palanisami, Philip Shilane, Sagar Trehan, and
+Erez Zadok.
+
+We also acknowledge the help of several students involved in the
+deduplication project: Teo Asinari, Deepak Jain, Mandar Joshi, Atul
+Karmarkar, Meg O'Keefe, Gary Lent, Amar Mudrankit, Ujwala Tulshigiri, and
+Nabil Zaman.
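
The sizing arithmetic from the Target Size section and the deduplication-ratio
formula from the Dmsetup Status section can be sketched as two small POSIX
shell helpers. This is only an illustration: the function names target_size
and dedup_ratio are hypothetical and not part of dm-dedup or dmsetup, and the
sample status line is made up. dedup_ratio assumes the status line starts with
the four generic dmsetup fields, so that dused and dactual are the seventh and
eighth whitespace-separated fields.

```shell
#!/bin/sh
# Hypothetical helpers, not part of dm-dedup or dmsetup.

# target_size SECTORS RATIO_X10
# Scale the data device size (in sectors, as printed by `blockdev --getsz`)
# by an assumed dedup ratio expressed in tenths (15 means 1.5).
target_size() {
	echo $(( $1 * $2 / 10 ))
}

# dedup_ratio STATUS_LINE
# Print dactual / dused from one status line. Assumes the field order
# name start end type dtotal dfree dused dactual ...
dedup_ratio() {
	echo "$1" | awk '{
		dused = $7; dactual = $8
		if (dused == 0) { print "n/a"; exit }
		printf "%.2f\n", dactual / dused
	}'
}

# Made-up sample values, for illustration only.
SAMPLE="mydedup 0 1500000 dedup 125000 25000 100000 150000 4096 8:16 8:0 9 6 3 0 2 7"
target_size 1000000 15     # prints 1500000
dedup_ratio "$SAMPLE"      # prints 1.50
```

If dmsetup status on a given system prints the line without the leading name
field, the field positions in dedup_ratio shift by one and need adjusting.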