diff mbox

[2/2] rbd: add an option for md5 checksumming (v2)

Message ID 20110825110316.GB6517@sir.fritz.box (mailing list archive)
State New, archived
Headers show

Commit Message

Christian Brunner Aug. 25, 2011, 11:03 a.m. UTC
We needed to get an md5 checksum of an rbd image. Since librbd is using a
lot of sparse operations, this was not possible without writing an image
to a local disk.

With this patch exporting the image is no longer needed. You can do
"rbd md5 image" and you will get the same output as you would call "md5sum".

V1 -> V2:
- use ceph::crypto classes
- support for sha1
- skip holes
---
 src/rbd.cc |  139 ++++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 files changed, 117 insertions(+), 22 deletions(-)

Comments

Tommi Virtanen Aug. 25, 2011, 8:32 p.m. UTC | #1
On Thu, Aug 25, 2011 at 04:03, Christian Brunner <chb@muc.de> wrote:
> +  if (buf) {
> +    dif = ofs-lastofs;
> +    if (dif > 0) {
> +      byte *tempbuf = (byte *) malloc(dif);
> +      memset(tempbuf, 0, dif);
> +      Hash->Update((const byte *) tempbuf, dif);
> +      free(tempbuf);
> +    }
> +
> +    Hash->Update((const byte *) buf, len);
> +    lastofs = ofs + len;
> +  }

Does this mean a file with a 100GB hole in it will make you malloc(100GB)?

> +  case OPT_MD5:
> +  case OPT_SHA1:
> +    r = do_hash(image, imgname, opt_cmd);
> +    if (r < 0) {
> +      cerr << "md5 hashing failed: " << strerror(-r) << std::endl;
> +      exit(1);

It's not always md5 hashing, yet the error message says md5..
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yehuda Sadeh Weinraub Aug. 25, 2011, 8:38 p.m. UTC | #2
On Thu, Aug 25, 2011 at 1:32 PM, Tommi Virtanen
<tommi.virtanen@dreamhost.com> wrote:
> On Thu, Aug 25, 2011 at 04:03, Christian Brunner <chb@muc.de> wrote:
>> +  if (buf) {
>> +    dif = ofs-lastofs;
>> +    if (dif > 0) {
>> +      byte *tempbuf = (byte *) malloc(dif);
>> +      memset(tempbuf, 0, dif);
>> +      Hash->Update((const byte *) tempbuf, dif);
>> +      free(tempbuf);
>> +    }
>> +
>> +    Hash->Update((const byte *) buf, len);
>> +    lastofs = ofs + len;
>> +  }
>
> Does this mean a file with a 100GB hole in it will make you malloc(100GB)?
>
That's in the read_iterate() callback. It wouldn't be called with
anything larger than a single block size.

Unless I'm somehow missing, it's still missing the holes. Now it just
ignores them, it should be taking them into account as zero data.


Yehuda
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tommi Virtanen Aug. 25, 2011, 9:05 p.m. UTC | #3
On Thu, Aug 25, 2011 at 13:38, Yehuda Sadeh Weinraub
<yehuda.sadeh@dreamhost.com> wrote:
> On Thu, Aug 25, 2011 at 1:32 PM, Tommi Virtanen
> <tommi.virtanen@dreamhost.com> wrote:
>> On Thu, Aug 25, 2011 at 04:03, Christian Brunner <chb@muc.de> wrote:
>>> +  if (buf) {
>>> +    dif = ofs-lastofs;
>>> +    if (dif > 0) {
>>> +      byte *tempbuf = (byte *) malloc(dif);
>>> +      memset(tempbuf, 0, dif);
>>> +      Hash->Update((const byte *) tempbuf, dif);
>>> +      free(tempbuf);
>>> +    }
>>> +
>>> +    Hash->Update((const byte *) buf, len);
>>> +    lastofs = ofs + len;
>>> +  }
>>
>> Does this mean a file with a 100GB hole in it will make you malloc(100GB)?
>>
> That's in the read_iterate() callback. It wouldn't be called with
> anything larger than a single block size.
>
> Unless I'm somehow missing, it's still missing the holes. Now it just
> ignores them, it should be taking them into account as zero data.

To my reading, that is *exactly* what the quoted code does -- it just
assumes that a hole can fit fully in RAM.
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yehuda Sadeh Weinraub Aug. 25, 2011, 9:18 p.m. UTC | #4
On Thu, Aug 25, 2011 at 2:05 PM, Tommi Virtanen
<tommi.virtanen@dreamhost.com> wrote:
> On Thu, Aug 25, 2011 at 13:38, Yehuda Sadeh Weinraub
> <yehuda.sadeh@dreamhost.com> wrote:
>> On Thu, Aug 25, 2011 at 1:32 PM, Tommi Virtanen
>> <tommi.virtanen@dreamhost.com> wrote:
>>> On Thu, Aug 25, 2011 at 04:03, Christian Brunner <chb@muc.de> wrote:
>>>> +  if (buf) {
>>>> +    dif = ofs-lastofs;
>>>> +    if (dif > 0) {
>>>> +      byte *tempbuf = (byte *) malloc(dif);
>>>> +      memset(tempbuf, 0, dif);
>>>> +      Hash->Update((const byte *) tempbuf, dif);
>>>> +      free(tempbuf);
>>>> +    }
>>>> +
>>>> +    Hash->Update((const byte *) buf, len);
>>>> +    lastofs = ofs + len;
>>>> +  }
>>>
>>> Does this mean a file with a 100GB hole in it will make you malloc(100GB)?
>>>
>> That's in the read_iterate() callback. It wouldn't be called with
>> anything larger than a single block size.
>>
>> Unless I'm somehow missing, it's still missing the holes. Now it just
>> ignores them, it should be taking them into account as zero data.
>
> To my reading, that is *exactly* what the quoted code does -- it just
> assumes that a hole can fit fully in RAM.
>
Oh, right, misread that snippet. The test should go the other way
around; something like:

byte *hashbuf = buf;
byte *tempbuf = NULL;
if (!buf) {
  tempbuf = (byte *) calloc(len);
  if (!tempbuf)
    return -ENOMEM;
  hashbuf = tempbuf;
}
Hash->Update((const byte *) hashbuf, len);
free(tempbuf);
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christian Brunner Aug. 26, 2011, 6:49 p.m. UTC | #5
2011/8/25 Yehuda Sadeh Weinraub <yehuda.sadeh@dreamhost.com>:
>>>>> +  if (buf) {
>>>>> +    dif = ofs-lastofs;
>>>>> +    if (dif > 0) {
>>>>> +      byte *tempbuf = (byte *) malloc(dif);
>>>>> +      memset(tempbuf, 0, dif);
>>>>> +      Hash->Update((const byte *) tempbuf, dif);
>>>>> +      free(tempbuf);
>>>>> +    }
>>>>> +
>>>>> +    Hash->Update((const byte *) buf, len);
>>>>> +    lastofs = ofs + len;
>>>>> +  }
>>>>
>>>> Does this mean a file with a 100GB hole in it will make you malloc(100GB)?
>>>>
>>> That's in the read_iterate() callback. It wouldn't be called with
>>> anything larger than a single block size.
>>>
>>> Unless I'm somehow missing, it's still missing the holes. Now it just
>>> ignores them, it should be taking them into account as zero data.
>>
>> To my reading, that is *exactly* what the quoted code does -- it just
>> assumes that a hole can fit fully in RAM.
>>
> Oh, right, misread that snippet. The test should go the other way
> around; something like:
>
> byte *hashbuf = buf;
> byte *tempbuf = NULL;
> if (!buf) {
>  tempbuf = (byte *) calloc(len);
>  if (!tempbuf)
>    return -ENOMEM;
>  hashbuf = tempbuf;
> }
> Hash->Update((const byte *) hashbuf, len);
> free(tempbuf);

Of course you are right. Just skipping the holes would possibly lead
to large allocations, which is not a good idea. I will send an updated
patch...

Thanks,
Christian
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/src/rbd.cc b/src/rbd.cc
index a18c029..a947309 100644
--- a/src/rbd.cc
+++ b/src/rbd.cc
@@ -38,6 +38,8 @@ 
 #include <time.h>
 #include <sys/ioctl.h>
 
+#include "common/ceph_crypto.h"
+
 #include "include/rbd_types.h"
 
 #include <linux/fs.h>
@@ -49,6 +51,30 @@ 
 
 static string dir_oid = RBD_DIRECTORY;
 static string dir_info_oid = RBD_INFO;
+static size_t lastofs;
+
+enum {
+  OPT_NO_CMD = 0,
+  OPT_LIST,
+  OPT_INFO,
+  OPT_CREATE,
+  OPT_RESIZE,
+  OPT_RM,
+  OPT_EXPORT,
+  OPT_IMPORT,
+  OPT_COPY,
+  OPT_RENAME,
+  OPT_SNAP_CREATE,
+  OPT_SNAP_ROLLBACK,
+  OPT_SNAP_REMOVE,
+  OPT_SNAP_LIST,
+  OPT_WATCH,
+  OPT_MAP,
+  OPT_UNMAP,
+  OPT_SHOWMAPPED,
+  OPT_MD5,
+  OPT_SHA1,
+};
 
 void usage()
 {
@@ -65,6 +91,8 @@  void usage()
        << "  export [image-name] [dest-path]           export image to file\n"
        << "  import [path] [dst-image]                 import image from file (dest defaults\n"
        << "                                            as the filename part of file)\n"
+       << "  md5 [image-name]                          print md5 checksum for image\n"
+       << "  sha1 [image-name]                         print sha1 checksum for image\n"
        << "  <cp | copy> [src-image] [dest-image]      copy image to dest\n"
        << "  <mv | rename> [src-image] [dest-image]    copy image to dest\n"
        << "  snap ls [image-name]                      dump list of image snapshots\n"
@@ -262,6 +290,77 @@  static int do_export(librbd::Image& image, const char *path)
   return 0;
 }
 
+static int hash_read_cb(uint64_t ofs, size_t len, const char *buf, void *arg)
+{
+  ssize_t dif;
+  ceph::crypto::Digest *Hash = (ceph::crypto::Digest *)arg;
+
+  if (buf) {
+    dif = ofs-lastofs;
+    if (dif > 0) {
+      byte *tempbuf = (byte *) malloc(dif);
+      memset(tempbuf, 0, dif);
+      Hash->Update((const byte *) tempbuf, dif);
+      free(tempbuf);
+    }
+
+    Hash->Update((const byte *) buf, len);
+    lastofs = ofs + len;
+  }
+
+  return 0;
+}
+
+static int do_hash(librbd::Image& image, const char *imgname, int opt_cmd)
+{
+  int64_t r, i, digest_size;
+  byte *digest;
+  char hexval[] = "0123456789abcdef";
+  char *hexdigest;
+  librbd::image_info_t info;
+  ceph::crypto::Digest *Hash;
+
+  lastofs = 0;
+
+  if (opt_cmd == OPT_MD5) {
+    Hash = (ceph::crypto::Digest *) new ceph::crypto::MD5;
+  } else if (opt_cmd == OPT_SHA1) {
+    Hash = (ceph::crypto::Digest *) new ceph::crypto::SHA1;
+  } else {
+    return -1;
+  }
+
+  r = image.stat(info, sizeof(info));
+  if (r < 0)
+    return r;
+
+  r = image.read_iterate(0, info.size, hash_read_cb, (void *)Hash);
+  if (r < 0)
+    return r;
+
+  if (lastofs < info.size) {
+      hash_read_cb(info.size, 0, NULL, (void *)Hash);
+  }
+
+  digest_size = Hash->DigestSize();
+  digest = (byte *) malloc(digest_size);
+  hexdigest = (char *) malloc((digest_size * 2 + 1) * sizeof(char));
+  Hash->Final(digest);
+
+  for(i = 0; i < digest_size; i++){
+    hexdigest[i*2] = hexval[((digest[i] >> 4) & 0xF)];
+    hexdigest[(i*2) + 1] = hexval[(digest[i]) & 0x0F];
+  }
+  hexdigest[(digest_size*2)] = '\0';
+
+  cout << hexdigest << "  " << imgname << std::endl;
+
+  free(hexdigest);
+  free(digest);
+
+  return 0;
+}
+
 static const char *imgname_from_path(const char *path)
 {
   const char *imgname;
@@ -720,27 +819,6 @@  static int do_kernel_rm(const char *dev)
   return r;
 }
 
-enum {
-  OPT_NO_CMD = 0,
-  OPT_LIST,
-  OPT_INFO,
-  OPT_CREATE,
-  OPT_RESIZE,
-  OPT_RM,
-  OPT_EXPORT,
-  OPT_IMPORT,
-  OPT_COPY,
-  OPT_RENAME,
-  OPT_SNAP_CREATE,
-  OPT_SNAP_ROLLBACK,
-  OPT_SNAP_REMOVE,
-  OPT_SNAP_LIST,
-  OPT_WATCH,
-  OPT_MAP,
-  OPT_UNMAP,
-  OPT_SHOWMAPPED,
-};
-
 static int get_cmd(const char *cmd, bool *snapcmd)
 {
   if (strcmp(cmd, "snap") == 0) {
@@ -766,6 +844,10 @@  static int get_cmd(const char *cmd, bool *snapcmd)
       return OPT_EXPORT;
     if (strcmp(cmd, "import") == 0)
       return OPT_IMPORT;
+    if (strcmp(cmd, "md5") == 0)
+      return OPT_MD5;
+    if (strcmp(cmd, "sha1") == 0)
+      return OPT_SHA1;
     if (strcmp(cmd, "copy") == 0 ||
         strcmp(cmd, "cp") == 0)
       return OPT_COPY;
@@ -888,6 +970,10 @@  int main(int argc, const char **argv)
           case OPT_IMPORT:
             set_conf_param(CEPH_ARGPARSE_VAL, &path, &destname);
             break;
+          case OPT_MD5:
+          case OPT_SHA1:
+            set_conf_param(CEPH_ARGPARSE_VAL, &imgname, NULL);
+            break;
           case OPT_COPY:
           case OPT_RENAME:
             set_conf_param(CEPH_ARGPARSE_VAL, &imgname, &destname);
@@ -966,7 +1052,7 @@  int main(int argc, const char **argv)
       (opt_cmd == OPT_RESIZE || opt_cmd == OPT_INFO || opt_cmd == OPT_SNAP_LIST ||
        opt_cmd == OPT_SNAP_CREATE || opt_cmd == OPT_SNAP_ROLLBACK ||
        opt_cmd == OPT_SNAP_REMOVE || opt_cmd == OPT_EXPORT || opt_cmd == OPT_WATCH ||
-       opt_cmd == OPT_COPY)) {
+       opt_cmd == OPT_COPY || opt_cmd == OPT_MD5 || opt_cmd == OPT_SHA1 )) {
     r = rbd.open(io_ctx, image, imgname);
     if (r < 0) {
       cerr << "error opening image " << imgname << ": " << strerror(r) << std::endl;
@@ -1127,6 +1213,15 @@  int main(int argc, const char **argv)
     }
     break;
 
+  case OPT_MD5:
+  case OPT_SHA1:
+    r = do_hash(image, imgname, opt_cmd);
+    if (r < 0) {
+      cerr << "md5 hashing failed: " << strerror(-r) << std::endl;
+      exit(1);
+    }
+    break;
+
   case OPT_COPY:
     r = do_copy(image, dest_io_ctx, destname);
     if (r < 0) {