
[RFC] external backup api

Message ID 56BA0BA0.2060302@virtuozzo.com (mailing list archive)
State New, archived

Commit Message

Vladimir Sementsov-Ogievskiy Feb. 9, 2016, 3:54 p.m. UTC
On 09.02.2016 00:14, John Snow wrote:
>
> On 02/06/2016 04:19 AM, Vladimir Sementsov-Ogievskiy wrote:
>> On 05.02.2016 22:48, John Snow wrote:
>>> On 01/22/2016 12:07 PM, Vladimir Sementsov-Ogievskiy wrote:
>>>> Hi all.
>>>>
>>>> This is the early beginning of a series which aims to add an external backup
>>>> api. This is needed to allow backup software to use our dirty bitmaps.
>>>>
>>>> Vmware and Parallels Cloud Server have this feature.
>>>>
>>> Have a link to the equivalent feature that VMWare exposes? (Or Parallels
>>> Cloud Server) ... I'm curious about what the API there looks like.
>> For VMware you need their Virtual Disk Api Programming Guide
>> http://pubs.vmware.com/vsphere-60/topic/com.vmware.ICbase/PDF/vddk60_programming.pdf
>>
> Great, thanks!
>
>> Look at Changed Block Tracking (CBT) , Backup and Restore.
>>
>> For PCS here is part of SDK header, related to the topic:
>>
>> ====================================
>> /*
>>   * Builds a map of the disk contents changes between 2 PITs.
>>     Parameters
>>     hDisk :       A handle of type PHT_VIRTUAL_DISK identifying
>>                   the virtual disk.
>>     sPit1Uuid :   Uuid of the older PIT.
>>     sPit2Uuid :   Uuid of the later PIT.
>>     phMap :       A pointer to a variable which receives the
>>           result (a handle of type PHT_VIRTUAL_DISK_MAP).
>>     Returns
>>     PRL_RESULT.
>> */
>> PRL_METHOD_DECL( PARALLELS_API_VER_5,
>>                  PrlDisk_GetChangesMap_Local, (
>>          PRL_HANDLE hDisk,
>>          PRL_CONST_STR sPit1Uuid,
>>          PRL_CONST_STR sPit2Uuid,
>>          PRL_HANDLE_PTR phMap) );
>>
> Effectively giving you a dirty bitmap diff between two snapshots.
> Something we don't currently genuinely support in QEMU.

Just start a dirty bitmap at point A and stop it at point B.

>
>> /*
>>   * Reports the number of significant bits in the map.
>>     Parameters
>>     hMap :        A handle of type PHT_VIRTUAL_DISK_MAP identifying
>>                   the changes map.
>>     phSize :      A pointer to a variable which receives the
>>           result.
>>     Returns
>>     PRL_RESULT.
>> */
>> PRL_METHOD_DECL( PARALLELS_API_VER_5,
>>                  PrlDiskMap_GetSize, (
>>          PRL_HANDLE hMap,
>>          PRL_UINT32_PTR pnSize) );
>>
> I assume this is roughly the dirty bit count, for us, this would be
> dirty clusters. (Or whatever granularity you specified, but usually
> clusters.)
>
>> /*
>>   * Reports the size (in bytes) of a block mapped by a single bit
>>   * in the map.
>>     Parameters
>>     hMap :        A handle of type PHT_VIRTUAL_DISK_MAP identifying
>>                   the changes map.
>>     phSize :      A pointer to a variable which receives the
>>           result.
>>     Returns
>>     PRL_RESULT.
>> */
>> PRL_METHOD_DECL( PARALLELS_API_VER_5,
>>                  PrlDiskMap_GetGranularity, (
>>          PRL_HANDLE hMap,
>>          PRL_UINT32_PTR pnSize) );
>>
> Basically a granularity query.
>
>> /*
>>   * Returns bits from the blocks map.
>>     Parameters
>>     hMap :        A handle of type PHT_VIRTUAL_DISK_MAP identifying
>>                   the changes map.
>>     pBuffer :     A pointer to a store.
>>     pnCapacity :  A pointer to a variable holding the size
>>           of the buffer and receiving the number of
>>           bytes actually written.
>>     Returns
>>     PRL_RESULT.
>> */
>> PRL_METHOD_DECL( PARALLELS_API_VER_5,
>>                  PrlDiskMap_Read, (
>>          PRL_HANDLE hMap,
>>          PRL_VOID_PTR pBuffer,
>>          PRL_UINT32_PTR pnCapacity) );
>>
> And this would be a direct bitmap query.
>
> Is the expected usage here that the third party client will use this
> bitmap to read the source image? Or do you query for the data from API?

From the API.

>
> I think the thought among block devs would be to opt for more of the
> second option, and less allowing clients to directly interface with the
> image files.
>
>> =======================================
>>
>>
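For illustration, a client consuming such a map (via PrlDiskMap_GetGranularity and PrlDiskMap_Read) would typically turn the bit buffer into dirty extents before copying data out. A minimal sketch, assuming LSB-first bit order, one bit per block (the real SDK's bit layout may differ):

```python
def bitmap_to_extents(buf, granularity):
    """Convert a changes-map byte buffer (LSB-first bits, one bit per
    block of `granularity` bytes) into merged (offset, length) extents."""
    extents = []
    start = None          # first dirty bit of the current run, or None
    nbits = len(buf) * 8
    for i in range(nbits):
        dirty = (buf[i // 8] >> (i % 8)) & 1
        if dirty and start is None:
            start = i     # a dirty run begins
        elif not dirty and start is not None:
            extents.append((start * granularity, (i - start) * granularity))
            start = None  # the run ended
    if start is not None: # run extends to the end of the map
        extents.append((start * granularity, (nbits - start) * granularity))
    return extents

# bits 0, 1 and 4 dirty, 64 KiB granularity
print(bitmap_to_extents(bytes([0b00010011]), 64 * 1024))
# -> [(0, 131072), (262144, 65536)]
```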
>>>> There is only one patch here, about querying a dirty bitmap from qemu by
>>>> a qmp command. It is just an updated and clipped (hmp command removed)
>>>> version of my old patch "[PATCH RFC v3 01/14] qmp: add query-block-dirty-bitmap".
>>>>
>>>> Before writing the whole thing I'd like to discuss the details. Or maybe
>>>> there are existing plans on this topic, or maybe someone is already
>>>> working on it?
>>>>
>>>> I see it like this:
>>>>
>>>> =====
>>>>
>>>> - add qmp commands for dirty-bitmap functions: create_successor,
>>>> abdicate,
>>>> reclaim.
>>> Hm, why do we need such low-level control over splitting and merging
>>> bitmaps from an external client?
>>>
>>>> - make create-successor command transaction-able
>>>> - add query-block-dirty-bitmap qmp command
>>>>
>>>> then, external backup:
>>>>
>>>> qmp transaction {
>>>>       external-snapshot
>>>>       bitmap-create-successor
>>>> }
>>>>
>>>> qmp query frozen bitmap, not acquiring aio context.
>>>>
>>>> do external backup, using snapshot and bitmap
>>>>
>>>> if (success backup)
>>>>       qmp bitmap-abdicate
>>>> else
>>>>       qmp bitmap-reclaim
>>>>
>>>> qmp merge snapshot
>>>> =====
>>>>
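In QMP terms, one cycle of the flow quoted above could be driven from a client roughly like this. This is only a sketch: bitmap-create-successor, query-block-dirty-bitmap, bitmap-abdicate and bitmap-reclaim are names proposed by this RFC, not existing commands, and their argument shapes are guesses; blockdev-snapshot-sync and block-commit do exist.

```python
import json

def external_backup_cycle(device, bitmap, snapshot_file, backup_ok):
    """Build the ordered QMP commands for one external backup cycle.
    The bitmap-* and query-block-dirty-bitmap commands are the RFC's
    proposals, not existing QEMU commands."""
    return [
        # 1. Atomic point-in-time: external snapshot + freeze the bitmap.
        {"execute": "transaction", "arguments": {"actions": [
            {"type": "blockdev-snapshot-sync",
             "data": {"device": device, "snapshot-file": snapshot_file}},
            {"type": "bitmap-create-successor",            # proposed
             "data": {"node": device, "name": bitmap}},
        ]}},
        # 2. Read the frozen bitmap (without acquiring the aio context).
        {"execute": "query-block-dirty-bitmap",            # proposed
         "arguments": {"node": device, "name": bitmap}},
        # ... the external tool copies dirty areas from the snapshot here ...
        # 3. On success drop the frozen bits; on failure merge them back.
        {"execute": "bitmap-abdicate" if backup_ok else "bitmap-reclaim",
         "arguments": {"node": device, "name": bitmap}},
        # 4. Merge the external snapshot back into the base image.
        {"execute": "block-commit", "arguments": {"device": device}},
    ]

for cmd in external_backup_cycle("drive0", "bitmap0", "/tmp/pit.qcow2", True):
    print(json.dumps(cmd))
```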
>>> Hm, I see -- so you're hoping to manage the backup *entirely*
>>> externally, so you want to be able to reach inside of QEMU and control
>>> some status conditions to guarantee it'll be safe.
>>>
>>> I'm not convinced QEMU can guarantee such things -- due to various flush
>>> properties, race conditions on write, etc. QEMU handles all of this
>>> internally in a non-public way at the moment.
>> Hm, can you be more concrete? What operations are dangerous? We can do
>> them in paused state for example.
>>
> I suppose if you're going to pause the VM, then it should be reasonably
> safe, but recently there have been endeavors to augment the .qcow2
> format to prohibit concurrent access, which might include a paused VM as
> well, I'm not clear on the implementation.
>
> If you do it via paused only, then you also don't need to expose the
> freeze/rollback mechanisms: the existing clear mechanism alone is
> sufficient:
>
> (A) The frozen backup fails. Nothing new has been written, so we don't
> need to adjust anything, we can just try again.
> (B) The frozen backup succeeds. We can just clear the bitmap before
> unfreezing.

We can't query the bitmap in the paused state - it may take too much time.

>
> I definitely have reservations about using this as a live fleecing
> mechanism -- the backup block job uses a write-notifier to make
> just-in-time backups of data before it is altered, leaving it the only
> "safe" live backup mechanism in QEMU currently. (Alongside mirror.)
>
> I actually have some patches from Fam to introduce a live fleecing
> mechanism into QEMU (The idea being you create a point-in-time drive you
> can get data from via NBD, then delete it when done) that might be more
> appropriate, but I ran into a lot of problems with the patch. I'll post
> the WIP for that patch to try to solicit comments on the best way forward.

After adding
=============
--- a/block.c
+++ b/block.c
@@ -1276,6 +1276,9 @@ void bdrv_set_backing_hd(BlockDriverState *bs,
BlockDriverState *backing_hd)
     /* Otherwise we won't be able to commit due to check in bdrv_commit */
     bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
                     bs->backing_blocker);
+
+    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+                    bs->backing_blocker);
 out:
     bdrv_refresh_limits(bs, NULL);
 }
==============
and a tiny fix for the qemu_io interface in the iotest,

Fam's "qemu-iotests: Image fleecing test case 089" works for me. Isn't
it enough?



>
> Otherwise, my biggest question here is:
> "What does fleecing a backup externally provide as a benefit over
> backing up to an NBD target?"

Look at our answers on v2 of this series:

On 05.02.2016 11:28, Denis V. Lunev wrote:
> On 02/03/2016 11:14 AM, Fam Zheng wrote:
>> On Sat, 01/30 13:56, Vladimir Sementsov-Ogievskiy wrote:
>>> Hi all.
>>>
>>> This series aims to add an external backup api. This is needed
>>> to allow
>>> backup software to use our dirty bitmaps.
>>>
>>> Vmware and Parallels Cloud Server have this feature.
>> What is the advantage of this approach over "drive-backup 
>> sync=incremental
>> ..."?
>
> This will allow third-party vendors to back up QEMU VMs into
> their own formats or to the cloud etc.


>
> You can already today perform incremental backups to an NBD target to
> copy the data out via an external mechanism, is this not sufficient for
> Parallels? If not, why?
>
>>>> In the following patch query-bitmap acquires the aio context. This must
>>>> of course be dropped for a frozen bitmap.
>>>> But to do it properly, I think I should check somehow that this is
>>>> not just a frozen bitmap, but a bitmap frozen by a qmp command, to avoid
>>>> incorrectly querying a bitmap frozen by an internal backup (or other
>>>> mechanism). Maybe it is not necessary.
>>>>
>>>>
>>>>
>>>

Comments

John Snow Feb. 9, 2016, 4:51 p.m. UTC | #1
On 02/09/2016 10:54 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 09.02.2016 00:14, John Snow wrote:
>>
>> On 02/06/2016 04:19 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> On 05.02.2016 22:48, John Snow wrote:
>>>> On 01/22/2016 12:07 PM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Hi all.
>>>>>
>>>>> This is the early beginning of a series which aims to add an external
>>>>> backup
>>>>> api. This is needed to allow backup software to use our dirty bitmaps.
>>>>>
>>>>> Vmware and Parallels Cloud Server have this feature.
>>>>>
>>>> Have a link to the equivalent feature that VMWare exposes? (Or
>>>> Parallels
>>>> Cloud Server) ... I'm curious about what the API there looks like.
>>> For VMware you need their Virtual Disk Api Programming Guide
>>> http://pubs.vmware.com/vsphere-60/topic/com.vmware.ICbase/PDF/vddk60_programming.pdf
>>>
>>>
>> Great, thanks!
>>
>>> Look at Changed Block Tracking (CBT) , Backup and Restore.
>>>
>>> For PCS here is part of SDK header, related to the topic:
>>>
>>> ====================================
>>> /*
>>>   * Builds a map of the disk contents changes between 2 PITs.
>>>     Parameters
>>>     hDisk :       A handle of type PHT_VIRTUAL_DISK identifying
>>>                   the virtual disk.
>>>     sPit1Uuid :   Uuid of the older PIT.
>>>     sPit2Uuid :   Uuid of the later PIT.
>>>     phMap :       A pointer to a variable which receives the
>>>           result (a handle of type PHT_VIRTUAL_DISK_MAP).
>>>     Returns
>>>     PRL_RESULT.
>>> */
>>> PRL_METHOD_DECL( PARALLELS_API_VER_5,
>>>                  PrlDisk_GetChangesMap_Local, (
>>>          PRL_HANDLE hDisk,
>>>          PRL_CONST_STR sPit1Uuid,
>>>          PRL_CONST_STR sPit2Uuid,
>>>          PRL_HANDLE_PTR phMap) );
>>>
>> Effectively giving you a dirty bitmap diff between two snapshots.
>> Something we don't currently genuinely support in QEMU.
> 
> Just start a dirty bitmap at point A and stop it at point B.
> 
>>
>>> /*
>>>   * Reports the number of significant bits in the map.
>>>     Parameters
>>>     hMap :        A handle of type PHT_VIRTUAL_DISK_MAP identifying
>>>                   the changes map.
>>>     phSize :      A pointer to a variable which receives the
>>>           result.
>>>     Returns
>>>     PRL_RESULT.
>>> */
>>> PRL_METHOD_DECL( PARALLELS_API_VER_5,
>>>                  PrlDiskMap_GetSize, (
>>>          PRL_HANDLE hMap,
>>>          PRL_UINT32_PTR pnSize) );
>>>
>> I assume this is roughly the dirty bit count, for us, this would be
>> dirty clusters. (Or whatever granularity you specified, but usually
>> clusters.)
>>
>>> /*
>>>   * Reports the size (in bytes) of a block mapped by a single bit
>>>   * in the map.
>>>     Parameters
>>>     hMap :        A handle of type PHT_VIRTUAL_DISK_MAP identifying
>>>                   the changes map.
>>>     phSize :      A pointer to a variable which receives the
>>>           result.
>>>     Returns
>>>     PRL_RESULT.
>>> */
>>> PRL_METHOD_DECL( PARALLELS_API_VER_5,
>>>                  PrlDiskMap_GetGranularity, (
>>>          PRL_HANDLE hMap,
>>>          PRL_UINT32_PTR pnSize) );
>>>
>> Basically a granularity query.
>>
>>> /*
>>>   * Returns bits from the blocks map.
>>>     Parameters
>>>     hMap :        A handle of type PHT_VIRTUAL_DISK_MAP identifying
>>>                   the changes map.
>>>     pBuffer :     A pointer to a store.
>>>     pnCapacity :  A pointer to a variable holding the size
>>>           of the buffer and receiving the number of
>>>           bytes actually written.
>>>     Returns
>>>     PRL_RESULT.
>>> */
>>> PRL_METHOD_DECL( PARALLELS_API_VER_5,
>>>                  PrlDiskMap_Read, (
>>>          PRL_HANDLE hMap,
>>>          PRL_VOID_PTR pBuffer,
>>>          PRL_UINT32_PTR pnCapacity) );
>>>
>> And this would be a direct bitmap query.
>>
>> Is the expected usage here that the third party client will use this
>> bitmap to read the source image? Or do you query for the data from API?
> 
> From the API.
> 
>>
>> I think the thought among block devs would be to opt for more of the
>> second option, and less allowing clients to directly interface with the
>> image files.
>>
>>> =======================================
>>>
>>>
>>>>> There is only one patch here, about querying a dirty bitmap from qemu by
>>>>> a qmp command. It is just an updated and clipped (hmp command
>>>>> removed)
>>>>> version of my old patch "[PATCH RFC v3 01/14] qmp: add query-block-dirty-bitmap".
>>>>>
>>>>> Before writing the whole thing I'd like to discuss the details. Or
>>>>> maybe
>>>>> there are existing plans on this topic, or maybe someone is already
>>>>> working on it?
>>>>>
>>>>> I see it like this:
>>>>>
>>>>> =====
>>>>>
>>>>> - add qmp commands for dirty-bitmap functions: create_successor,
>>>>> abdicate,
>>>>> reclaim.
>>>> Hm, why do we need such low-level control over splitting and merging
>>>> bitmaps from an external client?
>>>>
>>>>> - make create-successor command transaction-able
>>>>> - add query-block-dirty-bitmap qmp command
>>>>>
>>>>> then, external backup:
>>>>>
>>>>> qmp transaction {
>>>>>       external-snapshot
>>>>>       bitmap-create-successor
>>>>> }
>>>>>
>>>>> qmp query frozen bitmap, not acquiring aio context.
>>>>>
>>>>> do external backup, using snapshot and bitmap
>>>>>
>>>>> if (success backup)
>>>>>       qmp bitmap-abdicate
>>>>> else
>>>>>       qmp bitmap-reclaim
>>>>>
>>>>> qmp merge snapshot
>>>>> =====
>>>>>
>>>> Hm, I see -- so you're hoping to manage the backup *entirely*
>>>> externally, so you want to be able to reach inside of QEMU and control
>>>> some status conditions to guarantee it'll be safe.
>>>>
>>>> I'm not convinced QEMU can guarantee such things -- due to various
>>>> flush
>>>> properties, race conditions on write, etc. QEMU handles all of this
>>>> internally in a non-public way at the moment.
>>> Hm, can you be more concrete? What operations are dangerous? We can do
>>> them in paused state for example.
>>>
>> I suppose if you're going to pause the VM, then it should be reasonably
>> safe, but recently there have been endeavors to augment the .qcow2
>> format to prohibit concurrent access, which might include a paused VM as
>> well, I'm not clear on the implementation.
>>
>> If you do it via paused only, then you also don't need to expose the
>> freeze/rollback mechanisms: the existing clear mechanism alone is
>> sufficient:
>>
>> (A) The frozen backup fails. Nothing new has been written, so we don't
>> need to adjust anything, we can just try again.
>> (B) The frozen backup succeeds. We can just clear the bitmap before
>> unfreezing.
> 
> We can't query the bitmap in the paused state - it may take too much time.
> 

And I think it is currently unsafe to fetch the data from disk while the
VM is running, so you'll have to solve one or the other problem...

>>
>> I definitely have reservations about using this as a live fleecing
>> mechanism -- the backup block job uses a write-notifier to make
>> just-in-time backups of data before it is altered, leaving it the only
>> "safe" live backup mechanism in QEMU currently. (Alongside mirror.)
>>
>> I actually have some patches from Fam to introduce a live fleecing
>> mechanism into QEMU (The idea being you create a point-in-time drive you
>> can get data from via NBD, then delete it when done) that might be more
>> appropriate, but I ran into a lot of problems with the patch. I'll post
>> the WIP for that patch to try to solicit comments on the best way
>> forward.
> 
> After adding
> =============
> --- a/block.c
> +++ b/block.c
> @@ -1276,6 +1276,9 @@ void bdrv_set_backing_hd(BlockDriverState *bs,
> BlockDriverState *backing_hd)
>      /* Otherwise we won't be able to commit due to check in bdrv_commit */
>      bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
>                      bs->backing_blocker);
> +
> +    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
> +                    bs->backing_blocker);
>  out:
>      bdrv_refresh_limits(bs, NULL);
>  }
> ==============
> and a tiny fix for the qemu_io interface in the iotest,
> 
> Fam's "qemu-iotests: Image fleecing test case 089" works for me. Isn't
> it enough?
> 
> 
> 
>>
>> Otherwise, my biggest question here is:
>> "What does fleecing a backup externally provide as a benefit over
>> backing up to an NBD target?"
> 
> Look at our answers on v2 of this series:
> 
> On 05.02.2016 11:28, Denis V. Lunev wrote:
>> On 02/03/2016 11:14 AM, Fam Zheng wrote:
>>> On Sat, 01/30 13:56, Vladimir Sementsov-Ogievskiy wrote:
>>>> Hi all.
>>>>
>>>> This series aims to add an external backup api. This is needed
>>>> to allow
>>>> backup software to use our dirty bitmaps.
>>>>
>>>> Vmware and Parallels Cloud Server have this feature.
>>> What is the advantage of this approach over "drive-backup
>>> sync=incremental
>>> ..."?
>>
>> This will allow third-party vendors to back up QEMU VMs into
>> their own formats or to the cloud etc.
> 
> 
>>
>> You can already today perform incremental backups to an NBD target to
>> copy the data out via an external mechanism, is this not sufficient for
>> Parallels? If not, why?
>>
>>>>> In the following patch query-bitmap acquires the aio context. This must
>>>>> of course be dropped for a frozen bitmap.
>>>>> But to do it properly, I think I should check somehow that
>>>>> this is
>>>>> not just a frozen bitmap, but a bitmap frozen by a qmp command, to avoid
>>>>> incorrectly querying a bitmap frozen by an internal backup (or other
>>>>> mechanism). Maybe it is not necessary.
>>>>>
>>>>>
>>>>>
>>>>
> 
>

Patch

=============
--- a/block.c
+++ b/block.c
@@ -1276,6 +1276,9 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
     /* Otherwise we won't be able to commit due to check in bdrv_commit */
     bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
                     bs->backing_blocker);
+
+    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+                    bs->backing_blocker);
 out:
     bdrv_refresh_limits(bs, NULL);