
[0/2] block layer filter and block device snapshot module

Message ID 1603271049-20681-1-git-send-email-sergei.shtepa@veeam.com (mailing list archive)

Message

Sergei Shtepa Oct. 21, 2020, 9:04 a.m. UTC
Hello everyone! I am requesting your comments and suggestions.

# blk-filter

The block layer filter allows intercepting BIO requests to a block device.

Interception is performed at the very beginning of BIO request
processing and therefore does not affect the operation of the request
processing queue. This also makes it possible to intercept requests to
a specific block device rather than to the entire disk.

The logic of the submit_bio() function has been changed: since its
return value is not used anywhere (except for swap and direct I/O),
the function no longer returns a value.

The new submit_bio_direct() function is called wherever a blk_qc_t
return value is required. submit_bio_direct() is not intercepted by
the block layer filter, which is the desired behavior for swap and
direct I/O.

The block layer filter allows a filter driver to be enabled and disabled
on the fly. When a new block device is added, the filter driver can start
filtering it; when a device is removed, the filter driver can detach its
filter from it.

The idea of multiple altitudes was abandoned in order to simplify the
implementation and make it more reliable. Different filter drivers can
work simultaneously, but each on its own block device.
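
For illustration only, here is a rough sketch of how a filter driver
might attach to a single block device and receive BIOs. The names used
below (blk_filter_ops, blk_filter_attach(), blk_filter_detach()) are
hypothetical and do not necessarily match the interface defined in
include/linux/blk-filter.h:

/*
 * Hypothetical sketch: blk_filter_ops, blk_filter_attach() and
 * blk_filter_detach() are illustrative names only, not necessarily
 * the interface proposed in include/linux/blk-filter.h.
 */
#include <linux/bio.h>
#include <linux/blkdev.h>

/* Called at the very start of submit_bio() for the filtered device. */
static bool my_filter_submit_bio(struct bio *bio, void *ctx)
{
	/* Inspect (or copy out) the bio here, then let it pass through. */
	return false;		/* false: do not intercept, continue as usual */
}

static const struct blk_filter_ops my_filter_ops = {
	.filter_bio = my_filter_submit_bio,
};

static int my_filter_start(struct block_device *bdev, void *ctx)
{
	/* Attach to one block device (a partition), not the whole disk. */
	return blk_filter_attach(bdev, &my_filter_ops, ctx);
}

static void my_filter_stop(struct block_device *bdev)
{
	/* Called when the device is removed or filtering is disabled. */
	blk_filter_detach(bdev);
}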

# blk-snap

We propose a new kernel module, blk-snap. This module implements
snapshot and change block tracking functionality. It is intended for
creating backup copies of any block device without using device mapper.
Snapshots are temporary and are destroyed after the backup process has
finished. Change block tracking allows for incremental and differential
backup copies.
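
As a minimal illustration of the change block tracking idea (a
user-space sketch, not the module's actual cbt_map implementation),
each tracked chunk can record the number of the snapshot during which
it was last written, so an incremental backup copies only the chunks
changed since the previous snapshot:

#include <stdint.h>
#include <stdio.h>

#define CBT_BLOCK_SHIFT	14			/* track changes in 16 KiB chunks */
#define DEV_CHUNKS	1024			/* device size, in tracked chunks */

static uint8_t cbt_map[DEV_CHUNKS];		/* snapshot number of last change */
static uint8_t snap_number = 1;			/* bumped for every new snapshot */

/* Mark every chunk touched by a write at byte offset 'pos', length 'len'. */
static void cbt_mark(uint64_t pos, uint64_t len)
{
	uint64_t chunk = pos >> CBT_BLOCK_SHIFT;
	uint64_t last = (pos + len - 1) >> CBT_BLOCK_SHIFT;

	for (; chunk <= last; chunk++)
		cbt_map[chunk] = snap_number;
}

int main(void)
{
	cbt_mark(4096, 65536);			/* a write seen by the filter callback */

	/* An incremental backup copies only chunks changed since the last one. */
	for (unsigned int i = 0; i < DEV_CHUNKS; i++)
		if (cbt_map[i] == snap_number)
			printf("chunk %u changed\n", i);
	return 0;
}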

blk-snap uses the block layer filter, which provides a callback for
intercepting BIO requests. If a block device disappears for whatever
reason, a synchronous request is sent to remove the device from filtering.

The blk-snap kernel module is the product of a deep refactoring of the
out-of-tree veeamsnap kernel module
(https://github.com/veeam/veeamsnap/):
* all conditional compilation branches that served for the purpose of
  compatibility with older kernels have been removed;
* Linux kernel code style has been applied;
* blk-snap mostly takes advantage of the existing kernel code instead of
  reinventing the wheel;
* all redundant code (such as persistent cbt and snapstore collector)
  has been removed.

Several important things still have to be done:
* refactor the module interface for interaction with user-space code -
  it is already clear that the implementation of some calls can be
  improved.

Your feedback would be greatly appreciated!

Sergei Shtepa (2):
  Block layer filter - second version
  blk-snap - snapshots and change-tracking for block devices

 block/Kconfig                               |  11 +
 block/Makefile                              |   1 +
 block/blk-core.c                            |  52 +-
 block/blk-filter-internal.h                 |  29 +
 block/blk-filter.c                          | 286 ++++++
 block/partitions/core.c                     |  14 +-
 drivers/block/Kconfig                       |   2 +
 drivers/block/Makefile                      |   1 +
 drivers/block/blk-snap/Kconfig              |  24 +
 drivers/block/blk-snap/Makefile             |  28 +
 drivers/block/blk-snap/big_buffer.c         | 193 ++++
 drivers/block/blk-snap/big_buffer.h         |  24 +
 drivers/block/blk-snap/blk-snap-ctl.h       | 190 ++++
 drivers/block/blk-snap/blk_deferred.c       | 566 +++++++++++
 drivers/block/blk-snap/blk_deferred.h       |  67 ++
 drivers/block/blk-snap/blk_descr_file.c     |  82 ++
 drivers/block/blk-snap/blk_descr_file.h     |  26 +
 drivers/block/blk-snap/blk_descr_mem.c      |  66 ++
 drivers/block/blk-snap/blk_descr_mem.h      |  14 +
 drivers/block/blk-snap/blk_descr_multidev.c |  86 ++
 drivers/block/blk-snap/blk_descr_multidev.h |  25 +
 drivers/block/blk-snap/blk_descr_pool.c     | 190 ++++
 drivers/block/blk-snap/blk_descr_pool.h     |  38 +
 drivers/block/blk-snap/blk_redirect.c       | 507 ++++++++++
 drivers/block/blk-snap/blk_redirect.h       |  73 ++
 drivers/block/blk-snap/blk_util.c           |  33 +
 drivers/block/blk-snap/blk_util.h           |  33 +
 drivers/block/blk-snap/cbt_map.c            | 210 +++++
 drivers/block/blk-snap/cbt_map.h            |  62 ++
 drivers/block/blk-snap/common.h             |  31 +
 drivers/block/blk-snap/ctrl_fops.c          | 691 ++++++++++++++
 drivers/block/blk-snap/ctrl_fops.h          |  19 +
 drivers/block/blk-snap/ctrl_pipe.c          | 562 +++++++++++
 drivers/block/blk-snap/ctrl_pipe.h          |  34 +
 drivers/block/blk-snap/ctrl_sysfs.c         |  73 ++
 drivers/block/blk-snap/ctrl_sysfs.h         |   5 +
 drivers/block/blk-snap/defer_io.c           | 397 ++++++++
 drivers/block/blk-snap/defer_io.h           |  39 +
 drivers/block/blk-snap/main.c               |  82 ++
 drivers/block/blk-snap/params.c             |  58 ++
 drivers/block/blk-snap/params.h             |  29 +
 drivers/block/blk-snap/rangevector.c        |  85 ++
 drivers/block/blk-snap/rangevector.h        |  31 +
 drivers/block/blk-snap/snapimage.c          | 982 ++++++++++++++++++++
 drivers/block/blk-snap/snapimage.h          |  16 +
 drivers/block/blk-snap/snapshot.c           | 225 +++++
 drivers/block/blk-snap/snapshot.h           |  17 +
 drivers/block/blk-snap/snapstore.c          | 929 ++++++++++++++++++
 drivers/block/blk-snap/snapstore.h          |  68 ++
 drivers/block/blk-snap/snapstore_device.c   | 532 +++++++++++
 drivers/block/blk-snap/snapstore_device.h   |  63 ++
 drivers/block/blk-snap/snapstore_file.c     |  52 ++
 drivers/block/blk-snap/snapstore_file.h     |  15 +
 drivers/block/blk-snap/snapstore_mem.c      |  91 ++
 drivers/block/blk-snap/snapstore_mem.h      |  20 +
 drivers/block/blk-snap/snapstore_multidev.c | 118 +++
 drivers/block/blk-snap/snapstore_multidev.h |  22 +
 drivers/block/blk-snap/tracker.c            | 449 +++++++++
 drivers/block/blk-snap/tracker.h            |  38 +
 drivers/block/blk-snap/tracking.c           | 270 ++++++
 drivers/block/blk-snap/tracking.h           |  13 +
 drivers/block/blk-snap/version.h            |   7 +
 fs/block_dev.c                              |   6 +-
 fs/direct-io.c                              |   2 +-
 fs/iomap/direct-io.c                        |   2 +-
 include/linux/bio.h                         |   4 +-
 include/linux/blk-filter.h                  |  76 ++
 include/linux/genhd.h                       |   8 +-
 kernel/power/swap.c                         |   2 +-
 mm/page_io.c                                |   4 +-
 70 files changed, 9074 insertions(+), 26 deletions(-)
 create mode 100644 block/blk-filter-internal.h
 create mode 100644 block/blk-filter.c
 create mode 100644 drivers/block/blk-snap/Kconfig
 create mode 100644 drivers/block/blk-snap/Makefile
 create mode 100644 drivers/block/blk-snap/big_buffer.c
 create mode 100644 drivers/block/blk-snap/big_buffer.h
 create mode 100644 drivers/block/blk-snap/blk-snap-ctl.h
 create mode 100644 drivers/block/blk-snap/blk_deferred.c
 create mode 100644 drivers/block/blk-snap/blk_deferred.h
 create mode 100644 drivers/block/blk-snap/blk_descr_file.c
 create mode 100644 drivers/block/blk-snap/blk_descr_file.h
 create mode 100644 drivers/block/blk-snap/blk_descr_mem.c
 create mode 100644 drivers/block/blk-snap/blk_descr_mem.h
 create mode 100644 drivers/block/blk-snap/blk_descr_multidev.c
 create mode 100644 drivers/block/blk-snap/blk_descr_multidev.h
 create mode 100644 drivers/block/blk-snap/blk_descr_pool.c
 create mode 100644 drivers/block/blk-snap/blk_descr_pool.h
 create mode 100644 drivers/block/blk-snap/blk_redirect.c
 create mode 100644 drivers/block/blk-snap/blk_redirect.h
 create mode 100644 drivers/block/blk-snap/blk_util.c
 create mode 100644 drivers/block/blk-snap/blk_util.h
 create mode 100644 drivers/block/blk-snap/cbt_map.c
 create mode 100644 drivers/block/blk-snap/cbt_map.h
 create mode 100644 drivers/block/blk-snap/common.h
 create mode 100644 drivers/block/blk-snap/ctrl_fops.c
 create mode 100644 drivers/block/blk-snap/ctrl_fops.h
 create mode 100644 drivers/block/blk-snap/ctrl_pipe.c
 create mode 100644 drivers/block/blk-snap/ctrl_pipe.h
 create mode 100644 drivers/block/blk-snap/ctrl_sysfs.c
 create mode 100644 drivers/block/blk-snap/ctrl_sysfs.h
 create mode 100644 drivers/block/blk-snap/defer_io.c
 create mode 100644 drivers/block/blk-snap/defer_io.h
 create mode 100644 drivers/block/blk-snap/main.c
 create mode 100644 drivers/block/blk-snap/params.c
 create mode 100644 drivers/block/blk-snap/params.h
 create mode 100644 drivers/block/blk-snap/rangevector.c
 create mode 100644 drivers/block/blk-snap/rangevector.h
 create mode 100644 drivers/block/blk-snap/snapimage.c
 create mode 100644 drivers/block/blk-snap/snapimage.h
 create mode 100644 drivers/block/blk-snap/snapshot.c
 create mode 100644 drivers/block/blk-snap/snapshot.h
 create mode 100644 drivers/block/blk-snap/snapstore.c
 create mode 100644 drivers/block/blk-snap/snapstore.h
 create mode 100644 drivers/block/blk-snap/snapstore_device.c
 create mode 100644 drivers/block/blk-snap/snapstore_device.h
 create mode 100644 drivers/block/blk-snap/snapstore_file.c
 create mode 100644 drivers/block/blk-snap/snapstore_file.h
 create mode 100644 drivers/block/blk-snap/snapstore_mem.c
 create mode 100644 drivers/block/blk-snap/snapstore_mem.h
 create mode 100644 drivers/block/blk-snap/snapstore_multidev.c
 create mode 100644 drivers/block/blk-snap/snapstore_multidev.h
 create mode 100644 drivers/block/blk-snap/tracker.c
 create mode 100644 drivers/block/blk-snap/tracker.h
 create mode 100644 drivers/block/blk-snap/tracking.c
 create mode 100644 drivers/block/blk-snap/tracking.h
 create mode 100644 drivers/block/blk-snap/version.h
 create mode 100644 include/linux/blk-filter.h

--
2.20.1

Comments

Hannes Reinecke Oct. 21, 2020, 1:31 p.m. UTC | #1
On 10/21/20 11:04 AM, Sergei Shtepa wrote:
> Hello everyone! Requesting for your comments and suggestions.
> 
> # blk-filter
> 
> Block layer filter allows to intercept BIO requests to a block device.
> 
> Interception is performed at the very beginning of the BIO request
> processing, and therefore does not affect the operation of the request
> processing queue. This also makes it possible to intercept requests from
> a specific block device, rather than from the entire disk.
> 
> The logic of the submit_bio function has been changed - since the
> function execution results are not processed anywhere (except for swap
> and direct-io) the function won't return a value anymore.
> 
> Now the submit_bio_direct() function is called whenever the result of
> the blk_qc_t function is required. submit_bio_direct() is not
> intercepted by the block layer filter. This is logical for swap and
> direct-io.
> 
> Block layer filter allows you to enable and disable the filter driver on
> the fly. When a new block device is added, the filter driver can start
> filtering this device. When you delete a device, the filter can remove
> its own filter.
> 
> The idea of multiple altitudes had to be abandoned in order to simplify
> implementation and make it more reliable. Different filter drivers can
> work simultaneously, but each on its own block device.
> 
> # blk-snap
> 
> We propose a new kernel module - blk-snap. This module implements
> snapshot and changed block tracking functionality. It is intended to
> create backup copies of any block devices without usage of device mapper.
> Snapshots are temporary and are destroyed after the backup process has
> finished. Changed block tracking allows for incremental and differential
> backup copies.
> 
> blk-snap uses block layer filter. Block layer filter provides a callback
> to intercept bio-requests. If a block device disappears for whatever
> reason, send a synchronous request to remove the device from filtering.
> 
> blk-snap kernel module is a product of a deep refactoring of the
> out-of-tree kernel veeamsnap kernel module
> (https://github.com/veeam/veeamsnap/):
> * all conditional compilation branches that served for the purpose of
>    compatibility with older kernels have been removed;
> * linux kernel code style has been applied;
> * blk-snap mostly takes advantage of the existing kernel code instead of
>    reinventing the wheel;
> * all redundant code (such as persistent cbt and snapstore collector)
>    has been removed.
> 
> Several important things are still have to be done:
> * refactor the module interface for interaction with a user-space code -
>    it is already clear that the implementation of some calls can be
>    improved.
> 
> Your feedback would be greatly appreciated!
> 
> Sergei Shtepa (2):
>    Block layer filter - second version
>    blk-snap - snapshots and change-tracking for block devices
> 
> [diffstat snipped]
> 
I do understand where you are coming from, but then we already have 
dm-snap, which does exactly what you want to achieve.
Of course, that would require a reconfiguration of the storage stack on 
the machine, which is not always possible (or desired).

What I _could_ imagine would be a 'dm-intercept' thingie, which 
redirects the current submit_bio() function for any block device, and 
re-routes that to a linear device-mapper device pointing back to the 
original block device.

That way you could attach it to basically any block device, _and_ 
use the existing device-mapper functionality to do fancy stuff once the 
submit_io() callback has been re-routed.

And it would also help in other scenarios; with such functionality we 
could seamlessly clone devices without having to move the whole setup 
to device-mapper first.

Cheers,

Hannes
Sergei Shtepa Oct. 21, 2020, 2:10 p.m. UTC | #2
The 10/21/2020 16:31, Hannes Reinecke wrote:
> I do understand where you are coming from, but then we already have a 
> dm-snap which does exactly what you want to achieve.
> Of course, that would require a reconfiguration of the storage stack on 
> the machine, which is not always possible (or desired).

Yes, reconfiguring the storage stack on a machine is almost impossible.

> 
> What I _could_ imagine would be a 'dm-intercept' thingie, which 
> redirects the current submit_bio() function for any block device, and 
> re-routes that to a linear device-mapper device pointing back to the 
> original block device.
> 
> That way you could attach it to basically any block device, _and_ can 
> use the existing device-mapper functionality to do fancy stuff once the 
> submit_io() callback has been re-routed.
> 
> And it also would help in other scenarios, too; with such a 
> functionality we could seamlessly clone devices without having to move 
> the whole setup to device-mapper first.

Hm... 
Did I understand correctly that the filter itself can be left approximately
as it is, but the blk-snap module can be replaced with 'dm-intercept',
which would use the re-route mechanism from the dm?
I think I may be able to implement it, if you describe your idea in more
detail.
Hannes Reinecke Oct. 22, 2020, 5:58 a.m. UTC | #3
On 10/21/20 4:10 PM, Sergei Shtepa wrote:
> The 10/21/2020 16:31, Hannes Reinecke wrote:
>> I do understand where you are coming from, but then we already have a
>> dm-snap which does exactly what you want to achieve.
>> Of course, that would require a reconfiguration of the storage stack on
>> the machine, which is not always possible (or desired).
> 
> Yes, reconfiguring the storage stack on a machine is almost impossible.
> 
>>
>> What I _could_ imagine would be a 'dm-intercept' thingie, which
>> redirects the current submit_bio() function for any block device, and
>> re-routes that to a linear device-mapper device pointing back to the
>> original block device.
>>
>> That way you could attach it to basically any block device, _and_ can
>> use the existing device-mapper functionality to do fancy stuff once the
>> submit_io() callback has been re-routed.
>>
>> And it also would help in other scenarios, too; with such a
>> functionality we could seamlessly clone devices without having to move
>> the whole setup to device-mapper first.
> 
> Hm...
> Did I understand correctly that the filter itself can be left approximately
> as it is, but the blk-snap module can be replaced with 'dm-intercept',
> which would use the re-route mechanism from the dm?
> I think I may be able to implement it, if you describe your idea in more
> detail.
> 
> 
Actually, once we have a dm-intercept, why do you need the block-layer 
filter at all?
From your initial description the block-layer filter was implemented 
so that blk-snap could work; but if we have dm-intercept (and with it 
the ability to use device-mapper functionality even for normal block 
devices) there wouldn't be any need for the block-layer filter, no?

Cheers,

Hannes
Sergei Shtepa Oct. 22, 2020, 9:44 a.m. UTC | #4
The 10/22/2020 08:58, Hannes Reinecke wrote:
> On 10/21/20 4:10 PM, Sergei Shtepa wrote:
> > The 10/21/2020 16:31, Hannes Reinecke wrote:
> >> I do understand where you are coming from, but then we already have a
> >> dm-snap which does exactly what you want to achieve.
> >> Of course, that would require a reconfiguration of the storage stack on
> >> the machine, which is not always possible (or desired).
> > 
> > Yes, reconfiguring the storage stack on a machine is almost impossible.
> > 
> >>
> >> What I _could_ imagine would be a 'dm-intercept' thingie, which
> >> redirects the current submit_bio() function for any block device, and
> >> re-routes that to a linear device-mapper device pointing back to the
> >> original block device.
> >>
> >> That way you could attach it to basically any block device, _and_ can
> >> use the existing device-mapper functionality to do fancy stuff once the
> >> submit_io() callback has been re-routed.
> >>
> >> And it also would help in other scenarios, too; with such a
> >> functionality we could seamlessly clone devices without having to move
> >> the whole setup to device-mapper first.
> > 
> > Hm...
> > Did I understand correctly that the filter itself can be left approximately
> > as it is, but the blk-snap module can be replaced with 'dm-intercept',
> > which would use the re-route mechanism from the dm?
> > I think I may be able to implement it, if you describe your idea in more
> > detail.
> > 
> > 
> Actually, once we have an dm-intercept, why do you need the block-layer 
> filter at all?
>  From you initial description the block-layer filter was implemented 
> such that blk-snap could work; but if we have dm-intercept (and with it 
> the ability to use device-mapper functionality even for normal block 
> devices) there wouldn't be any need for the block-layer filter, no?

Maybe, but the problem is that I can't imagine how to implement
dm-intercept yet.
How can dm be used to implement interception without changing the
block device stack? We'll have to add a hook somewhere, won't we?

> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke                Kernel Storage Architect
> hare@suse.de                              +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Damien Le Moal Oct. 22, 2020, 10:28 a.m. UTC | #5
On 2020/10/22 18:43, Sergei Shtepa wrote:
> The 10/22/2020 08:58, Hannes Reinecke wrote:
>> On 10/21/20 4:10 PM, Sergei Shtepa wrote:
>>> The 10/21/2020 16:31, Hannes Reinecke wrote:
>>>> I do understand where you are coming from, but then we already have a
>>>> dm-snap which does exactly what you want to achieve.
>>>> Of course, that would require a reconfiguration of the storage stack on
>>>> the machine, which is not always possible (or desired).
>>>
>>> Yes, reconfiguring the storage stack on a machine is almost impossible.
>>>
>>>>
>>>> What I _could_ imagine would be a 'dm-intercept' thingie, which
>>>> redirects the current submit_bio() function for any block device, and
>>>> re-routes that to a linear device-mapper device pointing back to the
>>>> original block device.
>>>>
>>>> That way you could attach it to basically any block device, _and_ can
>>>> use the existing device-mapper functionality to do fancy stuff once the
>>>> submit_io() callback has been re-routed.
>>>>
>>>> And it also would help in other scenarios, too; with such a
>>>> functionality we could seamlessly clone devices without having to move
>>>> the whole setup to device-mapper first.
>>>
>>> Hm...
>>> Did I understand correctly that the filter itself can be left approximately
>>> as it is, but the blk-snap module can be replaced with 'dm-intercept',
>>> which would use the re-route mechanism from the dm?
>>> I think I may be able to implement it, if you describe your idea in more
>>> detail.
>>>
>>>
>> Actually, once we have an dm-intercept, why do you need the block-layer 
>> filter at all?
>>  From you initial description the block-layer filter was implemented 
>> such that blk-snap could work; but if we have dm-intercept (and with it 
>> the ability to use device-mapper functionality even for normal block 
>> devices) there wouldn't be any need for the block-layer filter, no?
> 
> Maybe, but the problem is that I can't imagine how to implement
> dm-intercept yet. 
> How to use dm to implement interception without changing the stack
> of block devices. We'll have to make a hook somewhere, isn`t it?

Once your dm-intercept target driver is inserted with "dmsetup" or any user-land
tool you implement using libdevicemapper, the "hooks" will naturally be in place
since the dm infrastructure already does that: all submitted BIOs will be passed
to dm-intercept through the "map" operation defined in the target_type
descriptor. It is then that driver's job to execute the BIOs as it sees fit.

Look at simple device mappers like dm-linear or dm-flakey for hints of how
things work (drivers/md/dm-linear.c). More complex dm drivers like dm-crypt,
dm-writecache or dm-thin can give you hints about more features of device mapper.
Functions such as __map_bio() in drivers/md/dm.c are the core of DM and show
what happens to BIOs depending on the return value of the map operation.
dm_submit_bio() and __split_and_process_bio() are the entry points for BIO
processing in DM.
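
For reference, a minimal pass-through target in the spirit of dm-linear
could look roughly like the sketch below. This is a sketch only, assuming
the current bio-based DM API; the "intercept" target name and the
single-argument constructor are illustrative, and a real dm-intercept
would do its CBT/snapshot work in the map callback before remapping:

/* Sketch of a trivial pass-through DM target; not a finished dm-intercept. */
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/device-mapper.h>

struct intercept_ctx {
	struct dm_dev *dev;
};

/* Constructor: <dev_path> */
static int intercept_ctr(struct dm_target *ti, unsigned int argc, char **argv)
{
	struct intercept_ctx *ic;
	int ret;

	if (argc != 1) {
		ti->error = "Invalid argument count";
		return -EINVAL;
	}

	ic = kzalloc(sizeof(*ic), GFP_KERNEL);
	if (!ic)
		return -ENOMEM;

	ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), &ic->dev);
	if (ret) {
		ti->error = "Device lookup failed";
		kfree(ic);
		return ret;
	}
	ti->private = ic;
	return 0;
}

static void intercept_dtr(struct dm_target *ti)
{
	struct intercept_ctx *ic = ti->private;

	dm_put_device(ti, ic->dev);
	kfree(ic);
}

static int intercept_map(struct dm_target *ti, struct bio *bio)
{
	struct intercept_ctx *ic = ti->private;

	/* A real driver would track changed blocks and/or do COW here. */
	bio_set_dev(bio, ic->dev->bdev);
	return DM_MAPIO_REMAPPED;	/* DM resubmits the bio to the new device */
}

static struct target_type intercept_target = {
	.name    = "intercept",
	.version = {1, 0, 0},
	.module  = THIS_MODULE,
	.ctr     = intercept_ctr,
	.dtr     = intercept_dtr,
	.map     = intercept_map,
};

static int __init dm_intercept_init(void)
{
	return dm_register_target(&intercept_target);
}

static void __exit dm_intercept_exit(void)
{
	dm_unregister_target(&intercept_target);
}

module_init(dm_intercept_init);
module_exit(dm_intercept_exit);
MODULE_DESCRIPTION("example pass-through target (sketch)");
MODULE_LICENSE("GPL");

Such a target would normally be set up with something like
"dmsetup create sda1-icept --table '0 <sectors> intercept /dev/sda1'",
after which all I/O to the new dm node goes through the map callback -
which is exactly the open question in this thread: how to attach it to
the existing /dev/sda1 node without reconfiguring the stack.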

> 
>>
>> Cheers,
>>
>> Hannes
>> -- 
>> Dr. Hannes Reinecke                Kernel Storage Architect
>> hare@suse.de                              +49 911 74053 688
>> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
>> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
>
Sergei Shtepa Oct. 22, 2020, 1:52 p.m. UTC | #6
The 10/22/2020 13:28, Damien Le Moal wrote:
> On 2020/10/22 18:43, Sergei Shtepa wrote:
> > 
> > Maybe, but the problem is that I can't imagine how to implement
> > dm-intercept yet. 
> > How to use dm to implement interception without changing the stack
> > of block devices. We'll have to make a hook somewhere, isn`t it?
> 
> Once your dm-intercept target driver is inserted with "dmsetup" or any user land
> tool you implement using libdevicemapper, the "hooks" will naturally be in place
> since the dm infrastructure already does that: all submitted BIOs will be passed
> to dm-intercept through the "map" operation defined in the target_type
> descriptor. It is then that driver job to execute the BIOs as it sees fit.
> 
> Look at simple device mappers like dm-linear or dm-flakey for hints of how
> things work (driver/md/dm-linear.c). More complex dm drivers like dm-crypt,
> dm-writecache or dm-thin can give you hints about more features of device mapper.
> Functions such as __map_bio() in drivers/md/dm.c are the core of DM and show
> what happens to BIOs depending on the the return value of the map operation.
> dm_submit_bio() and __split_and_process_bio() is the entry points for BIO
> processing in DM.
> 

Is there something I don't understand? Please correct me.

Let me remind you that, as a constraint of the problem, we cannot change
the configuration of the block device stack.

Let's imagine this configuration: the /root mount point on an ext
filesystem on /dev/sda1.
+---------------+
|               |
|  /root        |
|               |
+---------------+
|               |
| EXT FS        |
|               |
+---------------+
|               |
| block layer   |
|               |
| sda queue     |
|               |
+---------------+
|               |
| scsi driver   |
|               |
+---------------+

We need to add change block tracking (CBT) and snapshot functionality for
incremental backup.

With DM, we would need to change the block device stack: add the /dev/sda1
device to an LVM volume group, create a logical volume, change /etc/fstab
and reboot.

The new scheme will look like this:
+---------------+
|               |
|  /root        |
|               |
+---------------+
|               |
| EXT FS        |
|               |
+---------------+
|               |
| LV-root       |
|               |
+------------------+
|                  |
| dm-cbt & dm-snap |
|                  |
+------------------+
|               |
| sda queue     |
|               |
+---------------+
|               |
| scsi driver   |
|               |
+---------------+

But I cannot change the block device stack, and so I propose a scheme
with interception.
+---------------+
|               |
|  /root        |
|               |
+---------------+
|               |
| EXT FS        |
|               |
+---------------+   +-----------------+
|  |            |   |                 |
|  | blk-filter |-> | cbt & snapshot  |
|  |            |<- |                 |
|  +------------+   +-----------------+
|               |
| sda blk queue |
|               |
+---------------+
|               |
| scsi driver   |
|               |
+---------------+

Perhaps I could implement "cbt & snapshot" inside DM, but without
interception it will not work in any case. Isn't that right?
Darrick J. Wong Oct. 22, 2020, 3:14 p.m. UTC | #7
On Thu, Oct 22, 2020 at 04:52:13PM +0300, Sergei Shtepa wrote:
> The 10/22/2020 13:28, Damien Le Moal wrote:
> > On 2020/10/22 18:43, Sergei Shtepa wrote:
> > > 
> > > Maybe, but the problem is that I can't imagine how to implement
> > > dm-intercept yet. 
> > > How to use dm to implement interception without changing the stack
> > > of block devices. We'll have to make a hook somewhere, isn`t it?
> > 
> > Once your dm-intercept target driver is inserted with "dmsetup" or any user land
> > tool you implement using libdevicemapper, the "hooks" will naturally be in place
> > since the dm infrastructure already does that: all submitted BIOs will be passed
> > to dm-intercept through the "map" operation defined in the target_type
> > descriptor. It is then that driver job to execute the BIOs as it sees fit.
> > 
> > Look at simple device mappers like dm-linear or dm-flakey for hints of how
> > things work (driver/md/dm-linear.c). More complex dm drivers like dm-crypt,
> > dm-writecache or dm-thin can give you hints about more features of device mapper.
> > Functions such as __map_bio() in drivers/md/dm.c are the core of DM and show
> > what happens to BIOs depending on the the return value of the map operation.
> > dm_submit_bio() and __split_and_process_bio() is the entry points for BIO
> > processing in DM.
> > 
> 
> Is there something I don't understand? Please correct me.
> 
> Let me remind that by the condition of the problem, we can't change
> the configuration of the block device stack.
> 
> Let's imagine this configuration: /root mount point on ext filesystem
> on /dev/sda1.
> +---------------+
> |               |
> |  /root        |
> |               |
> +---------------+
> |               |
> | EXT FS        |
> |               |
> +---------------+
> |               |
> | block layer   |
> |               |
> | sda queue     |
> |               |
> +---------------+
> |               |
> | scsi driver   |
> |               |
> +---------------+
> 
> We need to add change block tracking (CBT) and snapshot functionality for
> incremental backup.
> 
> With the DM we need to change the block device stack. Add device /dev/sda1
> to LVM Volume group, create logical volume, change /etc/fstab and reboot.
> 
> The new scheme will look like this:
> +---------------+
> |               |
> |  /root        |
> |               |
> +---------------+
> |               |
> | EXT FS        |
> |               |
> +---------------+
> |               |
> | LV-root       |
> |               |
> +------------------+
> |                  |
> | dm-cbt & dm-snap |
> |                  |
> +------------------+
> |               |
> | sda queue     |
> |               |
> +---------------+
> |               |
> | scsi driver   |
> |               |
> +---------------+
> 
> But I cannot change block device stack. And so I propose a scheme with
> interception.
> +---------------+
> |               |
> |  /root        |
> |               |
> +---------------+
> |               |
> | EXT FS        |
> |               |
> +---------------+   +-----------------+
> |  |            |   |                 |
> |  | blk-filter |-> | cbt & snapshot  |
> |  |            |<- |                 |
> |  +------------+   +-----------------+
> |               |
> | sda blk queue |
> |               |
> +---------------+
> |               |
> | scsi driver   |
> |               |
> +---------------+
> 
> Perhaps I can make "cbt & snapshot" inside the DM, but without interception
> in any case, it will not work. Isn't that right?

Stupid question: Why don't you change the block layer to make it
possible to insert device mapper devices after the blockdev has been set
up?

--D

> 
> -- 
> Sergei Shtepa
> Veeam Software developer.
Mike Snitzer Oct. 22, 2020, 5:54 p.m. UTC | #8
On Thu, Oct 22, 2020 at 11:14 AM Darrick J. Wong
<darrick.wong@oracle.com> wrote:
>
> On Thu, Oct 22, 2020 at 04:52:13PM +0300, Sergei Shtepa wrote:
> > The 10/22/2020 13:28, Damien Le Moal wrote:
> > > On 2020/10/22 18:43, Sergei Shtepa wrote:
> > > >
> > > > Maybe, but the problem is that I can't imagine how to implement
> > > > dm-intercept yet.
> > > > How to use dm to implement interception without changing the stack
> > > > of block devices. We'll have to make a hook somewhere, isn`t it?
> > >
> > > Once your dm-intercept target driver is inserted with "dmsetup" or any user land
> > > tool you implement using libdevicemapper, the "hooks" will naturally be in place
> > > since the dm infrastructure already does that: all submitted BIOs will be passed
> > > to dm-intercept through the "map" operation defined in the target_type
> > > descriptor. It is then that driver job to execute the BIOs as it sees fit.
> > >
> > > Look at simple device mappers like dm-linear or dm-flakey for hints of how
> > > things work (driver/md/dm-linear.c). More complex dm drivers like dm-crypt,
> > > dm-writecache or dm-thin can give you hints about more features of device mapper.
> > > Functions such as __map_bio() in drivers/md/dm.c are the core of DM and show
> > > what happens to BIOs depending on the the return value of the map operation.
> > > dm_submit_bio() and __split_and_process_bio() is the entry points for BIO
> > > processing in DM.
> > >
> >
> > Is there something I don't understand? Please correct me.
> >
> > Let me remind that by the condition of the problem, we can't change
> > the configuration of the block device stack.
> >
> > Let's imagine this configuration: /root mount point on ext filesystem
> > on /dev/sda1.
> > +---------------+
> > |               |
> > |  /root        |
> > |               |
> > +---------------+
> > |               |
> > | EXT FS        |
> > |               |
> > +---------------+
> > |               |
> > | block layer   |
> > |               |
> > | sda queue     |
> > |               |
> > +---------------+
> > |               |
> > | scsi driver   |
> > |               |
> > +---------------+
> >
> > We need to add change block tracking (CBT) and snapshot functionality for
> > incremental backup.
> >
> > With the DM we need to change the block device stack. Add device /dev/sda1
> > to LVM Volume group, create logical volume, change /etc/fstab and reboot.
> >
> > The new scheme will look like this:
> > +---------------+
> > |               |
> > |  /root        |
> > |               |
> > +---------------+
> > |               |
> > | EXT FS        |
> > |               |
> > +---------------+
> > |               |
> > | LV-root       |
> > |               |
> > +------------------+
> > |                  |
> > | dm-cbt & dm-snap |
> > |                  |
> > +------------------+
> > |               |
> > | sda queue     |
> > |               |
> > +---------------+
> > |               |
> > | scsi driver   |
> > |               |
> > +---------------+
> >
> > But I cannot change block device stack. And so I propose a scheme with
> > interception.
> > +---------------+
> > |               |
> > |  /root        |
> > |               |
> > +---------------+
> > |               |
> > | EXT FS        |
> > |               |
> > +---------------+   +-----------------+
> > |  |            |   |                 |
> > |  | blk-filter |-> | cbt & snapshot  |
> > |  |            |<- |                 |
> > |  +------------+   +-----------------+
> > |               |
> > | sda blk queue |
> > |               |
> > +---------------+
> > |               |
> > | scsi driver   |
> > |               |
> > +---------------+
> >
> > Perhaps I can make "cbt & snapshot" inside the DM, but without interception
> > in any case, it will not work. Isn't that right?
>
> Stupid question: Why don't you change the block layer to make it
> possible to insert device mapper devices after the blockdev has been set
> up?

Not a stupid question.  Definitely something that we DM developers
have wanted to do for a while.  The devil is in the details, but it is
the right way forward.

Otherwise, this intercept is really just a DM-lite remapping layer
without any of DM's well-established capabilities.  Encouragingly, all
of the replies have effectively echoed this point.  (Amusingly, it seems
every mailing list under the sun is on the cc except dm-devel... now
rectified.)

Alasdair has some concrete ideas on this line of work; I'm trying to
encourage him to reply ;)

Mike
Mike Snitzer Oct. 22, 2020, 6:35 p.m. UTC | #9
On Wed, Oct 21, 2020 at 5:04 AM Sergei Shtepa <sergei.shtepa@veeam.com> wrote:
>
> Hello everyone! Requesting for your comments and suggestions.
>
> # blk-filter
>
> Block layer filter allows to intercept BIO requests to a block device.
>
> Interception is performed at the very beginning of the BIO request
> processing, and therefore does not affect the operation of the request
> processing queue. This also makes it possible to intercept requests from
> a specific block device, rather than from the entire disk.
>
> The logic of the submit_bio function has been changed - since the
> function execution results are not processed anywhere (except for swap
> and direct-io) the function won't return a value anymore.

Your desire to switch to a void return comes exactly when I've noticed
we need it.

->submit_bio's blk_qc_t return is the cookie assigned by blk-mq.  Up
to this point we haven't actually used it for bio-based devices, but it
seems clear we'll soon need it for bio-based IO polling support.

Just today, I've been auditing drivers/md/dm.c with an eye toward
properly handling the blk_qc_t return (or lack thereof) from various
DM methods.

It could easily be that __submit_bio_noacct and __submit_bio_noacct_mq
will be updated to do something meaningful with the returned cookie
(or that DM will) to facilitate proper IO polling.

Mike
Christoph Hellwig Oct. 23, 2020, 9:13 a.m. UTC | #10
On Thu, Oct 22, 2020 at 01:54:16PM -0400, Mike Snitzer wrote:
> On Thu, Oct 22, 2020 at 11:14 AM Darrick J. Wong
> > Stupid question: Why don't you change the block layer to make it
> > possible to insert device mapper devices after the blockdev has been set
> > up?
> 
> Not a stupid question.  Definitely something that us DM developers
> have wanted to do for a while.  Devil is in the details but it is the
> right way forward.
> 

Yes, I think that is the right thing to do.  And I don't think it should
be all that hard.  All we'd need in the I/O path is something like the
pseudo-patch below, which will allow the interposer driver to resubmit
bios using submit_bio_noacct as long as the driver sets BIO_INTERPOSED.

diff --git a/block/blk-core.c b/block/blk-core.c
index ac00d2fa4eb48d..3f6f1eb565e0a8 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1051,6 +1051,9 @@ blk_qc_t submit_bio_noacct(struct bio *bio)
 		return BLK_QC_T_NONE;
 	}
 
+	if (blk_has_interposer(bio->bi_disk) &&
+	    !(bio->bi_flags & BIO_INTERPOSED))
+		return __submit_bio_interposed(bio);
 	if (!bio->bi_disk->fops->submit_bio)
 		return __submit_bio_noacct_mq(bio);
 	return __submit_bio_noacct(bio);
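
Purely as a guess at how that interposed path could continue (the
blk_interposer structure, its submit_bio hook and the gendisk->interposer
field below are assumptions for illustration, not part of the pseudo-patch
above):

/*
 * Sketch only: blk_interposer and gendisk->interposer are assumed here
 * for illustration; only BIO_INTERPOSED comes from the pseudo-patch above.
 */
static blk_qc_t __submit_bio_interposed(struct bio *bio)
{
	struct blk_interposer *ip = bio->bi_disk->interposer;	/* assumed field */

	/*
	 * Flag the bio so that when the interposer (e.g. a DM device)
	 * resubmits it, possibly redirected to another disk, it is not
	 * interposed a second time.
	 */
	bio_set_flag(bio, BIO_INTERPOSED);

	if (ip && ip->submit_bio)
		return ip->submit_bio(ip, bio);

	/* Interposer went away: fall back to the normal submission path. */
	return submit_bio_noacct(bio);
}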
Hannes Reinecke Oct. 23, 2020, 10:31 a.m. UTC | #11
On 10/23/20 11:13 AM, hch@infradead.org wrote:
> On Thu, Oct 22, 2020 at 01:54:16PM -0400, Mike Snitzer wrote:
>> On Thu, Oct 22, 2020 at 11:14 AM Darrick J. Wong
>>> Stupid question: Why don't you change the block layer to make it
>>> possible to insert device mapper devices after the blockdev has been set
>>> up?
>>
>> Not a stupid question.  Definitely something that us DM developers
>> have wanted to do for a while.  Devil is in the details but it is the
>> right way forward.
>>
> 
> Yes, I think that is the right thing to do.  And I don't think it should
> be all that hard.  All we'd need in the I/O path is something like the
> pseudo-patch below, which will allow the interposer driver to resubmit
> bios using submit_bio_noacct as long as the driver sets BIO_INTERPOSED.
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index ac00d2fa4eb48d..3f6f1eb565e0a8 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1051,6 +1051,9 @@ blk_qc_t submit_bio_noacct(struct bio *bio)
>   		return BLK_QC_T_NONE;
>   	}
>   
> +	if (blk_has_interposer(bio->bi_disk) &&
> +	    !(bio->bi_flags & BIO_INTERPOSED))
> +		return __submit_bio_interposed(bio);
>   	if (!bio->bi_disk->fops->submit_bio)
>   		return __submit_bio_noacct_mq(bio);
>   	return __submit_bio_noacct(bio);
> 
My thoughts went more into the direction of hooking into ->submit_bio, 
seeing that it's a NULL pointer for most (all?) block drivers.

But sure, I'll check how the interposer approach would turn out.

Cheers,

Hannes
Sergei Shtepa Oct. 23, 2020, 11:04 a.m. UTC | #12
The 10/23/2020 13:31, Hannes Reinecke wrote:
> On 10/23/20 11:13 AM, hch@infradead.org wrote:
> > On Thu, Oct 22, 2020 at 01:54:16PM -0400, Mike Snitzer wrote:
> >> On Thu, Oct 22, 2020 at 11:14 AM Darrick J. Wong
> >>> Stupid question: Why don't you change the block layer to make it
> >>> possible to insert device mapper devices after the blockdev has been set
> >>> up?
> >>
> >> Not a stupid question.  Definitely something that us DM developers
> >> have wanted to do for a while.  Devil is in the details but it is the
> >> right way forward.
> >>
> > 
> > Yes, I think that is the right thing to do.  And I don't think it should
> > be all that hard.  All we'd need in the I/O path is something like the
> > pseudo-patch below, which will allow the interposer driver to resubmit
> > bios using submit_bio_noacct as long as the driver sets BIO_INTERPOSED.
> > 
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index ac00d2fa4eb48d..3f6f1eb565e0a8 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -1051,6 +1051,9 @@ blk_qc_t submit_bio_noacct(struct bio *bio)
> >   		return BLK_QC_T_NONE;
> >   	}
> >   
> > +	if (blk_has_interposer(bio->bi_disk) &&
> > +	    !(bio->bi_flags & BIO_INTERPOSED))
> > +		return __submit_bio_interposed(bio);
> >   	if (!bio->bi_disk->fops->submit_bio)
> >   		return __submit_bio_noacct_mq(bio);
> >   	return __submit_bio_noacct(bio);
> > 

That would be great! Roughly this kind of interception capability is
what is missing now.

> My thoughts went more into the direction of hooking into ->submit_bio, 
> seeing that it's a NULL pointer for most (all?) block drivers.
> 
> But sure, I'll check how the interposer approach would turn out.

If anyone works on a blk-interposer patch, please add me to CC.
I will try to provide a module that uses blk-interposer.
Christoph Hellwig Oct. 23, 2020, 11:12 a.m. UTC | #13
On Fri, Oct 23, 2020 at 12:31:05PM +0200, Hannes Reinecke wrote:
> My thoughts went more into the direction of hooking into ->submit_bio,
> seeing that it's a NULL pointer for most (all?) block drivers.
> 
> But sure, I'll check how the interposer approach would turn out.

submit_bio is owned by the underlying device, and for a good reason
it is stored in a const struct.