
[v3,03/11] documentation: Block Devices Snapshots Module

Message ID 20230404140835.25166-4-sergei.shtepa@veeam.com (mailing list archive)
State New, archived
Series blksnap - block devices snapshots module

Commit Message

Sergei Shtepa April 4, 2023, 2:08 p.m. UTC
The document contains:
* Description of the purpose of the mechanism
* Description of features
* Description of algorithms
* Recommendations about using the module from the user-space side
* Reference to module interface description

Signed-off-by: Sergei Shtepa <sergei.shtepa@veeam.com>
---
 Documentation/block/blksnap.rst | 345 ++++++++++++++++++++++++++++++++
 Documentation/block/index.rst   |   1 +
 MAINTAINERS                     |   6 +
 3 files changed, 352 insertions(+)
 create mode 100644 Documentation/block/blksnap.rst

Comments

Bagas Sanjaya April 10, 2023, 5:01 a.m. UTC | #1
On Tue, Apr 04, 2023 at 04:08:27PM +0200, Sergei Shtepa wrote:
> +The main properties that a backup tool should have are:
> +
> +- Simplicity and versatility of use
> +- Reliability
> +- Minimal consumption of system resources during backup
> +- Minimal time required for recovery or replication of the entire system
> +
> +Therefore, the features of the blksnap module are:
"Taking above properties into account, blksnap module features:"

> +The change tracker allows to determine which blocks were changed during the
> +time between the last snapshot created and any of the previous snapshots.
> +Having a map of changes, it is enough to copy only the changed blocks, and
"With a map of changes, ..."

> +3. ``blkfilter_ctl_blksnap_cbtdirty`` mark blocks as changed in the change
                                         marks

> +The blksnap [#userspace_tools]_ console tool allows to control the module
> +from the command line. The tool contains detailed built-in help. To get
> +the list of commands, enter the ``blksnap --help`` command. The ``blksnap
> +<command name> --help`` command allows to get detailed information about the
"To get list of commands with usage description, see ``blksnap --help``."

Thanks.
Donald Buczek April 12, 2023, 7:38 p.m. UTC | #2
I think you can trigger all kinds of use-after-free when userspace deletes a snapshot image, or the snapshot image and the tracker, while the snapshot image disk device is kept alive (mounted or just opened) and doing I/O.

Here is what I did to provoke that:

root@dose:~# s=$(blksnap snapshot_create -d /dev/vdb)
root@dose:~# blksnap snapshot_appendstorage -i $s -f /scratch/local/test.dat
device path: '/dev/block/253:2'
allocate range: ofs=11264624 cnt=2097152
root@dose:~# blksnap snapshot_take -i $s
root@dose:~# mount /dev/blksnap-image_253\:16 /mnt
root@dose:~# dd if=/dev/zero of=/mnt/x.x &
[1] 2514
root@dose:~# blksnap snapshot_destroy -i $s
dd: writing to '/mnt/x.x': No space left on device
1996041+0 records in
1996040+0 records out
1021972480 bytes (1.0 GB, 975 MiB) copied, 8.48923 s, 120 MB/s
[1]+  Exit 1                  dd if=/dev/zero of=/mnt/x.x

And here's the UAF:

[ 4508.526091] [2475] diff_storage_event_low:64: blksnap-diff-storage: Diff storage low free space. Portion: 2097152 sectors, requested: 2097152
[ 4508.526141] [2475] event_gen:44: blksnap-event_queue: Generate event: time=4507748140846 code=0 data_size=8
[ 4508.526158] blksnap-snapshot: Snapshot aa986b45-bf07-46f7-a52d-8d7829221f24 was created
[ 4512.731380] [2478] ioctl_snapshot_append_storage:195: blksnap: Append difference storage
[ 4512.731417] [2478] diff_storage_append_block:223: blksnap-diff-storage: Append 1 blocks
[ 4512.731485] [2478] diff_storage_add_range:193: blksnap-diff-storage: Add range to diff storage: [253:2] 11264624:2097152
[ 4512.780757] [2481] diff_area_new:180: blksnap-diff-area: Open device [253:16]
[ 4512.780786] [2481] diff_area_calculate_chunk_size:57: blksnap-diff-area: Minimal IO block 1 sectors
[ 4512.780794] [2481] diff_area_calculate_chunk_size:58: blksnap-diff-area: Device capacity 2097152 sectors
[ 4512.780801] [2481] diff_area_calculate_chunk_size:61: blksnap-diff-area: Chunks count 4096
[ 4512.780808] [2481] diff_area_calculate_chunk_size:76: blksnap-diff-area: The optimal chunk size was calculated as 262144 bytes for device [253:16]
[ 4512.780817] [2481] diff_area_new:200: blksnap-diff-area: Chunk size 262144 in bytes
[ 4512.780824] [2481] diff_area_new:201: blksnap-diff-area: Chunk count 4096
[ 4512.828814] [2481] snapshot_take_trackers:250: blksnap-snapshot: Device [253:16] was frozen
[ 4512.834029] [2481] cbt_map_switch:142: blksnap-cbt_map: CBT map switch
[ 4512.834051] [2481] snapshot_take_trackers:277: blksnap-snapshot: Device [253:16] was unfrozen
[ 4512.834058] blksnap-image: Create snapshot image device for original device [253:16]
[ 4512.835437] [2481] snapimage_create:97: blksnap-image: Snapshot image disk name [blksnap-image_253:16]
[ 4512.838499] [2481] snapimage_create:112: blksnap-image: Image block device [259:1] has been created
[ 4512.838508] blksnap-snapshot: Snapshot aa986b45-bf07-46f7-a52d-8d7829221f24 was taken successfully
[ 4525.592286] XFS (blksnap-image_253:16): Mounting V5 Filesystem 35f0c7a2-27fa-4183-8ede-7462ac31a97d
[ 4525.619281] XFS (blksnap-image_253:16): Ending clean mount
[ 4558.292030] clocksource: timekeeping watchdog on CPU10: hpet retried 2 times before success
[ 4558.486074] blksnap-snapshot: Destroy snapshot aa986b45-bf07-46f7-a52d-8d7829221f24
[ 4558.488731] blksnap-snapshot: Release snapshot aa986b45-bf07-46f7-a52d-8d7829221f24
[ 4558.505931] [2527] tracker_release_snapshot:293: blksnap-tracker: Tracker for device [253:16] release snapshot
[ 4558.505959] [2527] snapimage_free:63: blksnap-image: Snapshot image disk blksnap-image_253:16 delete
[ 4558.548358] [1899] diff_storage_event_low:64: blksnap-diff-storage: Diff storage low free space. Portion: 2097152 sectors, requested: 4194304
[ 4558.548378] [1899] event_gen:44: blksnap-event_queue: Generate event: time=4557770519985 code=0 data_size=8
[ 4561.444548] ==================================================================
[ 4561.446224] BUG: KASAN: slab-use-after-free in chunk_notify_store+0x40/0x190 [blksnap]
[ 4561.448018] Read of size 8 at addr ffff888112d21500 by task kworker/13:0/1504

[ 4561.449965] CPU: 13 PID: 1504 Comm: kworker/13:0 Not tainted 6.3.0-rc5.mx64.428-00094-g21dc08a94f59 #40
[ 4561.452025] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
[ 4561.454538] Workqueue: events chunk_notify_store [blksnap]
[ 4561.455809] Call Trace:
[ 4561.456393]  <TASK>
[ 4561.456915]  dump_stack_lvl+0x37/0x50
[ 4561.457807]  print_report+0xcc/0x630
[ 4561.458650]  ? __virt_addr_valid+0xf5/0x180
[ 4561.459607]  ? chunk_notify_store+0x40/0x190 [blksnap]
[ 4561.460785]  kasan_report+0xb2/0xe0
[ 4561.468472]  ? chunk_notify_store+0x40/0x190 [blksnap]
[ 4561.476365]  chunk_notify_store+0x40/0x190 [blksnap]
[ 4561.483755]  process_one_work+0x407/0x790
[ 4561.490118]  worker_thread+0x2ab/0x700
[ 4561.495707]  ? __pfx_set_cpus_allowed_ptr+0x10/0x10
[ 4561.500883]  ? __pfx_worker_thread+0x10/0x10
[ 4561.505664]  kthread+0x15d/0x190
[ 4561.509864]  ? __pfx_kthread+0x10/0x10
[ 4561.513824]  ret_from_fork+0x2c/0x50
[ 4561.517546]  </TASK>

[ 4561.524020] Allocated by task 2481:
[ 4561.527147]  kasan_save_stack+0x22/0x50
[ 4561.530226]  kasan_set_track+0x25/0x30
[ 4561.533096]  __kasan_kmalloc+0x80/0x90
[ 4561.535868]  chunk_alloc+0x37/0xf0 [blksnap]
[ 4561.538569]  diff_area_new+0x42d/0x690 [blksnap]
[ 4561.541251]  snapshot_take+0x13b/0x530 [blksnap]
[ 4561.543853]  ioctl_snapshot_take+0x7a/0xc0 [blksnap]
[ 4561.546394]  ctrl_unlocked_ioctl+0x3a/0x60 [blksnap]
[ 4561.548898]  __x64_sys_ioctl+0xc6/0xe0
[ 4561.551276]  do_syscall_64+0x47/0xa0
[ 4561.553621]  entry_SYSCALL_64_after_hwframe+0x72/0xdc

[ 4561.558154] Freed by task 2527:
[ 4561.560350]  kasan_save_stack+0x22/0x50
[ 4561.562569]  kasan_set_track+0x25/0x30
[ 4561.564788]  kasan_save_free_info+0x2b/0x50
[ 4561.567012]  ____kasan_slab_free+0xf9/0x1a0
[ 4561.569210]  __kmem_cache_free+0x141/0x200
[ 4561.571370]  diff_area_free+0xab/0x150 [blksnap]
[ 4561.573546]  tracker_release_snapshot+0xc8/0x110 [blksnap]
[ 4561.575763]  snapshot_free+0x9f/0x170 [blksnap]
[ 4561.577882]  snapshot_destroy+0x119/0x170 [blksnap]
[ 4561.579992]  ioctl_snapshot_destroy+0x7a/0xc0 [blksnap]
[ 4561.582155]  ctrl_unlocked_ioctl+0x3a/0x60 [blksnap]
[ 4561.584270]  __x64_sys_ioctl+0xc6/0xe0
[ 4561.586247]  do_syscall_64+0x47/0xa0
[ 4561.588176]  entry_SYSCALL_64_after_hwframe+0x72/0xdc

[ 4561.591915] Last potentially related work creation:
[ 4561.593833]  kasan_save_stack+0x22/0x50
[ 4561.595611]  __kasan_record_aux_stack+0x60/0x70
[ 4561.597442]  kvfree_call_rcu+0x2e/0x460
[ 4561.599235]  cache_clean+0x46d/0x500 [sunrpc]
[ 4561.601181]  cache_flush+0x15/0x40 [sunrpc]
[ 4561.603096]  ip_map_parse+0x2ca/0x300 [sunrpc]
[ 4561.605035]  cache_do_downcall+0x59/0x90 [sunrpc]
[ 4561.606993]  cache_write_procfs+0x90/0xd0 [sunrpc]
[ 4561.608946]  proc_reg_write+0xe0/0x140
[ 4561.610728]  vfs_write+0x186/0x680
[ 4561.612465]  ksys_write+0xbd/0x160
[ 4561.614186]  do_syscall_64+0x47/0xa0
[ 4561.615922]  entry_SYSCALL_64_after_hwframe+0x72/0xdc

[ 4561.619340] The buggy address belongs to the object at ffff888112d21500
                 which belongs to the cache kmalloc-96 of size 96
[ 4561.623265] The buggy address is located 0 bytes inside of
                 freed 96-byte region [ffff888112d21500, ffff888112d21560)

[ 4561.628960] The buggy address belongs to the physical page:
[ 4561.631000] page:00000000b32dc240 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x112d21
[ 4561.633393] flags: 0x17fffc000000200(slab|node=0|zone=2|lastcpupid=0x1ffff)
[ 4561.635646] raw: 017fffc000000200 ffff888100040300 ffffea0004317ed0 ffffea000406a350
[ 4561.637982] raw: 0000000000000000 ffff888112d21000 0000000100000020 0000000000000000
[ 4561.640338] page dumped because: kasan: bad access detected

[ 4561.644448] Memory state around the buggy address:
[ 4561.646610]  ffff888112d21400: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 4561.649000]  ffff888112d21480: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 4561.651367] >ffff888112d21500: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 4561.653709]                    ^
[ 4561.655754]  ffff888112d21580: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 4561.658132]  ffff888112d21600: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 4561.660503] ==================================================================
[ 4561.662931] Disabling lock debugging due to kernel taint
[ 4561.665449] general protection fault, probably for non-canonical address 0x79a00c8000009df: 0000 [#1] PREEMPT SMP KASAN PTI
[ 4561.668387] CPU: 13 PID: 1504 Comm: kworker/13:0 Tainted: G    B              6.3.0-rc5.mx64.428-00094-g21dc08a94f59 #40
[ 4561.671243] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
[ 4561.674216] Workqueue: events chunk_notify_store [blksnap]
[ 4561.676778] RIP: 0010:chunk_notify_store+0x66/0x190 [blksnap]
[ 4561.679365] Code: d0 f5 90 e0 4c 8b 75 00 48 8d 7d 08 e8 c3 f5 90 e0 4c 8b 65 08 49 8d 7e 08 e8 66 f6 90 e0 4d 89 66 08 4c 89 e7 e8 5a f6 90 e0 <4d> 89 34 24 48 89 ef 4c 8d 73 68 e8 4a f6 90 e0 48 89 6d 00 48 89
[ 4561.685274] RSP: 0000:ffff88810e49fdd0 EFLAGS: 00010282
[ 4561.687972] RAX: 0000000000000000 RBX: ffff888181be3300 RCX: ffffffffa0b98f56
[ 4561.690821] RDX: 0000000000000001 RSI: 0000000000000008 RDI: 079a00c8000009df
[ 4561.693676] RBP: ffff888112d21500 R08: 0000000000000001 R09: ffffffff84286a47
[ 4561.696550] R10: fffffbfff0850d48 R11: 0000000000000001 R12: 079a00c8000009df
[ 4561.699418] R13: ffff888181be3320 R14: ffff88817f708800 R15: ffff888181be3308
[ 4561.702307] FS:  0000000000000000(0000) GS:ffff888261c80000(0000) knlGS:0000000000000000
[ 4561.705292] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4561.708105] CR2: 00007f7189264c58 CR3: 0000000111dec001 CR4: 0000000000170ee0
[ 4561.711052] Call Trace:
[ 4561.713647]  <TASK>
[ 4561.716196]  process_one_work+0x407/0x790
[ 4561.718912]  worker_thread+0x2ab/0x700
[ 4561.721597]  ? __pfx_set_cpus_allowed_ptr+0x10/0x10
[ 4561.724372]  ? __pfx_worker_thread+0x10/0x10
[ 4561.727101]  kthread+0x15d/0x190
[ 4561.729745]  ? __pfx_kthread+0x10/0x10
[ 4561.732418]  ret_from_fork+0x2c/0x50
[ 4561.735097]  </TASK>
[ 4561.737655] Modules linked in: blksnap rpcsec_gss_krb5 nfsv4 nfs 8021q garp stp mrp llc bochs kvm_intel drm_vram_helper drm_ttm_helper ttm kvm drm_kms_helper input_leds led_class drm virtio_net irqbypass syscopyarea net_failover sysfillrect intel_agp crc32c_intel failover floppy sysimgblt intel_gtt i2c_piix4 nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc ip_tables x_tables unix ipv6 autofs4
[ 4561.748159] ---[ end trace 0000000000000000 ]---
[ 4561.751205] RIP: 0010:chunk_notify_store+0x66/0x190 [blksnap]
[ 4561.754393] Code: d0 f5 90 e0 4c 8b 75 00 48 8d 7d 08 e8 c3 f5 90 e0 4c 8b 65 08 49 8d 7e 08 e8 66 f6 90 e0 4d 89 66 08 4c 89 e7 e8 5a f6 90 e0 <4d> 89 34 24 48 89 ef 4c 8d 73 68 e8 4a f6 90 e0 48 89 6d 00 48 89
[ 4561.761441] RSP: 0000:ffff88810e49fdd0 EFLAGS: 00010282
[ 4561.764695] RAX: 0000000000000000 RBX: ffff888181be3300 RCX: ffffffffa0b98f56
[ 4561.768108] RDX: 0000000000000001 RSI: 0000000000000008 RDI: 079a00c8000009df
[ 4561.771488] RBP: ffff888112d21500 R08: 0000000000000001 R09: ffffffff84286a47
[ 4561.774845] R10: fffffbfff0850d48 R11: 0000000000000001 R12: 079a00c8000009df
[ 4561.778140] R13: ffff888181be3320 R14: ffff88817f708800 R15: ffff888181be3308
[ 4561.781476] FS:  0000000000000000(0000) GS:ffff888261c80000(0000) knlGS:0000000000000000
[ 4561.784909] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4561.788164] CR2: 00007f7189264c58 CR3: 0000000111dec001 CR4: 0000000000170ee0
[ 4586.218197] XFS (blksnap-image_253:16): log I/O error -5
[ 4586.218889] XFS (blksnap-image_253:16): metadata I/O error in "xfs_buf_ioend+0x3ea/0xb50" at daddr 0x1 len 1 error 5
[ 4586.225814] XFS (blksnap-image_253:16): Filesystem has been shut down due to log error (0x2).
[ 4586.246191] XFS (blksnap-image_253:16): Please unmount the filesystem and rectify the problem(s).


I was actually targeting this deref in snapimage.c:

     +static void snapimage_submit_bio(struct bio *bio)
     +{
     +	struct tracker *tracker = bio->bi_bdev->bd_disk->private_data;
     +	struct diff_area *diff_area = tracker->diff_area;

but didn't even get to delete the tracker...


Best

   Donald
Sergei Shtepa April 14, 2023, 12:34 p.m. UTC | #3
On 4/12/23 21:38, Donald Buczek wrote:
> Subject:
> Re: [PATCH v3 03/11] documentation: Block Devices Snapshots Module
> From:
> Donald Buczek <buczek@molgen.mpg.de>
> Date:
> 4/12/23, 21:38
> 
> To:
> Sergei Shtepa <sergei.shtepa@veeam.com>, axboe@kernel.dk, hch@infradead.org, corbet@lwn.net, snitzer@kernel.org
> CC:
> viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org, kch@nvidia.com, martin.petersen@oracle.com, vkoul@kernel.org, ming.lei@redhat.com, gregkh@linuxfoundation.org, linux-block@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
> 
> 
> I think you can trigger all kinds of use-after-free when userspace deletes a snapshot image, or the snapshot image and the tracker, while the snapshot image disk device is kept alive (mounted or just opened) and doing I/O.
> 
> Here is what I did to provoke that:
> 
> root@dose:~# s=$(blksnap snapshot_create -d /dev/vdb)
> root@dose:~# blksnap snapshot_appendstorage -i $s -f /scratch/local/test.dat
> device path: '/dev/block/253:2'
> allocate range: ofs=11264624 cnt=2097152
> root@dose:~# blksnap snapshot_take -i $s
> root@dose:~# mount /dev/blksnap-image_253\:16 /mnt
> root@dose:~# dd if=/dev/zero of=/mnt/x.x &
> [1] 2514
> root@dose:~# blksnap snapshot_destroy -i $s
> dd: writing to '/mnt/x.x': No space left on device
> 1996041+0 records in
> 1996040+0 records out
> 1021972480 bytes (1.0 GB, 975 MiB) copied, 8.48923 s, 120 MB/s
> [1]+  Exit 1                  dd if=/dev/zero of=/mnt/x.x
> 

Thanks!
I am very glad that the blksnap tool turned out to be useful in the review.
This snapshot deletion scenario is not the most typical, but of course it is
quite possible.
I will need to solve this problem and add such a scenario to the test suite.
Sergei Shtepa April 18, 2023, 10:31 a.m. UTC | #4
On 4/14/23 14:34, Sergei Shtepa wrote:
> Subject:
> Re: [PATCH v3 03/11] documentation: Block Devices Snapshots Module
> From:
> Sergei Shtepa <sergei.shtepa@veeam.com>
> Date:
> 4/14/23, 14:34
> 
> To:
> Donald Buczek <buczek@molgen.mpg.de>, axboe@kernel.dk, hch@infradead.org, corbet@lwn.net, snitzer@kernel.org
> CC:
> viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org, kch@nvidia.com, martin.petersen@oracle.com, vkoul@kernel.org, ming.lei@redhat.com, gregkh@linuxfoundation.org, linux-block@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
> 
> 
> 
> On 4/12/23 21:38, Donald Buczek wrote:
>> I think you can trigger all kinds of use-after-free when userspace deletes a snapshot image, or the snapshot image and the tracker, while the snapshot image disk device is kept alive (mounted or just opened) and doing I/O.
>>
>> Here is what I did to provoke that:
>>
>> root@dose:~# s=$(blksnap snapshot_create -d /dev/vdb)
>> root@dose:~# blksnap snapshot_appendstorage -i $s -f /scratch/local/test.dat
>> device path: '/dev/block/253:2'
>> allocate range: ofs=11264624 cnt=2097152
>> root@dose:~# blksnap snapshot_take -i $s
>> root@dose:~# mount /dev/blksnap-image_253\:16 /mnt
>> root@dose:~# dd if=/dev/zero of=/mnt/x.x &
>> [1] 2514
>> root@dose:~# blksnap snapshot_destroy -i $s
>> dd: writing to '/mnt/x.x': No space left on device
>> 1996041+0 records in
>> 1996040+0 records out
>> 1021972480 bytes (1.0 GB, 975 MiB) copied, 8.48923 s, 120 MB/s
>> [1]+  Exit 1                  dd if=/dev/zero of=/mnt/x.x
>>
> Thanks!
> I am very glad that the blksnap tool turned out to be useful in the review.
> This snapshot deletion scenario is not the most typical, but of course it is
> quite possible.
> I will need to solve this problem and add such a scenario to the test suite.
> 

Hi!

I have redesigned the ownership logic for the diff_area structure.
See the attached patch or the commit.
Link: https://github.com/SergeiShtepa/linux/commit/7e927c381dcd2b2293be8315897a224d111b6f88
A test script for such a scenario has been added.
Link: https://github.com/veeam/blksnap/commit/fd0559dfedf094901d08bbf185fed288f0156433

I will be glad of any feedback.
Donald Buczek April 18, 2023, 2:48 p.m. UTC | #5
On 4/18/23 12:31, Sergei Shtepa wrote:
> 
> 
> On 4/14/23 14:34, Sergei Shtepa wrote:
>> On 4/12/23 21:38, Donald Buczek wrote:
>>> I think you can trigger all kinds of use-after-free when userspace deletes a snapshot image, or the snapshot image and the tracker, while the snapshot image disk device is kept alive (mounted or just opened) and doing I/O.
>>>
>>> Here is what I did to provoke that:
>>>
>>> root@dose:~# s=$(blksnap snapshot_create -d /dev/vdb)
>>> root@dose:~# blksnap snapshot_appendstorage -i $s -f /scratch/local/test.dat
>>> device path: '/dev/block/253:2'
>>> allocate range: ofs=11264624 cnt=2097152
>>> root@dose:~# blksnap snapshot_take -i $s
>>> root@dose:~# mount /dev/blksnap-image_253\:16 /mnt
>>> root@dose:~# dd if=/dev/zero of=/mnt/x.x &
>>> [1] 2514
>>> root@dose:~# blksnap snapshot_destroy -i $s
>>> dd: writing to '/mnt/x.x': No space left on device
>>> 1996041+0 records in
>>> 1996040+0 records out
>>> 1021972480 bytes (1.0 GB, 975 MiB) copied, 8.48923 s, 120 MB/s
>>> [1]+  Exit 1                  dd if=/dev/zero of=/mnt/x.x
>>>
>> Thanks!
>> I am very glad that the blksnap tool turned out to be useful in the review.
>> This snapshot deletion scenario is not the most typical, but of course it is
>> quite possible.
>> I will need to solve this problem and add such a scenario to the test suite.
>>
> 
> Hi!
> 
> I have redesigned the ownership logic for the diff_area structure.
> See the attached patch or the commit.
> Link: https://github.com/SergeiShtepa/linux/commit/7e927c381dcd2b2293be8315897a224d111b6f88
> A test script for such a scenario has been added.
> Link: https://github.com/veeam/blksnap/commit/fd0559dfedf094901d08bbf185fed288f0156433
> 
> I will be glad of any feedback.

Great, Thanks!

However, there are two leftover calls to diff_area_free() with its old prototype:

  CC [M]  drivers/block/blksnap/diff_area.o
drivers/block/blksnap/diff_area.c: In function ‘diff_area_new’:
drivers/block/blksnap/diff_area.c:283:18: error: passing argument 1 of ‘diff_area_free’ from incompatible pointer type [-Werror=incompatible-pointer-types]
   283 |   diff_area_free(diff_area);
       |                  ^~~~~~~~~
       |                  |
       |                  struct diff_area *
drivers/block/blksnap/diff_area.c:110:34: note: expected ‘struct kref *’ but argument is of type ‘struct diff_area *’
   110 | void diff_area_free(struct kref *kref)
       |                     ~~~~~~~~~~~~~^~~~
cc1: some warnings being treated as errors
make[4]: *** [scripts/Makefile.build:252: drivers/block/blksnap/diff_area.o] Error 1
make[3]: *** [scripts/Makefile.build:494: drivers/block/blksnap] Error 2
make[2]: *** [scripts/Makefile.build:494: drivers/block] Error 2
make[1]: *** [scripts/Makefile.build:494: drivers] Error 2
make: *** [Makefile:2025: .] Error 2

The other one:

buczek@dose:/scratch/local/linux (blksnap-test)$ make drivers/block/blksnap/tracker.o
   CALL    scripts/checksyscalls.sh
   DESCEND objtool
   INSTALL libsubcmd_headers
   CC [M]  drivers/block/blksnap/tracker.o
drivers/block/blksnap/tracker.c: In function ‘tracker_free’:
drivers/block/blksnap/tracker.c:26:25: error: passing argument 1 of ‘diff_area_free’ from incompatible pointer type [-Werror=incompatible-pointer-types]
    26 |   diff_area_free(tracker->diff_area);
       |                  ~~~~~~~^~~~~~~~~~~
       |                         |
       |                         struct diff_area *
In file included from drivers/block/blksnap/tracker.c:12:
drivers/block/blksnap/diff_area.h:116:34: note: expected ‘struct kref *’ but argument is of type ‘struct diff_area *’
   116 | void diff_area_free(struct kref *kref);
       |                     ~~~~~~~~~~~~~^~~~
cc1: some warnings being treated as errors
make[4]: *** [scripts/Makefile.build:252: drivers/block/blksnap/tracker.o] Error 1
make[3]: *** [scripts/Makefile.build:494: drivers/block/blksnap] Error 2
make[2]: *** [scripts/Makefile.build:494: drivers/block] Error 2
make[1]: *** [scripts/Makefile.build:494: drivers] Error 2
make: *** [Makefile:2025: .] Error 2

Am I missing something?

Best
   Donald
Sergei Shtepa April 19, 2023, 1:05 p.m. UTC | #6
On 4/18/23 16:48, Donald Buczek wrote:
> Subject:
> Re: [PATCH v3 03/11] documentation: Block Devices Snapshots Module
> From:
> Donald Buczek <buczek@molgen.mpg.de>
> Date:
> 4/18/23, 16:48
> 
> To:
> Sergei Shtepa <sergei.shtepa@veeam.com>, axboe@kernel.dk, hch@infradead.org, corbet@lwn.net, snitzer@kernel.org
> CC:
> viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org, kch@nvidia.com, martin.petersen@oracle.com, vkoul@kernel.org, ming.lei@redhat.com, gregkh@linuxfoundation.org, linux-block@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
> 
> 
> On 4/18/23 12:31, Sergei Shtepa wrote:
>>
>>
>> On 4/14/23 14:34, Sergei Shtepa wrote:
>>> On 4/12/23 21:38, Donald Buczek wrote:
>>>> I think you can trigger all kinds of use-after-free when userspace deletes a snapshot image, or the snapshot image and the tracker, while the snapshot image disk device is kept alive (mounted or just opened) and doing I/O.
>>>>
>>>> Here is what I did to provoke that:
>>>>
>>>> root@dose:~# s=$(blksnap snapshot_create -d /dev/vdb)
>>>> root@dose:~# blksnap snapshot_appendstorage -i $s -f /scratch/local/test.dat
>>>> device path: '/dev/block/253:2'
>>>> allocate range: ofs=11264624 cnt=2097152
>>>> root@dose:~# blksnap snapshot_take -i $s
>>>> root@dose:~# mount /dev/blksnap-image_253\:16 /mnt
>>>> root@dose:~# dd if=/dev/zero of=/mnt/x.x &
>>>> [1] 2514
>>>> root@dose:~# blksnap snapshot_destroy -i $s
>>>> dd: writing to '/mnt/x.x': No space left on device
>>>> 1996041+0 records in
>>>> 1996040+0 records out
>>>> 1021972480 bytes (1.0 GB, 975 MiB) copied, 8.48923 s, 120 MB/s
>>>> [1]+  Exit 1                  dd if=/dev/zero of=/mnt/x.x
>>>>
>>> Thanks!
>>> I am very glad that the blksnap tool turned out to be useful in the review.
>>> This snapshot deletion scenario is not the most typical, but of course it is
>>> quite possible.
>>> I will need to solve this problem and add such a scenario to the test suite.
>>>
>>
>> Hi!
>>
>> I have redesigned the ownership logic for the diff_area structure.
>> See the attached patch or the commit.
>> Link: https://github.com/SergeiShtepa/linux/commit/7e927c381dcd2b2293be8315897a224d111b6f88
>> A test script for such a scenario has been added.
>> Link: https://github.com/veeam/blksnap/commit/fd0559dfedf094901d08bbf185fed288f0156433
>>
>> I will be glad of any feedback.
> 
> Great, Thanks!
> 
> However, there are two leftover calls to diff_area_free() with its old prototype:
> 
>  CC [M]  drivers/block/blksnap/diff_area.o
> drivers/block/blksnap/diff_area.c: In function ‘diff_area_new’:
> drivers/block/blksnap/diff_area.c:283:18: error: passing argument 1 of ‘diff_area_free’ from incompatible pointer type [-Werror=incompatible-pointer-types]
>   283 |   diff_area_free(diff_area);
>       |                  ^~~~~~~~~
>       |                  |
>       |                  struct diff_area *
> drivers/block/blksnap/diff_area.c:110:34: note: expected ‘struct kref *’ but argument is of type ‘struct diff_area *’
>   110 | void diff_area_free(struct kref *kref)
>       |                     ~~~~~~~~~~~~~^~~~
> cc1: some warnings being treated as errors
> make[4]: *** [scripts/Makefile.build:252: drivers/block/blksnap/diff_area.o] Error 1
> make[3]: *** [scripts/Makefile.build:494: drivers/block/blksnap] Error 2
> make[2]: *** [scripts/Makefile.build:494: drivers/block] Error 2
> make[1]: *** [scripts/Makefile.build:494: drivers] Error 2
> make: *** [Makefile:2025: .] Error 2
> 
> The other one:
> 
> buczek@dose:/scratch/local/linux (blksnap-test)$ make drivers/block/blksnap/tracker.o
>   CALL    scripts/checksyscalls.sh
>   DESCEND objtool
>   INSTALL libsubcmd_headers
>   CC [M]  drivers/block/blksnap/tracker.o
> drivers/block/blksnap/tracker.c: In function ‘tracker_free’:
> drivers/block/blksnap/tracker.c:26:25: error: passing argument 1 of ‘diff_area_free’ from incompatible pointer type [-Werror=incompatible-pointer-types]
>    26 |   diff_area_free(tracker->diff_area);
>       |                  ~~~~~~~^~~~~~~~~~~
>       |                         |
>       |                         struct diff_area *
> In file included from drivers/block/blksnap/tracker.c:12:
> drivers/block/blksnap/diff_area.h:116:34: note: expected ‘struct kref *’ but argument is of type ‘struct diff_area *’
>   116 | void diff_area_free(struct kref *kref);
>       |                     ~~~~~~~~~~~~~^~~~
> cc1: some warnings being treated as errors
> make[4]: *** [scripts/Makefile.build:252: drivers/block/blksnap/tracker.o] Error 1
> make[3]: *** [scripts/Makefile.build:494: drivers/block/blksnap] Error 2
> make[2]: *** [scripts/Makefile.build:494: drivers/block] Error 2
> make[1]: *** [scripts/Makefile.build:494: drivers] Error 2
> make: *** [Makefile:2025: .] Error 2
> 
> Am I missing something?

Thanks!

It seems to me that I missed something.
The biggest mystery for me is why I was able to build and test the kernel.
I think it's some kind of incremental build effect.
I was only able to see the problem after 'make clean'.

Patches are attached and also available at https://github.com/SergeiShtepa/linux/tree/blksnap-master
Donald Buczek April 19, 2023, 7:42 p.m. UTC | #7
Dear Sergei,

On 4/19/23 15:05, Sergei Shtepa wrote:
> 
> 
> On 4/18/23 16:48, Donald Buczek wrote:
>> On 4/18/23 12:31, Sergei Shtepa wrote:
>>>
>>>
>>> On 4/14/23 14:34, Sergei Shtepa wrote:
>>>> On 4/12/23 21:38, Donald Buczek wrote:
>>>>>
>>>>> I think you can trigger all kinds of use-after-free when userspace deletes a snapshot image or the snapshot image and the tracker while the disk device snapshot image is kept alive (mounted or just opened) and doing I/O.
>>>>>
>>>>> Here is what I did to provoke that:
>>>>>
>>>>> root@dose:~# s=$(blksnap snapshot_create -d /dev/vdb)
>>>>> root@dose:~# blksnap snapshot_appendstorage -i $s -f /scratch/local/test.dat
>>>>> device path: '/dev/block/253:2'
>>>>> allocate range: ofs=11264624 cnt=2097152
>>>>> root@dose:~# blksnap snapshot_take -i $s
>>>>> root@dose:~# mount /dev/blksnap-image_253\:16 /mnt
>>>>> root@dose:~# dd if=/dev/zero of=/mnt/x.x &
>>>>> [1] 2514
>>>>> root@dose:~# blksnap snapshot_destroy -i $s
>>>>> dd: writing to '/mnt/x.x': No space left on device
>>>>> 1996041+0 records in
>>>>> 1996040+0 records out
>>>>> 1021972480 bytes (1.0 GB, 975 MiB) copied, 8.48923 s, 120 MB/s
>>>>> [1]+  Exit 1                  dd if=/dev/zero of=/mnt/x.x
>>>>>
>>>> Thanks!
>>>> I am very glad that the blksnap tool turned out to be useful in the review.
>>>> This snapshot deletion scenario is not the most typical, but of course it is
>>>> quite possible.
>>>> I will need to solve this problem and add such a scenario to the test suite.
>>>>
>>>
>>> Hi!
>>>
>>> I have redesigned the logic of ownership of the diff_area structure.
>>> See the attached patch or the commit.
>>> Link: https://github.com/SergeiShtepa/linux/commit/7e927c381dcd2b2293be8315897a224d111b6f88
>>> A test script for such a scenario has been added.
>>> Link: https://github.com/veeam/blksnap/commit/fd0559dfedf094901d08bbf185fed288f0156433
>>>
>>> I will be glad of any feedback.
>>
>> Great, Thanks!
>>
>> However, there are two leftover calls to diff_area_free() with its old prototype:
>>
>>   CC [M]  drivers/block/blksnap/diff_area.o
>> drivers/block/blksnap/diff_area.c: In function ‘diff_area_new’:
>> drivers/block/blksnap/diff_area.c:283:18: error: passing argument 1 of ‘diff_area_free’ from incompatible pointer type [-Werror=incompatible-pointer-types]
>>    283 |   diff_area_free(diff_area);
>>        |                  ^~~~~~~~~
>>        |                  |
>>        |                  struct diff_area *
>> drivers/block/blksnap/diff_area.c:110:34: note: expected ‘struct kref *’ but argument is of type ‘struct diff_area *’
>>    110 | void diff_area_free(struct kref *kref)
>>        |                     ~~~~~~~~~~~~~^~~~
>> cc1: some warnings being treated as errors
>> make[4]: *** [scripts/Makefile.build:252: drivers/block/blksnap/diff_area.o] Error 1
>> make[3]: *** [scripts/Makefile.build:494: drivers/block/blksnap] Error 2
>> make[2]: *** [scripts/Makefile.build:494: drivers/block] Error 2
>> make[1]: *** [scripts/Makefile.build:494: drivers] Error 2
>> make: *** [Makefile:2025: .] Error 2
>>
>> The other one:
>>
>> buczek@dose:/scratch/local/linux (blksnap-test)$ make drivers/block/blksnap/tracker.o
>>    CALL    scripts/checksyscalls.sh
>>    DESCEND objtool
>>    INSTALL libsubcmd_headers
>>    CC [M]  drivers/block/blksnap/tracker.o
>> drivers/block/blksnap/tracker.c: In function ‘tracker_free’:
>> drivers/block/blksnap/tracker.c:26:25: error: passing argument 1 of ‘diff_area_free’ from incompatible pointer type [-Werror=incompatible-pointer-types]
>>     26 |   diff_area_free(tracker->diff_area);
>>        |                  ~~~~~~~^~~~~~~~~~~
>>        |                         |
>>        |                         struct diff_area *
>> In file included from drivers/block/blksnap/tracker.c:12:
>> drivers/block/blksnap/diff_area.h:116:34: note: expected ‘struct kref *’ but argument is of type ‘struct diff_area *’
>>    116 | void diff_area_free(struct kref *kref);
>>        |                     ~~~~~~~~~~~~~^~~~
>> cc1: some warnings being treated as errors
>> make[4]: *** [scripts/Makefile.build:252: drivers/block/blksnap/tracker.o] Error 1
>> make[3]: *** [scripts/Makefile.build:494: drivers/block/blksnap] Error 2
>> make[2]: *** [scripts/Makefile.build:494: drivers/block] Error 2
>> make[1]: *** [scripts/Makefile.build:494: drivers] Error 2
>> make: *** [Makefile:2025: .] Error 2
>>
>> Am I missing something?
> 
> Thanks!
> 
> It seems to me that I missed something.
> The biggest mystery for me is why I was able to build and test the kernel.
> I think it's some kind of incremental build effect.
> I was only able to see the problem after 'make clean'.
> 
> Patches in attach and https://github.com/SergeiShtepa/linux/tree/blksnap-master

Thanks. I can confirm that this fixes the reported problem and I no longer can trigger the UAF. :-)

Tested-by: Donald Buczek <buczek@molgen.mpg.de>

Maybe you can add me to the cc list for v4 as I'm not subscribed to the lists.

Best

   Donald
Donald Buczek April 20, 2023, 2:44 p.m. UTC | #8
On 4/19/23 21:42, Donald Buczek wrote:
> Dear Sergei,
> 
> On 4/19/23 15:05, Sergei Shtepa wrote:
>> [...]
>>
>> Patches in attach and https://github.com/SergeiShtepa/linux/tree/blksnap-master
> 
> Thanks. I can confirm that this fixes the reported problem and I no longer can trigger the UAF. :-)
> 
> Tested-by: Donald Buczek <buczek@molgen.mpg.de>
> 
> Maybe you can add me to the cc list for v4 as I'm not subscribed to the lists.


Sorry, found another one. Reproducer:

=====
#! /bin/bash
set -xe
modprobe blksnap
test -e /scratch/local/test.dat || fallocate -l 1G /scratch/local/test.dat
s=$(blksnap snapshot_create -d /dev/vdb)
blksnap snapshot_appendstorage -i $s -f /scratch/local/test.dat
blksnap snapshot_take -i $s
s2=$(blksnap snapshot_create -d /dev/vdb)
blksnap snapshot_destroy -i $s2
blksnap snapshot_destroy -i $s
=====


[20382.402921] blksnap-snapshot: Snapshot ff1c54f1-3e8c-4c99-bb26-35e82dc1c9fa was created
[20382.535933] blksnap-image: Create snapshot image device for original device [253:16]
[20382.542405] blksnap-snapshot: Snapshot ff1c54f1-3e8c-4c99-bb26-35e82dc1c9fa was taken successfully
[20382.572564] blksnap-snapshot: Snapshot 4b2d571d-9a24-419d-96c2-8d64a07c4966 was created
[20382.600521] blksnap-snapshot: Destroy snapshot 4b2d571d-9a24-419d-96c2-8d64a07c4966
[20382.602373] blksnap-snapshot: Release snapshot 4b2d571d-9a24-419d-96c2-8d64a07c4966
[20382.722137] blksnap-snapshot: Destroy snapshot ff1c54f1-3e8c-4c99-bb26-35e82dc1c9fa
[20382.724033] blksnap-snapshot: Release snapshot ff1c54f1-3e8c-4c99-bb26-35e82dc1c9fa
[20382.725850] ==================================================================
[20382.727641] BUG: KASAN: wild-memory-access in snapshot_free+0x73/0x170 [blksnap]
[20382.729326] Write of size 8 at addr dead000000000108 by task blksnap/8297

[20382.731212] CPU: 4 PID: 8297 Comm: blksnap Not tainted 6.3.0-rc5.mx64.428-00094-g21dc08a94f59-dirty #41
[20382.733293] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
[20382.735807] Call Trace:
[20382.736395]  <TASK>
[20382.736900]  dump_stack_lvl+0x37/0x50
[20382.737767]  ? snapshot_free+0x73/0x170 [blksnap]
[20382.738873]  kasan_report+0xb2/0xe0
[20382.739690]  ? snapshot_free+0x73/0x170 [blksnap]
[20382.740799]  snapshot_free+0x73/0x170 [blksnap]
[20382.741868]  snapshot_destroy+0x119/0x170 [blksnap]
[20382.743009]  ioctl_snapshot_destroy+0x7a/0xc0 [blksnap]
[20382.744241]  ? __pfx_ioctl_snapshot_destroy+0x10/0x10 [blksnap]
[20382.745606]  ? __fget_light+0x1ca/0x200
[20382.746493]  ctrl_unlocked_ioctl+0x3a/0x60 [blksnap]
[20382.747654]  __x64_sys_ioctl+0xc6/0xe0
[20382.748528]  do_syscall_64+0x47/0xa0
[20382.749369]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[20382.750524] RIP: 0033:0x7f27fbf7f4db
[20382.751351] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1b 48 8b 44 24 18 64 48 2b 04 25 28 00
[20382.760773] RSP: 002b:00007ffcf157de50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[20382.767031] RAX: ffffffffffffffda RBX: 00007ffcf157df00 RCX: 00007f27fbf7f4db
[20382.772597] RDX: 00007ffcf157dec0 RSI: 0000000080105602 RDI: 0000000000000003
[20382.777677] RBP: 00007ffcf157df60 R08: 00007ffcf157df30 R09: 0000000000000000
[20382.782318] R10: 00007f27fc18d8f0 R11: 0000000000000246 R12: 00007ffcf157e2f8
[20382.786617] R13: 00000000004079f6 R14: 00000000005653f8 R15: 00007f27fc19e040
[20382.790634]  </TASK>
[20382.794012] ==================================================================
[20382.797799] Disabling lock debugging due to kernel taint
[20382.801245] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] PREEMPT SMP KASAN PTI
[20382.805060] CPU: 4 PID: 8297 Comm: blksnap Tainted: G    B              6.3.0-rc5.mx64.428-00094-g21dc08a94f59-dirty #41
[20382.808757] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
[20382.812441] RIP: 0010:snapshot_free+0x73/0x170 [blksnap]
[20382.815569] Code: 4d 8b 74 24 50 4c 89 f7 49 8d 6e f8 e8 56 a7 f3 e0 49 8b 1e 49 8d 7e 08 e8 4a a7 f3 e0 4d 8b 7e 08 48 8d 7b 08 e8 ed a7 f3 e0 <4c> 89 7b 08 4c 89 ff e8 e1 a7 f3 e0 49 89 1f 48 89 ef 48 b8 00 01
[20382.822394] RSP: 0018:ffff888120fafe18 EFLAGS: 00010292
[20382.825587] RAX: 0000000000000001 RBX: dead000000000100 RCX: ffffffff810cb82a
[20382.828866] RDX: fffffbfff0850d49 RSI: 0000000000000008 RDI: ffffffff84286a40
[20382.832142] RBP: ffff888105a4f400 R08: 0000000000000001 R09: ffffffff84286a47
[20382.835366] R10: fffffbfff0850d48 R11: 0000000000000001 R12: ffff888124ac9b10
[20382.838557] R13: ffff888124ac9b60 R14: ffff888105a4f408 R15: dead000000000122
[20382.841759] FS:  00007f27fbe7c780(0000) GS:ffff888261800000(0000) knlGS:0000000000000000
[20382.845049] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20382.848186] CR2: 00007f27fbed0d00 CR3: 0000000104608004 CR4: 0000000000170ee0
[20382.851439] Call Trace:
[20382.854366]  <TASK>
[20382.857219]  snapshot_destroy+0x119/0x170 [blksnap]
[20382.860286]  ioctl_snapshot_destroy+0x7a/0xc0 [blksnap]
[20382.863356]  ? __pfx_ioctl_snapshot_destroy+0x10/0x10 [blksnap]
[20382.866471]  ? __fget_light+0x1ca/0x200
[20382.869377]  ctrl_unlocked_ioctl+0x3a/0x60 [blksnap]
[20382.872322]  __x64_sys_ioctl+0xc6/0xe0
[20382.875124]  do_syscall_64+0x47/0xa0
[20382.877921]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[20382.880831] RIP: 0033:0x7f27fbf7f4db
[20382.883628] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1b 48 8b 44 24 18 64 48 2b 04 25 28 00
[20382.890371] RSP: 002b:00007ffcf157de50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[20382.893694] RAX: ffffffffffffffda RBX: 00007ffcf157df00 RCX: 00007f27fbf7f4db
[20382.897004] RDX: 00007ffcf157dec0 RSI: 0000000080105602 RDI: 0000000000000003
[20382.900301] RBP: 00007ffcf157df60 R08: 00007ffcf157df30 R09: 0000000000000000
[20382.903572] R10: 00007f27fc18d8f0 R11: 0000000000000246 R12: 00007ffcf157e2f8
[20382.906879] R13: 00000000004079f6 R14: 00000000005653f8 R15: 00007f27fc19e040
[20382.910171]  </TASK>
[20382.913069] Modules linked in: blksnap rpcsec_gss_krb5 nfsv4 nfs 8021q garp stp mrp llc kvm_intel bochs drm_vram_helper drm_ttm_helper virtio_net kvm ttm net_failover drm_kms_helper input_leds led_class irqbypass drm failover crc32c_intel syscopyarea sysfillrect i2c_piix4 intel_agp sysimgblt intel_gtt floppy nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc ip_tables x_tables unix ipv6 autofs4 [last unloaded: blksnap]
[20382.924858] ---[ end trace 0000000000000000 ]---
[20382.928229] RIP: 0010:snapshot_free+0x73/0x170 [blksnap]
[20382.931564] Code: 4d 8b 74 24 50 4c 89 f7 49 8d 6e f8 e8 56 a7 f3 e0 49 8b 1e 49 8d 7e 08 e8 4a a7 f3 e0 4d 8b 7e 08 48 8d 7b 08 e8 ed a7 f3 e0 <4c> 89 7b 08 4c 89 ff e8 e1 a7 f3 e0 49 89 1f 48 89 ef 48 b8 00 01
[20382.939104] RSP: 0018:ffff888120fafe18 EFLAGS: 00010292
[20382.942536] RAX: 0000000000000001 RBX: dead000000000100 RCX: ffffffff810cb82a
[20382.946224] RDX: fffffbfff0850d49 RSI: 0000000000000008 RDI: ffffffff84286a40
[20382.949927] RBP: ffff888105a4f400 R08: 0000000000000001 R09: ffffffff84286a47
[20382.953591] R10: fffffbfff0850d48 R11: 0000000000000001 R12: ffff888124ac9b10
[20382.957253] R13: ffff888124ac9b60 R14: ffff888105a4f408 R15: dead000000000122
[20382.960914] FS:  00007f27fbe7c780(0000) GS:ffff888261800000(0000) knlGS:0000000000000000
[20382.964666] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20382.968148] CR2: 00007f27fbed0d00 CR3: 0000000104608004 CR4: 0000000000170ee0


Best
   Donald
Sergei Shtepa April 20, 2023, 7:17 p.m. UTC | #9
On 4/20/23 16:44, Donald Buczek wrote:
> 
> On 4/19/23 21:42, Donald Buczek wrote:
>> Dear Sergei,
>>
>> On 4/19/23 15:05, Sergei Shtepa wrote:
>>> [...]
>>>
>>> Patches in attach and https://github.com/SergeiShtepa/linux/tree/blksnap-master
>>
>> Thanks. I can confirm that this fixes the reported problem and I no longer can trigger the UAF. 
Sergei Shtepa April 21, 2023, 5:32 p.m. UTC | #10
On 4/20/23 21:17, Sergei Shtepa wrote:
> 
> On 4/20/23 16:44, Donald Buczek wrote:
>>
>> On 4/19/23 21:42, Donald Buczek wrote:
>>> Dear Sergei,
>>>
>>> On 4/19/23 15:05, Sergei Shtepa wrote:
>>>> [...]
>>>>
>>>> Patches in attach and https://github.com/SergeiShtepa/linux/tree/blksnap-master
>>> Thanks. I can confirm that this fixes the reported problem and I no longer can trigger the UAF. 
Donald Buczek April 22, 2023, 8:01 p.m. UTC | #11
On 4/21/23 19:32, Sergei Shtepa wrote:
> 
> 
> On 4/20/23 21:17, Sergei Shtepa wrote:
>>
>> On 4/20/23 16:44, Donald Buczek wrote:
>>>
>>> On 4/19/23 21:42, Donald Buczek wrote:
>>>> Dear Sergei,
>>>>
>>>> On 4/19/23 15:05, Sergei Shtepa wrote:
>>>>> [...]
>>>>>
>>>>> Patches in attach and https://github.com/SergeiShtepa/linux/tree/blksnap-master
>>>> Thanks. I can confirm that this fixes the reported problem and I no longer can trigger the UAF. 

Patch

diff --git a/Documentation/block/blksnap.rst b/Documentation/block/blksnap.rst
new file mode 100644
index 000000000000..7752f33809bb
--- /dev/null
+++ b/Documentation/block/blksnap.rst
@@ -0,0 +1,345 @@ 
+.. SPDX-License-Identifier: GPL-2.0
+
+========================================
+Block Devices Snapshots Module (blksnap)
+========================================
+
+Introduction
+============
+
+At first glance, there is no novelty in the idea of creating snapshots for
+block devices. The Linux kernel already has mechanisms for creating snapshots.
+Device Mapper includes dm-snap, which allows creating snapshots of block
+devices. BTRFS supports snapshots at the file system level. However, both
+of these options have flaws that prevent their use as a universal tool for
+creating backups.
+
+The main properties that a backup tool should have are:
+
+- Simplicity and versatility of use
+- Reliability
+- Minimal consumption of system resources during backup
+- Minimal time required for recovery or replication of the entire system
+
+To meet these requirements, the blksnap module provides these features:
+
+- Change tracker
+- Snapshot at the block device level
+- Dynamic allocation of storage space for differences
+- Snapshot overflow resistance
+- Coherent snapshot of multiple block devices
+
+Features
+========
+
+Change tracker
+--------------
+
+The change tracker determines which blocks were changed between the most
+recent snapshot and any of the previous snapshots. With a map of changes,
+it is enough to copy only the changed blocks, so there is no need to reread
+the entire block device. The change tracker makes it possible to implement
+both incremental and differential backups.
+Incremental backup is critical for large file repositories whose size can be
+hundreds of terabytes and whose full backup time can take more than a day.
+On such servers, the use of backup tools without a change tracker becomes
+practically impossible.
+
+Snapshot at the block device level
+----------------------------------
+
+A snapshot at the block device level simplifies the backup algorithm and
+reduces consumption of system resources. It also allows the disk space to be
+read linearly and directly, which achieves maximum read speed with minimal
+use of processor time. At the same time, snapshots can be created for any
+block device, regardless of the file system located on it. The exceptions
+are BTRFS, ZFS and cluster file systems.
+
+Dynamic allocation of storage space for differences
+---------------------------------------------------
+
+To store differences, the module does not require a pre-reserved block
+device range. A range of sectors can be allocated in individual files on
+the file system of any block device immediately before creating a snapshot.
+In addition, the size of the difference storage can be increased after the
+snapshot is created by adding new sector ranges on block devices. Sector
+ranges can be allocated on any block device of the system, including those
+on which the snapshot was created. A difference storage shared by all
+snapshot images of block devices optimizes the use of disk space.
+
+Snapshot overflow resistance
+----------------------------
+
+To create images of snapshots of block devices, the module stores blocks
+of the original block device that have been changed since the snapshot
+was taken. To do this, the module handles write requests and reads blocks
+that need to be overwritten. This algorithm guarantees safety of the data
+of the original block device in the event of an overflow of the snapshot,
+and even in the case of unpredictable critical errors. If a problem occurs
+during backup, the difference storage is released, the snapshot is closed,
+no backup is created, but the server continues to work.
+
+Coherent snapshot of multiple block devices
+-------------------------------------------
+
+A snapshot is created simultaneously for all block devices for which a backup
+is being created, ensuring their coherent state.
+
+
+Algorithms
+==========
+
+Overview
+--------
+
+The blksnap module is a block-level filter. It handles all write I/O units.
+The filter is attached to the block device when the snapshot is created
+for the first time. The change tracker marks all overwritten blocks.
+Information about the history of changes on the block device is available
+while holding the snapshot. The module reads the blocks that need to be
+overwritten and stores them in the difference storage. When reading from
+a snapshot image, reading is performed either from the original device or
+from the difference storage.
+
+Change tracking
+---------------
+
+A change tracker map is created for each block device. One byte
+of this map corresponds to one block. The block size is set by the
+``tracking_block_minimum_shift`` and ``tracking_block_maximum_count``
+module parameters. The ``tracking_block_minimum_shift`` parameter limits
+the minimum block size for tracking, while ``tracking_block_maximum_count``
+defines the maximum allowed number of blocks. The size of the change tracker
+block is determined depending on the size of the block device when adding
+a tracking device, that is, when the snapshot is taken for the first time.
+The block size must be a power of two. The ``tracking_block_maximum_shift``
+module parameter limits the maximum block size for tracking. If the block
+size reaches this limit, the number of blocks will exceed the
+``tracking_block_maximum_count`` parameter.
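+
+The selection described above can be sketched as follows. This is a minimal
+user-space illustration, not the module's code: the ``block_count()`` and
+``pick_shift()`` helpers and the values passed to them are assumptions made
+for the example and are not the module defaults. The same scheme applies to
+the chunk size.
+
+.. code-block:: c
+
+   #include <stdint.h>
+   #include <stdio.h>
+
+   /* Number of blocks of size 2^shift needed to cover dev_bytes. */
+   static uint64_t block_count(uint64_t dev_bytes, unsigned int shift)
+   {
+           uint64_t mask = (1ULL << shift) - 1;
+
+           return (dev_bytes >> shift) + ((dev_bytes & mask) ? 1 : 0);
+   }
+
+   /* Hypothetical helper: smallest shift that keeps the count in bounds. */
+   static unsigned int pick_shift(uint64_t dev_bytes, unsigned int min_shift,
+                                  unsigned int max_shift, uint64_t max_count)
+   {
+           unsigned int shift = min_shift;
+
+           /* Double the block size until the count fits or the cap is hit. */
+           while (shift < max_shift &&
+                  block_count(dev_bytes, shift) > max_count)
+                   shift++;
+           return shift;
+   }
+
+   int main(void)
+   {
+           /* Illustrative values only: 1 TiB device, 64 KiB minimum block. */
+           unsigned int shift = pick_shift(1ULL << 40, 16, 26, 1ULL << 21);
+
+           printf("block size: %llu bytes\n", 1ULL << shift);
+           return 0;
+   }
+
+If the loop stops because ``max_shift`` was reached, the block count may
+still be larger than ``max_count``, which is the case mentioned above.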
+
+Each byte of the change map stores a number from 0 to 255. This is the
+number of the snapshot after whose creation the block was last changed.
+Each time a snapshot is created, the number of the current snapshot is
+increased by one. This number is written to the cell of the change map
+when writing to the block. Thus, knowing the number of one of the previous
+snapshots and the number of the last snapshot, one can determine from the
+change map which blocks have been changed. When the number of the current
+snapshot reaches the maximum value of 255 allowed for the map, then at the
+time the next snapshot is created the change map is reset to zero and the
+number of the current snapshot is set to 1. The change tracker is reset,
+and a new UUID is generated as the unique identifier of the snapshot
+generation. The snapshot generation identifier makes it possible to detect
+that a change tracking reset has been performed.
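+
+The numbering scheme can be illustrated with a small user-space model. It is
+only a sketch of the description above, not the module's implementation:
+``struct cbt`` and its helpers are hypothetical names, a plain counter stands
+in for the generation UUID, and a single map copy is used. In this model the
+number is increased when a snapshot is created, so a write made after
+snapshot ``k`` stores ``k`` in the cell.
+
+.. code-block:: c
+
+   #include <stdint.h>
+   #include <stdio.h>
+   #include <string.h>
+
+   #define NBLOCKS 8
+
+   struct cbt {
+           uint8_t map[NBLOCKS];    /* one byte per tracked block */
+           uint8_t snap_number;     /* number of the current snapshot */
+           unsigned int generation; /* stands in for the generation UUID */
+   };
+
+   /* Called when a snapshot is created. */
+   static void cbt_snapshot(struct cbt *c)
+   {
+           if (c->snap_number == 255) {
+                   /* Counter exhausted: reset the map, new generation. */
+                   memset(c->map, 0, sizeof(c->map));
+                   c->snap_number = 1;
+                   c->generation++;
+           } else {
+                   c->snap_number++;
+           }
+   }
+
+   /* Called when a block is written. */
+   static void cbt_write(struct cbt *c, unsigned int block)
+   {
+           c->map[block] = c->snap_number;
+   }
+
+   /* Was the block written after snapshot number prev was created? */
+   static int cbt_changed_since(const struct cbt *c, unsigned int block,
+                                uint8_t prev)
+   {
+           return c->map[block] >= prev;
+   }
+
+   int main(void)
+   {
+           struct cbt c = { .snap_number = 0 };
+           unsigned int i;
+
+           cbt_snapshot(&c);       /* snapshot 1 */
+           cbt_write(&c, 2);
+           cbt_snapshot(&c);       /* snapshot 2 */
+           cbt_write(&c, 5);
+           cbt_snapshot(&c);       /* snapshot 3 */
+
+           for (i = 0; i < NBLOCKS; i++)
+                   if (cbt_changed_since(&c, i, 2))
+                           printf("block %u changed since snapshot 2\n", i);
+           return 0;
+   }
+
+A backup tool built on such a scheme would also have to compare the
+generation identifier with the one saved during the previous backup, and
+fall back to a full backup if they differ.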
+
+The change map has two copies. One copy is active: it tracks the current
+changes on the block device. The second copy is available for reading
+while the snapshot is being held, and contains the history up to the moment
+the snapshot was taken. The copies are synchronized at the moment of
+snapshot creation. After the snapshot is released, the second copy of the
+map is no longer needed, but it is not freed, so that memory does not have
+to be allocated for it again the next time a snapshot is created.
+
+Copy on write
+-------------
+
+Data is copied in blocks, or rather in chunks. The term "chunk" is used to
+avoid confusion with change tracker blocks and I/O blocks. In addition,
+the "chunk" in the blksnap module means about the same as the "chunk" in
+the dm-snap module.
+
+The size of the chunk is determined by the ``chunk_minimum_shift`` and
+``chunk_maximum_count`` module parameters. The ``chunk_minimum_shift``
+parameter limits the minimum size of the chunk, while ``chunk_maximum_count``
+defines the maximum allowed number of chunks. The size of the chunk is
+determined depending on the size of the block device at the time of taking the
+snapshot. The size of the chunk must be a power of two. The module parameter
+``chunk_maximum_shift`` limits the maximum chunk size. If the chunk size
+reaches this limit, the number of chunks will exceed the
+``chunk_maximum_count`` parameter.
+
+One chunk is described by the ``struct chunk`` structure. An array of these
+structures is created for each block device. The structure contains all the
+information necessary to copy the chunk data from the original block device
+to the difference storage. This information also describes the snapshot
+image. The structure contains a semaphore, which synchronizes the threads
+accessing the chunk.
+
+The block layer has a peculiarity. If a read I/O unit was sent, and a write
+I/O unit was sent after it, then the write can be performed first, and only
+then the read. Therefore, the copy-on-write algorithm is executed
+synchronously. When a write request is handled, the execution of this I/O
+unit is delayed until the overwritten chunks are copied to the difference
+storage. But if, when handling a write I/O unit, it turns out that the
+written range of sectors has already been copied to the difference storage,
+then the I/O unit is simply passed through.
+
+This algorithm makes it possible to efficiently back up systems that run a
+Round Robin Database (RRD). Such databases can be overwritten several times
+during the system backup. Of course, the value of a backup of the RRD
+monitoring system data can be questioned. However, it is often necessary to
+back up the entire enterprise infrastructure in order to restore or
+replicate it entirely in case of problems.
+
+There is also a flaw in the algorithm. When even one sector is overwritten,
+an entire chunk is copied. Thus, the difference storage can fill up rapidly
+when data is written to a block device in small portions in random order.
+This situation is possible in case of strong fragmentation of
+data on the file system. But it must be borne in mind that with such data
+fragmentation, performance of systems usually degrades greatly. So, this
+problem does not occur on real servers, although it can easily be created
+by artificial tests.
+
+Difference storage
+------------------
+
+The difference storage is a pool of disk space areas, and it is shared by
+all block devices in the snapshot. Therefore, there is no need to divide
+the difference storage area between block devices, and the difference storage
+itself can be located on different block devices.
+
+There is no need to allocate a large amount of disk space immediately before
+creating a snapshot. Even while the snapshot is being held, the difference
+storage can be expanded. It is enough to have free space on the file system.
+
+Areas of disk space can be allocated on the file system using fallocate(),
+and the file location can be requested using the FIEMAP or FIBMAP ioctl.
+Unfortunately, not all file systems support these mechanisms, but the most
+common XFS, EXT4 and BTRFS file systems support them. BTRFS requires
+additional conversion of virtual offsets to physical ones.
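+
+As a rough illustration of this step, the following user-space sketch
+preallocates a file with ``fallocate()`` and lists its extents with the
+``FS_IOC_FIEMAP`` ioctl. The 64 MiB size and the ``MAX_EXTENTS`` limit are
+arbitrary values chosen for the example. Passing the resulting ranges to the
+module is not shown, and the BTRFS offset conversion mentioned above is not
+handled.
+
+.. code-block:: c
+
+   #define _GNU_SOURCE
+   #include <fcntl.h>
+   #include <stdio.h>
+   #include <stdlib.h>
+   #include <sys/ioctl.h>
+   #include <unistd.h>
+   #include <linux/fiemap.h>
+   #include <linux/fs.h>
+
+   #define MAX_EXTENTS 32  /* extents beyond this limit are not reported */
+
+   int main(int argc, char **argv)
+   {
+           struct fiemap *fm;
+           unsigned int i;
+           int fd;
+
+           if (argc != 2)
+                   return 1;
+
+           /* Create the file and reserve 64 MiB of disk space for it. */
+           fd = open(argv[1], O_CREAT | O_RDWR, 0600);
+           if (fd < 0 || fallocate(fd, 0, 0, 64 << 20)) {
+                   perror("allocate");
+                   return 1;
+           }
+
+           fm = calloc(1, sizeof(*fm) +
+                          MAX_EXTENTS * sizeof(struct fiemap_extent));
+           if (!fm)
+                   return 1;
+
+           /* Ask for the mapping of the whole file, starting at offset 0. */
+           fm->fm_length = FIEMAP_MAX_OFFSET;
+           fm->fm_flags = FIEMAP_FLAG_SYNC;
+           fm->fm_extent_count = MAX_EXTENTS;
+           if (ioctl(fd, FS_IOC_FIEMAP, fm)) {
+                   perror("FS_IOC_FIEMAP");
+                   return 1;
+           }
+
+           /* fe_physical and fe_length are in bytes; print 512-byte sectors. */
+           for (i = 0; i < fm->fm_mapped_extents; i++)
+                   printf("sector %llu, count %llu\n",
+                          (unsigned long long)fm->fm_extents[i].fe_physical >> 9,
+                          (unsigned long long)fm->fm_extents[i].fe_length >> 9);
+
+           free(fm);
+           close(fd);
+           return 0;
+   }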
+
+While holding the snapshot, the user process can poll the status of the module.
+When free space in the difference storage is reduced to a threshold value, the
+module generates an event about it. The user process can prepare a new area
+and pass it to the module to expand the difference storage. The threshold
+value is determined as half of the value of the ``diff_storage_minimum``
+module parameter.
+
+If free space in the difference storage runs out, an event is generated about
+the overflow of the snapshot. Such a snapshot is considered corrupted, and
+read I/O units to snapshot images will be terminated with an error code.
+The difference storage stores outdated data required for snapshot images,
+so when the snapshot is overflowed, the backup process is interrupted,
+but the system maintains its operability without data loss.
+
+Performing I/O for a snapshot image
+-----------------------------------
+
+When a snapshot is taken, block devices of snapshot images are created so
+that the snapshot data can be read. The snapshot image block devices also
+support the write operation. This makes it possible to perform additional
+data preparation on the file system before creating a backup.
+
+To process the I/O unit, clones of the I/O unit are created, which redirect
+the I/O unit either to the original block device or to the difference storage.
+When processing of cloned I/O units is completed, the original I/O unit is
+marked as completed too.
+
+An I/O unit can be partially processed without accessing block devices if
+the I/O unit refers to a chunk that is in the queue for storing to the
+difference storage. In this case, the data is read from or written to a
+buffer in memory.
+
+If, when processing a write I/O unit, it turns out that the data of the
+referred chunk has not yet been stored to the difference storage or has not
+even been read from the original device, then an I/O unit to read data from the
+original device is initiated beforehand. After the read from the original
+device completes, the data from the write I/O unit partially overwrites the
+buffer of the chunk in memory, and the chunk is scheduled to be saved to the
+difference storage.
+
+How to use
+==========
+
+Depending on the needs and the selected license, you can choose different
+options for managing the module:
+
+- Using ioctl directly
+- Using a static C++ library
+- Using the blksnap console tool
+
+Using BLKFILTER_CTL for a block device
+--------------------------------------
+
+BLKFILTER_CTL sends a filter-specific command to the filter attached to a
+block device and returns the result of its execution. The module provides the
+``include/uapi/blksnap.h`` header file with a description of the commands and
+their data structures.
+
+1. ``blkfilter_ctl_blksnap_cbtinfo`` gets information from the
+   change tracker.
+2. ``blkfilter_ctl_blksnap_cbtmap`` reads the change tracker table. If a write
+   operation was performed for the snapshot, then the change tracker takes this
+   into account. Therefore, the tracker data must be read after the write
+   operations have been completed.
+3. ``blkfilter_ctl_blksnap_cbtdirty`` marks blocks as changed in the change
+   tracker table. This is necessary if post-processing that changes the backup
+   blocks is performed after the backup is created.
+4. ``blkfilter_ctl_blksnap_snapshotadd`` adds a block device to the snapshot.
+5. ``blkfilter_ctl_blksnap_snapshotinfo`` gets the name of the snapshot
+   image block device and reports whether an error has occurred.
+
+Using ioctl
+-----------
+
+The BLKFILTER_CTL ioctl alone is not sufficient to fully manage the blksnap
+module. A control file ``blksnap-control`` is created to manage
+snapshots. The control commands are also described in the file
+``include/uapi/linux/blksnap.h``.
+
+1. ``blksnap_ioctl_version`` gets the version number.
+2. ``blk_snap_ioctl_snapshot_create`` initiates the snapshot creation process.
+3. ``blk_snap_ioctl_snapshot_append_storage`` adds a range of blocks to the
+   difference storage.
+4. ``blk_snap_ioctl_snapshot_take`` creates the block devices of the
+   snapshot images.
+5. ``blk_snap_ioctl_snapshot_collect`` collects all created snapshots.
+6. ``blk_snap_ioctl_snapshot_wait_event`` tracks the status of
+   snapshots and receives events about the need to expand the difference
+   storage or about snapshot overflow.
+7. ``blk_snap_ioctl_snapshot_destroy`` releases the snapshot.
+
+Static C++ library
+------------------
+
+The library [#userspace_libs]_ was created primarily to simplify creation of
+tests in C++, and it is also a good example of using the module interface.
+When creating applications, direct use of control calls is preferable.
+However, the library can be used in an application with a GPL-2+ license,
+or a library with an LGPL-2+ license can be created, with which even a
+proprietary application can be dynamically linked.
+
+blksnap console tool
+--------------------
+
+The blksnap [#userspace_tools]_ console tool allows controlling the module
+from the command line. The tool contains detailed built-in help. To get
+the list of commands, enter the ``blksnap --help`` command. The ``blksnap
+<command name> --help`` command shows detailed information about the
+parameters of each command call. This option may be convenient when creating
+proprietary software, as it avoids compiling against the open source code.
+At the same time, the blksnap tool can be used for creating backup scripts.
+For example, rsync can be called to synchronize files on the file system of
+the mounted snapshot image with files in the archive on a file system that
+supports compression.
+
+Tests
+-----
+
+A set of tests was created for regression testing [#userspace_tests]_.
+Tests with simple algorithms that use the ``blksnap`` console tool to
+control the module are written in Bash. More complex testing algorithms
+are implemented in C++.
+
+References
+==========
+
+.. [#userspace_libs] https://github.com/veeam/blksnap/tree/stable-v2.0/lib
+
+.. [#userspace_tools] https://github.com/veeam/blksnap/tree/stable-v2.0/tools
+
+.. [#userspace_tests] https://github.com/veeam/blksnap/tree/stable-v2.0/tests
+
+Module interface description
+============================
+
+.. kernel-doc:: include/uapi/linux/blksnap.h
diff --git a/Documentation/block/index.rst b/Documentation/block/index.rst
index e56d89db7b85..34937516c865 100644
--- a/Documentation/block/index.rst
+++ b/Documentation/block/index.rst
@@ -11,6 +11,7 @@  Block
    biovecs
    blk-mq
    blkfilter
+   blksnap
    cmdline-partition
    data-integrity
    deadline-iosched
diff --git a/MAINTAINERS b/MAINTAINERS
index fb6b7abe83e1..4bdb30369a74 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3580,6 +3580,12 @@  F:	block/blk-filter.c
 F:	include/linux/blk-filter.h
 F:	include/uapi/linux/blk-filter.h
 
+BLOCK DEVICE SNAPSHOTS MODULE
+M:	Sergei Shtepa <sergei.shtepa@veeam.com>
+L:	linux-block@vger.kernel.org
+S:	Supported
+F:	Documentation/block/blksnap.rst
+
 BLOCK LAYER
 M:	Jens Axboe <axboe@kernel.dk>
 L:	linux-block@vger.kernel.org