diff mbox series

announcing the dm-update target

Message ID alpine.LRH.2.02.2111231421490.24195@file01.intranet.prod.int.rdu2.redhat.com (mailing list archive)
State New, archived
Headers show
Series announcing the dm-update target | expand

Commit Message

Mikulas Patocka Nov. 23, 2021, 9:02 p.m. UTC
Hi

I announce the first version of the dm-update target. This target will 
perform background update of read-only system partition on embedded 
systems while the system is running.

It performs similar functionality that is described in this presentation: 
https://linuxplumbersconf.org/event/11/contributions/1049/attachments/826/1562/2021%20LPC_%20dm-snapshot%20in%20user%20space.pdf 
(except that dm-update works entirely in kernel space)

See the file Documentation/admin-guide/device-mapper/update.rst at the end 
of this patch for instructions how to use it.

I'd like to ask Akilesh Kailash and David Anderson if they are interested 
in using this in Android. I am willing to extend the update target, so 
that it will handle Android update format.

Mikulas



--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

Comments

Akilesh Kailash Nov. 23, 2021, 10:27 p.m. UTC | #1
On Tue, Nov 23, 2021 at 1:03 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> Hi
>
> I announce the first version of the dm-update target. This target will
> perform background update of read-only system partition on embedded
> systems while the system is running.
>
> It performs similar functionality that is described in this presentation:
> https://linuxplumbersconf.org/event/11/contributions/1049/attachments/826/1562/2021%20LPC_%20dm-snapshot%20in%20user%20space.pdf
> (except that dm-update works entirely in kernel space)
>
> See the file Documentation/admin-guide/device-mapper/update.rst at the end
> of this patch for instructions how to use it.
>
> I'd like to ask Akilesh Kailash and David Anderson if they are interested
> in using this in Android. I am willing to extend the update target, so
> that it will handle Android update format.
>
> Mikulas

We made a conscious decision to not have this in the kernel. There are
few reasons:

Older devices do not get new kernel features, only LTS updates. On the
other hand, by having
it in user-space, we have more control by adding in new features. For
ex: On Android S, we
introduced the first version of Android COW format. Now in Android T,
we plan to support
new features by having XOR compression (which saves more space).

Additionally, the android COW format is a little more complex than
just replacing
every block. It is not that the kernel cannot do it. The question is
more about why the in-kernel design
will make it hard for OTA updates.



>
> Index: linux-2.6/drivers/md/Kconfig
> ===================================================================
> --- linux-2.6.orig/drivers/md/Kconfig
> +++ linux-2.6/drivers/md/Kconfig
> @@ -652,4 +652,14 @@ config DM_AUDIT
>           Enables audit logging of several security relevant events in the
>           particular device-mapper targets, especially the integrity target.
>
> +config DM_UPDATE
> +       tristate "Update target support"
> +       depends on BLK_DEV_DM
> +       select DM_BUFIO
> +       select CRYPTO
> +       help
> +         The dm-update target allows transparent updates for embedded devices
> +
> +         If unsure, say N.
> +
>  endif # MD
> Index: linux-2.6/drivers/md/Makefile
> ===================================================================
> --- linux-2.6.orig/drivers/md/Makefile
> +++ linux-2.6/drivers/md/Makefile
> @@ -83,6 +83,7 @@ obj-$(CONFIG_DM_LOG_WRITES)   += dm-log-wr
>  obj-$(CONFIG_DM_INTEGRITY)     += dm-integrity.o
>  obj-$(CONFIG_DM_ZONED)         += dm-zoned.o
>  obj-$(CONFIG_DM_WRITECACHE)    += dm-writecache.o
> +obj-$(CONFIG_DM_UPDATE)                += dm-update.o
>
>  ifeq ($(CONFIG_DM_INIT),y)
>  dm-mod-objs                    += dm-init.o
> Index: linux-2.6/drivers/md/dm-update.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6/drivers/md/dm-update.c
> @@ -0,0 +1,751 @@
> +#include <linux/device-mapper.h>
> +#include <linux/module.h>
> +#include <linux/vmalloc.h>
> +#include <linux/dm-io.h>
> +#include <linux/crypto.h>
> +
> +#include "dm-update.h"
> +
> +#define DM_MSG_PREFIX "update"
> +
> +#define N_BUFFERS              16
> +
> +#define B_EMPTY                        0
> +#define B_LOADING              1
> +#define B_VALID                        2
> +
> +struct dm_update_buffer {
> +       int status;
> +       char *compressed_chunk;
> +       char *decompressed_chunk;
> +       uint64_t src;
> +       struct update_entry *e;
> +       struct bio_list waiting_bios;
> +       struct work_struct work;
> +       struct dm_update *u;
> +};
> +
> +struct dm_update {
> +       struct dm_dev *system_dev;
> +       struct dm_dev *update_dev;
> +       struct dm_target *ti;
> +       struct dm_io_client *dm_io;
> +       unsigned update_lbs;
> +       unsigned char update_lbs_bits;
> +       struct update_superblock *sb;
> +       struct crypto_comp *cc;
> +       struct update_entry *entries;
> +
> +       struct mutex mutex;
> +       struct workqueue_struct *decompress_wq;
> +       struct bio_list waiting_bios;
> +
> +       bool no_bg_update;
> +       struct workqueue_struct *bg_wq;
> +       struct work_struct bg_work;
> +       char *bg_compressed_chunk;
> +       char *bg_decompressed_chunk;
> +       size_t bg_index;
> +
> +       unsigned buffer_replacement;
> +       struct dm_update_buffer buffer[N_BUFFERS];
> +};
> +
> +static int update_rw(struct dm_update *u, bool system_dev, int req_op, sector_t sector, sector_t n_sectors, void *ptr)
> +{
> +       struct dm_io_region region;
> +       struct dm_io_request req;
> +
> +       region.bdev = system_dev ? u->system_dev->bdev : u->update_dev->bdev;
> +       region.sector = sector;
> +       region.count = n_sectors;
> +
> +       req.bi_op = req_op;
> +       req.bi_op_flags = REQ_SYNC;
> +       req.mem.type = DM_IO_VMA;
> +       req.mem.ptr.vma = ptr;
> +       req.client = u->dm_io;
> +       req.notify.fn = NULL;
> +       req.notify.context = NULL;
> +
> +       return dm_io(&req, 1, &region, NULL);
> +}
> +
> +static int update_decompress(struct dm_update *u, void *src, size_t src_size, void *dst, size_t dst_size)
> +{
> +       int r;
> +       if (!u->cc) {
> +               if (unlikely(src_size > dst_size))
> +                       return -EOVERFLOW;
> +               memcpy(dst, src, src_size);
> +       } else {
> +               unsigned dst_int;
> +               if (unlikely(src_size != (unsigned)src_size) ||
> +                   unlikely(dst_size != (unsigned)dst_size))
> +                       return -EOVERFLOW;
> +               dst_int = dst_size;
> +               r = crypto_comp_decompress(u->cc, src, src_size, dst, &dst_int);
> +               if (unlikely(r))
> +                       return r;
> +       }
> +       return 0;
> +}
> +
> +static void update_fill_from_buffer(struct dm_update *u, struct dm_update_buffer *b, struct bio *bio)
> +{
> +       struct bio_vec bv;
> +       struct bvec_iter iter;
> +
> +       struct update_entry *e = bio->bi_private;
> +       size_t data_offset = (size_t)le32_to_cpu(e->offset) << u->sb->block_bits;
> +       size_t bio_offset = (bio->bi_iter.bi_sector & ((1 << (u->sb->block_bits - SECTOR_SHIFT)) - 1)) << SECTOR_SHIFT;
> +       const char *data = b->decompressed_chunk + data_offset + bio_offset;
> +
> +       bio_for_each_segment(bv, bio, iter) {
> +               char *addr = kmap_local_page(bv.bv_page);
> +               memcpy(addr + bv.bv_offset, data, bv.bv_len);
> +               flush_dcache_page(bv.bv_page);
> +               kunmap_local(addr);
> +               data += bv.bv_len;
> +       }
> +
> +       bio_endio(bio);
> +}
> +
> +static void update_process_bio(struct dm_update *u, struct bio *bio)
> +{
> +       struct update_entry *e;
> +       struct dm_update_buffer *b;
> +       uint64_t src;
> +       int i;
> +
> +       e = bio->bi_private;
> +
> +       src = le32_to_cpu(e->src_lo) + ((uint64_t)le16_to_cpu(e->src_hi) << 32);
> +       for (i = 0; i < N_BUFFERS; i++) {
> +               b = &u->buffer[i];
> +               if (b->status == B_EMPTY)
> +                       continue;
> +               if (b->src == src) {
> +                       if (b->status == B_LOADING) {
> +                               bio_list_add(&b->waiting_bios, bio);
> +                       } else {
> +                               update_fill_from_buffer(u, b, bio);
> +                       }
> +                       return;
> +               }
> +       }
> +       for (i = 0; i < N_BUFFERS; i++) {
> +               b = &u->buffer[i];
> +               if (b->status == B_EMPTY) {
> +replace_buffer:
> +                       bio_list_add(&b->waiting_bios, bio);
> +                       b->status = B_LOADING;
> +                       b->src = src;
> +                       b->e = e;
> +                       queue_work(u->decompress_wq, &b->work);
> +                       return;
> +               }
> +       }
> +       for (i = 0; i < N_BUFFERS; i++) {
> +               b = &u->buffer[u->buffer_replacement];
> +               u->buffer_replacement = (u->buffer_replacement + 1) % N_BUFFERS;
> +               if (b->status == B_VALID)
> +                       goto replace_buffer;
> +       }
> +       bio_list_add(&u->waiting_bios, bio);
> +}
> +
> +static void dm_update_get_locations(struct dm_update *u, struct update_entry *e, uint64_t *src, sector_t *sector, sector_t *n_sectors, size_t *front_pad, size_t *compressed_length)
> +{
> +       uint64_t next_src;
> +       *src = le32_to_cpu(e->src_lo) + ((uint64_t)le16_to_cpu(e->src_hi) << 32);
> +       do {
> +               e++;
> +               next_src = le32_to_cpu(e->src_lo) + ((uint64_t)le16_to_cpu(e->src_hi) << 32);
> +       } while (next_src == *src);
> +
> +       *compressed_length = next_src - *src;
> +       *front_pad = *src & (u->update_lbs - 1);
> +       *sector = *src >> u->update_lbs_bits << (u->update_lbs_bits - SECTOR_SHIFT);
> +       *n_sectors = round_up(*front_pad + *compressed_length, u->update_lbs) >> SECTOR_SHIFT;
> +}
> +
> +static void dm_update_buffer_work(struct work_struct *w)
> +{
> +       struct dm_update_buffer *b = container_of(w, struct dm_update_buffer, work);
> +       struct dm_update *u = b->u;
> +       uint64_t src;
> +       size_t front_pad, compressed_length;
> +       sector_t sector, n_sectors;
> +       struct bio *bio, *waiting_bios;
> +       int r;
> +
> +       dm_update_get_locations(u, b->e, &src, &sector, &n_sectors, &front_pad, &compressed_length);
> +
> +       r = update_rw(u, false, REQ_OP_READ, sector, n_sectors, b->compressed_chunk);
> +       if (unlikely(r))
> +               goto io_error;
> +
> +       r = update_decompress(u, b->compressed_chunk + front_pad, compressed_length, b->decompressed_chunk, le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits);
> +       if (unlikely(r))
> +               goto io_error;
> +
> +io_error:
> +       mutex_lock(&u->mutex);
> +       b->status = likely(!r) ? B_VALID : B_EMPTY;
> +       while ((bio = bio_list_pop(&b->waiting_bios))) {
> +               if (unlikely(r)) {
> +                       bio->bi_status = errno_to_blk_status(r);
> +                       bio_endio(bio);
> +               } else {
> +                       update_fill_from_buffer(u, b, bio);
> +               }
> +       }
> +
> +       waiting_bios = bio_list_get(&u->waiting_bios);
> +       while (waiting_bios != NULL) {
> +               bio = waiting_bios;
> +               waiting_bios = bio->bi_next;
> +               bio->bi_next = NULL;
> +               update_process_bio(u, bio);
> +       }
> +
> +       mutex_unlock(&u->mutex);
> +}
> +
> +static int update_map(struct dm_target *ti, struct bio *bio)
> +{
> +       struct dm_update *u = ti->private;
> +       sector_t block;
> +       size_t first, last, half;
> +       struct update_entry *e;
> +
> +       if (bio_data_dir(bio) == WRITE)
> +               return DM_MAPIO_KILL;
> +
> +       block = bio->bi_iter.bi_sector >> (u->sb->block_bits - SECTOR_SHIFT);
> +
> +       first = 0;
> +       last = le64_to_cpu(u->sb->dir_n) - 1;
> +       while (first < last) {
> +               sector_t test_block;
> +               half = first / 2 + last / 2 + (first & last & 1);
> +               e = &u->entries[half];
> +               test_block = le32_to_cpu(e->dest_lo) + ((uint64_t)le16_to_cpu(e->dest_hi) << 32);
> +               if (test_block == block)
> +                       goto found;
> +               if (test_block < block) {
> +                       first = half + 1;
> +               } else {
> +                       last = half;
> +               }
> +       }
> +
> +       bio_set_dev(bio, u->system_dev->bdev);
> +       return DM_MAPIO_REMAPPED;
> +
> +found:
> +       bio->bi_private = e;
> +
> +       mutex_lock(&u->mutex);
> +       update_process_bio(u, bio);
> +       mutex_unlock(&u->mutex);
> +
> +       return DM_MAPIO_SUBMITTED;
> +}
> +
> +static void update_status(struct dm_target *ti, status_type_t type, unsigned status_flags, char *result, unsigned maxlen)
> +{
> +       struct dm_update *u = ti->private;
> +       unsigned extra_args;
> +       unsigned sz = 0;
> +
> +       switch (type) {
> +               case STATUSTYPE_INFO:
> +                       DMEMIT("%zu %llu", READ_ONCE(u->bg_index), (unsigned long long)(le64_to_cpu(u->sb->dir_n) - 1));
> +                       break;
> +               case STATUSTYPE_TABLE:
> +                       DMEMIT("%s %s ", u->system_dev->name, u->update_dev->name);
> +
> +                       extra_args = 0;
> +                       if (u->no_bg_update)
> +                               extra_args++;
> +
> +                       DMEMIT("%u", extra_args);
> +                       if (u->no_bg_update)
> +                               DMEMIT(" no_bg_update");
> +                       break;
> +               case STATUSTYPE_IMA:
> +                       DMEMIT_TARGET_NAME_VERSION(ti->type);
> +                       DMEMIT(",update_system_device=%s", u->system_dev->name);
> +                       DMEMIT(",update_update_device=%s", u->update_dev->name);
> +                       DMEMIT(";");
> +                       break;
> +       }
> +}
> +
> +static int update_iterate_devices(struct dm_target *ti, iterate_devices_callout_fn fn, void *data)
> +{
> +       struct dm_update *u = ti->private;
> +
> +       return fn(ti, u->system_dev, 0, ti->len, data);
> +}
> +
> +static void update_background_work(struct work_struct *w)
> +{
> +       struct dm_update *u = container_of(w, struct dm_update, bg_work);
> +       uint64_t src;
> +       size_t front_pad, compressed_length;
> +       sector_t sector, n_sectors;
> +       int r;
> +
> +       if (u->bg_index >= le64_to_cpu(u->sb->dir_n) - 1)
> +               return;
> +
> +       dm_update_get_locations(u, &u->entries[u->bg_index], &src, &sector, &n_sectors, &front_pad, &compressed_length);
> +
> +       r = update_rw(u, false, REQ_OP_READ, sector, n_sectors, u->bg_compressed_chunk);
> +       if (unlikely(r)) {
> +               DMERR("error reading update device (%d), aborting backgroup update", r);
> +               return;
> +       }
> +
> +       r = update_decompress(u, u->bg_compressed_chunk + front_pad, compressed_length, u->bg_decompressed_chunk, le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits);
> +       if (unlikely(r)) {
> +               DMERR("error decompressing update data (%d), aborting backgroup update", r);
> +               return;
> +       }
> +
> +       while (u->bg_index < le64_to_cpu(u->sb->dir_n) - 1) {
> +               uint64_t s, dest;
> +               size_t offset;
> +               struct update_entry *e = &u->entries[u->bg_index];
> +               s = le32_to_cpu(e->src_lo) + ((uint64_t)le16_to_cpu(e->src_hi) << 32);
> +               if (s != src)
> +                       break;
> +
> +               dest = le32_to_cpu(e->dest_lo) + ((uint64_t)le16_to_cpu(e->dest_hi) << 32);
> +               offset = (size_t)le32_to_cpu(e->offset) << u->sb->block_bits;
> +
> +               r = update_rw(u, true, REQ_OP_WRITE, dest << (u->sb->block_bits - SECTOR_SHIFT), 1UL << (u->sb->block_bits - SECTOR_SHIFT), u->bg_decompressed_chunk + offset);
> +               if (unlikely(r)) {
> +                       DMERR("error writing to the system device (%d), aborting backgroup update", r);
> +                       return;
> +               }
> +
> +               if (unlikely(u->bg_index == le64_to_cpu(u->sb->dir_n) - 2)) {
> +                       r = update_rw(u, true, REQ_OP_WRITE | REQ_PREFLUSH, 0, 0, NULL);
> +                       if (unlikely(r)) {
> +                               DMERR("error synchronizing the system device (%d), aborting backgroup update", r);
> +                               return;
> +                       }
> +               }
> +
> +               WRITE_ONCE(u->bg_index, u->bg_index + 1);
> +       }
> +
> +       queue_work(u->bg_wq, &u->bg_work);
> +}
> +
> +static void update_presuspend(struct dm_target *ti)
> +{
> +       struct dm_update *u = ti->private;
> +       cancel_work_sync(&u->bg_work);
> +}
> +
> +static void update_resume(struct dm_target *ti)
> +{
> +       struct dm_update *u = ti->private;
> +       if (u->no_bg_update)
> +               return;
> +       queue_work(u->bg_wq, &u->bg_work);
> +}
> +
> +static void update_dtr(struct dm_target *ti)
> +{
> +       struct dm_update *u = ti->private;
> +       int i;
> +
> +       if (u->bg_wq)
> +               destroy_workqueue(u->bg_wq);
> +       if (u->decompress_wq)
> +               destroy_workqueue(u->decompress_wq);
> +
> +       vfree(u->bg_compressed_chunk);
> +       vfree(u->bg_decompressed_chunk);
> +
> +       for (i = 0; i < N_BUFFERS; i++) {
> +               vfree(u->buffer[i].compressed_chunk);
> +               vfree(u->buffer[i].decompressed_chunk);
> +       }
> +       vfree(u->sb);
> +       vfree(u->entries);
> +       if (u->dm_io)
> +               dm_io_client_destroy(u->dm_io);
> +       if (u->system_dev)
> +               dm_put_device(ti, u->system_dev);
> +       if (u->update_dev)
> +               dm_put_device(ti, u->update_dev);
> +       if (u->cc)
> +               crypto_free_comp(u->cc);
> +
> +       mutex_init(&u->mutex);
> +
> +       kfree(u);
> +}
> +
> +static char *validate_sb(struct dm_update *u)
> +{
> +       struct update_superblock *sb = u->sb;
> +
> +       if (sb->block_bits < SECTOR_SHIFT || sb->block_bits >= 32)
> +               return "Invalid superblock: block_bits";
> +
> +       if (!le32_to_cpu(sb->chunk_blocks))
> +               return "Invalid superblock: chunk_blocks is zero";
> +
> +       if (le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits >> u->sb->block_bits != le32_to_cpu(u->sb->chunk_blocks))
> +               return "Invalid superblock: too large chunk_blocks";
> +
> +       if ((int)(le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits) < 0)
> +               return "Invalid superblock: too large chunk_blocks";
> +
> +       if (le64_to_cpu(sb->dir_n) < 1)
> +               return "Invalid superblock: zero dir_n";
> +
> +       if ((size_t)le64_to_cpu(sb->dir_n) * sizeof(struct update_entry) / sizeof(struct update_entry) != le64_to_cpu(sb->dir_n))
> +               return "Invalid superblock: overflow in dir_n";
> +
> +       return NULL;
> +}
> +
> +static char *validate_metadata(struct dm_update *u)
> +{
> +       struct update_superblock *sb = u->sb;
> +       size_t i;
> +       size_t n = le64_to_cpu(sb->dir_n) - 1;
> +
> +       for (i = 0; i < n; i++) {
> +               struct update_entry *e1 = &u->entries[i];
> +               struct update_entry *e2 = &u->entries[i + 1];
> +               uint64_t dest1, dest2;
> +
> +               if (le32_to_cpu(e1->offset) >= le32_to_cpu(sb->chunk_blocks))
> +                       return "Invalid metadata: offset is too high";
> +
> +               dest1 = le32_to_cpu(e1->dest_lo) + ((uint64_t)le16_to_cpu(e1->dest_hi) << 32);
> +               dest2 = le32_to_cpu(e2->dest_lo) + ((uint64_t)le16_to_cpu(e2->dest_hi) << 32);
> +
> +               if (dest1 >= dest2)
> +                       return "Invalid metadata: destination is not monotonic";
> +       }
> +
> +       return NULL;
> +}
> +
> +static int update_ctr(struct dm_target *ti, unsigned argc, char **argv)
> +{
> +       struct dm_update *u;
> +       struct dm_arg_set as;
> +       const char *string;
> +       unsigned opt_params;
> +       char *err;
> +       int r;
> +       uint64_t compressed_dir_size;
> +       void *compressed_dir = NULL;
> +       size_t dst_len;
> +       int i;
> +       size_t o;
> +       sector_t max_compressed_sectors;
> +
> +       static const struct dm_arg _args[] = {
> +               {0, 1, "Invalid number of feature args"},
> +       };
> +
> +       as.argc = argc;
> +       as.argv = argv;
> +
> +       u = kzalloc(sizeof(struct dm_update), GFP_KERNEL);
> +       if (!u) {
> +               ti->error = "Cannot allocate dm_update structure";
> +               return -ENOMEM;
> +       }
> +
> +       ti->private = u;
> +       u->ti = ti;
> +
> +       mutex_init(&u->mutex);
> +       bio_list_init(&u->waiting_bios);
> +       INIT_WORK(&u->bg_work, update_background_work);
> +
> +       string = dm_shift_arg(&as);
> +       if (!string)
> +               goto bad_arguments;
> +
> +       r = dm_get_device(ti, string, FMODE_READ | FMODE_WRITE, &u->system_dev);
> +       if (r) {
> +               ti->error = "System device lookup failed";
> +               goto bad;
> +       }
> +
> +       string = dm_shift_arg(&as);
> +       if (!string)
> +               goto bad_arguments;
> +
> +       r = dm_get_device(ti, string, FMODE_READ | FMODE_WRITE, &u->update_dev);
> +       if (r) {
> +               ti->error = "Update device lookup failed";
> +               goto bad;
> +       }
> +
> +       r = dm_read_arg_group(_args, &as, &opt_params, &ti->error);
> +       if (r)
> +               goto bad;
> +
> +       while (opt_params) {
> +               string = dm_shift_arg(&as), opt_params--;
> +               if (!strcasecmp(string, "no_bg_update")) {
> +                       u->no_bg_update = true;
> +               } else {
> +                       r = -EINVAL;
> +                       ti->error = "Invalid optional argument";
> +                       goto bad;
> +               }
> +       }
> +
> +       u->update_lbs = bdev_logical_block_size(u->update_dev->bdev);
> +       u->update_lbs_bits = __ffs(u->update_lbs);
> +
> +       u->decompress_wq = alloc_workqueue("dm-update", WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM, 0);
> +       if (!u->decompress_wq) {
> +               ti->error = "Cannot allocate workqueue";
> +               r = -ENOMEM;
> +               goto bad;
> +       }
> +
> +       if (!u->no_bg_update) {
> +               u->bg_wq = alloc_workqueue("dm-update-background", WQ_CPU_INTENSIVE | WQ_UNBOUND, 1);
> +               if (!u->bg_wq) {
> +                       ti->error = "Cannot allocate workqueue";
> +                       r = -ENOMEM;
> +                       goto bad;
> +               }
> +       }
> +
> +       u->dm_io = dm_io_client_create();
> +       if (IS_ERR(u->dm_io)) {
> +               r = PTR_ERR(u->dm_io);
> +               u->dm_io = NULL;
> +               ti->error = "Unable to allocate dm-io client";
> +               goto bad;
> +       }
> +
> +       u->sb = vmalloc(u->update_lbs);
> +       if (!u->sb) {
> +               r = -ENOMEM;
> +               ti->error = "Cannot allocate superblock";
> +               goto bad;
> +       }
> +
> +       r = update_rw(u, false, REQ_OP_READ, 0, u->update_lbs >> SECTOR_SHIFT, u->sb);
> +       if (r) {
> +               ti->error = "Cannot read superblock";
> +               goto bad;
> +       }
> +
> +       if (memcmp(u->sb->magic, UPDATE_MAGIC, 8)) {
> +               r = -EINVAL;
> +               ti->error = "Invalid magic in the superblock";
> +               //printk("%02x %02x %02x %02x %02x %02x %02x %02x\n", u->sb->magic[0], u->sb->magic[1], u->sb->magic[2], u->sb->magic[3], u->sb->magic[4], u->sb->magic[5], u->sb->magic[6], u->sb->magic[7]);
> +               goto bad;
> +       }
> +
> +       if (u->sb->version != UPDATE_VERSION) {
> +               r = -EINVAL;
> +               ti->error = "Invalid version in the superblock";
> +               goto bad;
> +       }
> +
> +       if ((err = validate_sb(u))) {
> +               r = -EINVAL;
> +               ti->error = err;
> +               goto bad;
> +       }
> +
> +       r = dm_set_target_max_io_len(ti, (sector_t)1 << (u->sb->block_bits - SECTOR_SHIFT));
> +       if (r) {
> +               ti->error = "Invalid block size in the superblock";
> +               goto bad;
> +       }
> +
> +       if (!memchr(u->sb->compression, 0, sizeof u->sb->compression)) {
> +               r = -EINVAL;
> +               ti->error = "Invalid compression algorithm in the superblock";
> +               goto bad;
> +       }
> +       if (strcmp(u->sb->compression, "none")) {
> +               u->cc = crypto_alloc_comp(u->sb->compression, 0, 0);
> +               if (!u->cc)
> +                       u->cc = ERR_PTR(-ENOMEM);
> +               if (IS_ERR(u->cc)) {
> +                       r = PTR_ERR(u->cc);
> +                       u->cc = NULL;
> +                       ti->error = "Unsupported compression method";
> +                       goto bad;
> +               }
> +       }
> +
> +       compressed_dir_size = roundup((le64_to_cpu(u->sb->dir_offset) & (u->update_lbs - 1)) + le64_to_cpu(u->sb->dir_compressed_size), u->update_lbs);
> +       if (compressed_dir_size != (size_t)compressed_dir_size) {
> +               r = -EOVERFLOW;
> +               ti->error = "Compressed directory is too large for 32-bit system";
> +               goto bad;
> +       }
> +
> +       compressed_dir = vmalloc(compressed_dir_size);
> +       if (!compressed_dir) {
> +               r = -ENOMEM;
> +               ti->error = "Cannot allocate compressed directory";
> +               goto bad;
> +       }
> +
> +       r = update_rw(u, false, REQ_OP_READ, round_down(le64_to_cpu(u->sb->dir_offset), u->update_lbs) >> SECTOR_SHIFT, compressed_dir_size >> SECTOR_SHIFT, compressed_dir);
> +       if (r) {
> +               ti->error = "Cannot read compressed directory";
> +               goto bad;
> +       }
> +
> +       dst_len = le64_to_cpu(u->sb->dir_n) * sizeof(struct update_entry);
> +       u->entries = vmalloc(dst_len);
> +       if (!u->entries) {
> +               r = -ENOMEM;
> +               ti->error = "Cannot allocate decompressed directory";
> +               goto bad;
> +       }
> +
> +       r = update_decompress(u, compressed_dir + (le64_to_cpu(u->sb->dir_offset) & (u->update_lbs - 1)), le64_to_cpu(u->sb->dir_compressed_size), (void *)u->entries, dst_len);
> +       if (r) {
> +               ti->error = "Cannot decompress directory";
> +               goto bad;
> +       }
> +
> +       if (dst_len != le64_to_cpu(u->sb->dir_n) * sizeof(struct update_entry)) {
> +               r = -EINVAL;
> +               ti->error = "Non-matching length of compressed directory";
> +               goto bad;
> +       }
> +
> +       vfree(compressed_dir);
> +       compressed_dir = NULL;
> +
> +       if ((err = validate_metadata(u))) {
> +               r = -EINVAL;
> +               ti->error = err;
> +               goto bad;
> +       }
> +
> +       o = 0;
> +       max_compressed_sectors = 1;
> +       while (o < le64_to_cpu(u->sb->dir_n) - 1) {
> +               struct update_entry *e = &u->entries[o];
> +               uint64_t src, s;
> +               size_t front_pad, compressed_length;
> +               sector_t sector, n_sectors;
> +               dm_update_get_locations(u, e, &src, &sector, &n_sectors, &front_pad, &compressed_length);
> +               if (n_sectors > max_compressed_sectors)
> +                       max_compressed_sectors = n_sectors;
> +               do {
> +                       o++;
> +                       if (o >= le64_to_cpu(u->sb->dir_n) - 1)
> +                               break;
> +                       e = &u->entries[o];
> +                       s = le32_to_cpu(e->src_lo) + ((uint64_t)le16_to_cpu(e->src_hi) << 32);
> +               } while (s == src);
> +       }
> +
> +       for (i = 0; i < N_BUFFERS; i++) {
> +               struct dm_update_buffer *b = &u->buffer[i];
> +               b->decompressed_chunk = vmalloc(le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits);
> +               if (!b->decompressed_chunk) {
> +                       r = -ENOMEM;
> +                       ti->error = "Cannot allocate buffers";
> +                       goto bad;
> +               }
> +               //memset(b->decompressed_chunk, 0xfe, le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits);
> +               b->compressed_chunk = vmalloc(max_compressed_sectors << SECTOR_SHIFT);
> +               if (!b->compressed_chunk) {
> +                       r = -ENOMEM;
> +                       ti->error = "Cannot allocate buffers";
> +                       goto bad;
> +               }
> +               //memset(b->compressed_chunk, 0xfd, max_compressed_sectors << SECTOR_SHIFT);
> +               bio_list_init(&b->waiting_bios);
> +               INIT_WORK(&b->work, dm_update_buffer_work);
> +               b->u = u;
> +       }
> +
> +       if (!u->no_bg_update) {
> +               u->bg_decompressed_chunk = vmalloc(le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits);
> +               if (!u->bg_decompressed_chunk) {
> +                       r = -ENOMEM;
> +                       ti->error = "Cannot allocate buffers";
> +                       goto bad;
> +               }
> +               u->bg_compressed_chunk = vmalloc(max_compressed_sectors << SECTOR_SHIFT);
> +               if (!u->bg_compressed_chunk) {
> +                       r = -ENOMEM;
> +                       ti->error = "Cannot allocate buffers";
> +                       goto bad;
> +               }
> +       }
> +
> +       return 0;
> +
> +bad_arguments:
> +       ti->error = "Not enough arguments";
> +       r = -EINVAL;
> +bad:
> +       if (compressed_dir)
> +               vfree(compressed_dir);
> +       update_dtr(ti);
> +       return r;
> +}
> +
> +static struct target_type update_target = {
> +       .name                   = "update",
> +       .version                = {1, 0, 0},
> +       .module                 = THIS_MODULE,
> +       .ctr                    = update_ctr,
> +       .dtr                    = update_dtr,
> +       .map                    = update_map,
> +       .status                 = update_status,
> +       .iterate_devices        = update_iterate_devices,
> +       .presuspend             = update_presuspend,
> +       .resume                 = update_resume,
> +};
> +
> +static int __init dm_update_init(void)
> +{
> +       int r;
> +
> +       r = dm_register_target(&update_target);
> +       if (r < 0) {
> +               DMERR("register failed %d", r);
> +               return r;
> +       }
> +
> +       return 0;
> +}
> +
> +static void __exit dm_update_exit(void)
> +{
> +       dm_unregister_target(&update_target);
> +}
> +
> +module_init(dm_update_init);
> +module_exit(dm_update_exit);
> +
> +MODULE_DESCRIPTION(DM_NAME " update target");
> +MODULE_AUTHOR("Mikulas Patocka <dm-devel@redhat.com>");
> +MODULE_LICENSE("GPL");
> Index: linux-2.6/drivers/md/dm-update.h
> ===================================================================
> --- /dev/null
> +++ linux-2.6/drivers/md/dm-update.h
> @@ -0,0 +1,23 @@
> +#define UPDATE_MAGIC   "update\0"
> +#define UPDATE_VERSION 0
> +
> +struct update_superblock {
> +       char magic[8];
> +       uint8_t version;
> +       uint8_t block_bits;
> +       uint16_t pad1;
> +       __le32 chunk_blocks;
> +       char compression[16];
> +       __le64 dir_offset;
> +       __le64 dir_compressed_size;
> +       __le64 dir_n;
> +       __le64 pad2;
> +};
> +
> +struct update_entry {
> +       __le32 dest_lo;
> +       __le16 dest_hi;
> +       __le16 src_hi;
> +       __le32 src_lo;
> +       __le32 offset;
> +};
> Index: linux-2.6/Documentation/admin-guide/device-mapper/update.rst
> ===================================================================
> --- /dev/null
> +++ linux-2.6/Documentation/admin-guide/device-mapper/update.rst
> @@ -0,0 +1,54 @@
> +=============
> +Update target
> +=============
> +
> +Embedded devices typically have a read-only partition that stores the operating
> +system image. The purpose of the dm-update target is to transparently update
> +this partition while the system is running.
> +
> +How to use the dm-update target:
> +1. We have files "old.img" and "new.img" that contain the old and new system
> +   image.
> +
> +2. We calculate the difference between these images with the update-diff
> +   utility - it could be downloaded from
> +   http://people.redhat.com/~mpatocka/dm-update/
> +
> +   ./update-diff old.img new.img -o diff.img -a zstd
> +
> +   This command calculates the difference between old.img and new.img,
> +   compresses it with the "zstd" algorithm and stores it in the file diff.img.
> +
> +3. The file diff.img is delivered to the embedded system.
> +
> +4. The embedded system is rebooted.
> +
> +5. On next boot, we load the dm-update target (assume that /dev/sda1 is the
> +   system partition and diff.img is the downloaded difference file):
> +   # losetup /dev/loop0 diff.img
> +   # dmsetup create system --table "0 `blockdev --getsize /dev/sda1` update
> +     /dev/sda1 /dev/loop0 0"
> +
> +6. The update target will present the system image as if the update already
> +   happened - if you read from blocks that are modified, it will transparently
> +   load the data from "diff.img" and decompress it.
> +
> +7. On background, the update target will copy the contents of the file
> +   "diff.img" to the partition /dev/sda1
> +
> +8. When the copying finishes (it can be detected by dmsetup status - if the
> +   first two numbers are equal, the copying finished), you can replace the
> +   update target with a linear target. After replacement, you can delete the
> +   loop device /dev/loop0 and the file diff.img. The partition /dev/sda1 is now
> +   equal to the file "new.img".
> +
> +Constructor parameters:
> +1. the system partition
> +2. the device that contains the difference between old and new image
> +3. the number of optional arguments
> +       no_bg_update
> +               don't perform background update of the system partition
> +
> +Status:
> +1. the number of blocks that were updated so far
> +2. the total number of blocks in the difference file
>

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Alasdair G Kergon Nov. 23, 2021, 11:15 p.m. UTC | #2
On Tue, Nov 23, 2021 at 02:27:18PM -0800, Akilesh Kailash wrote:
> On Tue, Nov 23, 2021 at 1:03 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> Older devices do not get new kernel features, only LTS updates. On the
> other hand, by having
> it in user-space, we have more control by adding in new features. For
> ex: On Android S, we
> introduced the first version of Android COW format. Now in Android T,
> we plan to support
> new features by having XOR compression (which saves more space).

While you are developing this, sure, you're finding new ways that
make significant space savings and want to roll them out easily
and the userspace approach helps you to do that.

But the law of diminishing returns will eventually kick in, where you
have reached a format that provides "good enough" space savings and then
reducing the runtime performance penalty will become the overriding
concern and that'll point back to an in-kernel solution.

So that's the end point I think we are aiming for here.  Combining the
requirements to find a sweet spot between space saving and performance.
 
By then, the ability still to make a tweak that saves a tiny bit more
space isn't going to be worth paying an ongoing performance penalty
for.  (And there could still be some sort of ebpf-style option.)

Alasdair

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
David Anderson Nov. 23, 2021, 11:27 p.m. UTC | #3
In our ecosystem, the OTA generation and on-device application process
has evolved continually, in every release, for over 10 years now. So
we think it's unlikely that we'll stop making improvements to it. Our
current roadmap has other changes in the pipeline too. It's not just
us trying to eek out diminishing returns. Other parts of the system
change around us, and the OTA system needs to adapt.

The performance penalty is something we've been working on, and have
improved a lot since our first iteration. We're currently
experimenting with io_uring as well.

Best,

-David

On Tue, Nov 23, 2021 at 3:17 PM Alasdair G Kergon <agk@redhat.com> wrote:
>
> On Tue, Nov 23, 2021 at 02:27:18PM -0800, Akilesh Kailash wrote:
> > On Tue, Nov 23, 2021 at 1:03 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > Older devices do not get new kernel features, only LTS updates. On the
> > other hand, by having
> > it in user-space, we have more control by adding in new features. For
> > ex: On Android S, we
> > introduced the first version of Android COW format. Now in Android T,
> > we plan to support
> > new features by having XOR compression (which saves more space).
>
> While you are developing this, sure, you're finding new ways that
> make significant space savings and want to roll them out easily
> and the userspace approach helps you to do that.
>
> But the law of diminishing returns will eventually kick in, where you
> have reached a format that provides "good enough" space savings and then
> reducing the runtime performance penalty will become the overriding
> concern and that'll point back to an in-kernel solution.
>
> So that's the end point I think we are aiming for here.  Combining the
> requirements to find a sweet spot between space saving and performance.
>
> By then, the ability still to make a tweak that saves a tiny bit more
> space isn't going to be worth paying an ongoing performance penalty
> for.  (And there could still be some sort of ebpf-style option.)
>
> Alasdair
>

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Akilesh Kailash Nov. 24, 2021, 12:07 a.m. UTC | #4
On Tue, Nov 23, 2021 at 1:03 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> Hi
>
> I announce the first version of the dm-update target. This target will
> perform background update of read-only system partition on embedded
> systems while the system is running.
>
> It performs similar functionality that is described in this presentation:
> https://linuxplumbersconf.org/event/11/contributions/1049/attachments/826/1562/2021%20LPC_%20dm-snapshot%20in%20user%20space.pdf
> (except that dm-update works entirely in kernel space)
>
> See the file Documentation/admin-guide/device-mapper/update.rst at the end
> of this patch for instructions how to use it.
>
> I'd like to ask Akilesh Kailash and David Anderson if they are interested
> in using this in Android. I am willing to extend the update target, so
> that it will handle Android update format.
>
> Mikulas
>
>
> Index: linux-2.6/drivers/md/Kconfig
> ===================================================================
> --- linux-2.6.orig/drivers/md/Kconfig
> +++ linux-2.6/drivers/md/Kconfig
> @@ -652,4 +652,14 @@ config DM_AUDIT
>           Enables audit logging of several security relevant events in the
>           particular device-mapper targets, especially the integrity target.
>
> +config DM_UPDATE
> +       tristate "Update target support"
> +       depends on BLK_DEV_DM
> +       select DM_BUFIO
> +       select CRYPTO
> +       help
> +         The dm-update target allows transparent updates for embedded devices
> +
> +         If unsure, say N.
> +
>  endif # MD
> Index: linux-2.6/drivers/md/Makefile
> ===================================================================
> --- linux-2.6.orig/drivers/md/Makefile
> +++ linux-2.6/drivers/md/Makefile
> @@ -83,6 +83,7 @@ obj-$(CONFIG_DM_LOG_WRITES)   += dm-log-wr
>  obj-$(CONFIG_DM_INTEGRITY)     += dm-integrity.o
>  obj-$(CONFIG_DM_ZONED)         += dm-zoned.o
>  obj-$(CONFIG_DM_WRITECACHE)    += dm-writecache.o
> +obj-$(CONFIG_DM_UPDATE)                += dm-update.o
>
>  ifeq ($(CONFIG_DM_INIT),y)
>  dm-mod-objs                    += dm-init.o
> Index: linux-2.6/drivers/md/dm-update.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6/drivers/md/dm-update.c
> @@ -0,0 +1,751 @@
> +#include <linux/device-mapper.h>
> +#include <linux/module.h>
> +#include <linux/vmalloc.h>
> +#include <linux/dm-io.h>
> +#include <linux/crypto.h>
> +
> +#include "dm-update.h"
> +
> +#define DM_MSG_PREFIX "update"
> +
> +#define N_BUFFERS              16
> +
> +#define B_EMPTY                        0
> +#define B_LOADING              1
> +#define B_VALID                        2
> +
> +struct dm_update_buffer {
> +       int status;
> +       char *compressed_chunk;
> +       char *decompressed_chunk;
> +       uint64_t src;
> +       struct update_entry *e;
> +       struct bio_list waiting_bios;
> +       struct work_struct work;
> +       struct dm_update *u;
> +};
> +
> +struct dm_update {
> +       struct dm_dev *system_dev;
> +       struct dm_dev *update_dev;
> +       struct dm_target *ti;
> +       struct dm_io_client *dm_io;
> +       unsigned update_lbs;
> +       unsigned char update_lbs_bits;
> +       struct update_superblock *sb;
> +       struct crypto_comp *cc;
> +       struct update_entry *entries;
> +
> +       struct mutex mutex;
> +       struct workqueue_struct *decompress_wq;
> +       struct bio_list waiting_bios;
> +
> +       bool no_bg_update;
> +       struct workqueue_struct *bg_wq;
> +       struct work_struct bg_work;
> +       char *bg_compressed_chunk;
> +       char *bg_decompressed_chunk;
> +       size_t bg_index;
> +
> +       unsigned buffer_replacement;
> +       struct dm_update_buffer buffer[N_BUFFERS];
> +};
> +
> +static int update_rw(struct dm_update *u, bool system_dev, int req_op, sector_t sector, sector_t n_sectors, void *ptr)
> +{
> +       struct dm_io_region region;
> +       struct dm_io_request req;
> +
> +       region.bdev = system_dev ? u->system_dev->bdev : u->update_dev->bdev;
> +       region.sector = sector;
> +       region.count = n_sectors;
> +
> +       req.bi_op = req_op;
> +       req.bi_op_flags = REQ_SYNC;
> +       req.mem.type = DM_IO_VMA;
> +       req.mem.ptr.vma = ptr;
> +       req.client = u->dm_io;
> +       req.notify.fn = NULL;
> +       req.notify.context = NULL;
> +
> +       return dm_io(&req, 1, &region, NULL);
> +}
> +
> +static int update_decompress(struct dm_update *u, void *src, size_t src_size, void *dst, size_t dst_size)
> +{
> +       int r;
> +       if (!u->cc) {
> +               if (unlikely(src_size > dst_size))
> +                       return -EOVERFLOW;
> +               memcpy(dst, src, src_size);
> +       } else {
> +               unsigned dst_int;
> +               if (unlikely(src_size != (unsigned)src_size) ||
> +                   unlikely(dst_size != (unsigned)dst_size))
> +                       return -EOVERFLOW;
> +               dst_int = dst_size;
> +               r = crypto_comp_decompress(u->cc, src, src_size, dst, &dst_int);
> +               if (unlikely(r))
> +                       return r;
> +       }
> +       return 0;
> +}
> +
> +static void update_fill_from_buffer(struct dm_update *u, struct dm_update_buffer *b, struct bio *bio)
> +{
> +       struct bio_vec bv;
> +       struct bvec_iter iter;
> +
> +       struct update_entry *e = bio->bi_private;
> +       size_t data_offset = (size_t)le32_to_cpu(e->offset) << u->sb->block_bits;
> +       size_t bio_offset = (bio->bi_iter.bi_sector & ((1 << (u->sb->block_bits - SECTOR_SHIFT)) - 1)) << SECTOR_SHIFT;
> +       const char *data = b->decompressed_chunk + data_offset + bio_offset;
> +
> +       bio_for_each_segment(bv, bio, iter) {
> +               char *addr = kmap_local_page(bv.bv_page);
> +               memcpy(addr + bv.bv_offset, data, bv.bv_len);
> +               flush_dcache_page(bv.bv_page);
> +               kunmap_local(addr);
> +               data += bv.bv_len;
> +       }
> +
> +       bio_endio(bio);
> +}
> +
> +static void update_process_bio(struct dm_update *u, struct bio *bio)
> +{
> +       struct update_entry *e;
> +       struct dm_update_buffer *b;
> +       uint64_t src;
> +       int i;
> +
> +       e = bio->bi_private;
> +
> +       src = le32_to_cpu(e->src_lo) + ((uint64_t)le16_to_cpu(e->src_hi) << 32);
> +       for (i = 0; i < N_BUFFERS; i++) {
> +               b = &u->buffer[i];
> +               if (b->status == B_EMPTY)
> +                       continue;
> +               if (b->src == src) {
> +                       if (b->status == B_LOADING) {
> +                               bio_list_add(&b->waiting_bios, bio);
> +                       } else {
> +                               update_fill_from_buffer(u, b, bio);
> +                       }
> +                       return;
> +               }
> +       }
> +       for (i = 0; i < N_BUFFERS; i++) {
> +               b = &u->buffer[i];
> +               if (b->status == B_EMPTY) {
> +replace_buffer:
> +                       bio_list_add(&b->waiting_bios, bio);
> +                       b->status = B_LOADING;
> +                       b->src = src;
> +                       b->e = e;
> +                       queue_work(u->decompress_wq, &b->work);
> +                       return;
> +               }
> +       }
> +       for (i = 0; i < N_BUFFERS; i++) {
> +               b = &u->buffer[u->buffer_replacement];
> +               u->buffer_replacement = (u->buffer_replacement + 1) % N_BUFFERS;
> +               if (b->status == B_VALID)
> +                       goto replace_buffer;
> +       }
> +       bio_list_add(&u->waiting_bios, bio);
> +}
> +
> +static void dm_update_get_locations(struct dm_update *u, struct update_entry *e, uint64_t *src, sector_t *sector, sector_t *n_sectors, size_t *front_pad, size_t *compressed_length)
> +{
> +       uint64_t next_src;
> +       *src = le32_to_cpu(e->src_lo) + ((uint64_t)le16_to_cpu(e->src_hi) << 32);
> +       do {
> +               e++;
> +               next_src = le32_to_cpu(e->src_lo) + ((uint64_t)le16_to_cpu(e->src_hi) << 32);
> +       } while (next_src == *src);
> +
> +       *compressed_length = next_src - *src;
> +       *front_pad = *src & (u->update_lbs - 1);
> +       *sector = *src >> u->update_lbs_bits << (u->update_lbs_bits - SECTOR_SHIFT);
> +       *n_sectors = round_up(*front_pad + *compressed_length, u->update_lbs) >> SECTOR_SHIFT;
> +}
> +
> +static void dm_update_buffer_work(struct work_struct *w)
> +{
> +       struct dm_update_buffer *b = container_of(w, struct dm_update_buffer, work);
> +       struct dm_update *u = b->u;
> +       uint64_t src;
> +       size_t front_pad, compressed_length;
> +       sector_t sector, n_sectors;
> +       struct bio *bio, *waiting_bios;
> +       int r;
> +
> +       dm_update_get_locations(u, b->e, &src, &sector, &n_sectors, &front_pad, &compressed_length);
> +
> +       r = update_rw(u, false, REQ_OP_READ, sector, n_sectors, b->compressed_chunk);
> +       if (unlikely(r))
> +               goto io_error;
> +
> +       r = update_decompress(u, b->compressed_chunk + front_pad, compressed_length, b->decompressed_chunk, le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits);
> +       if (unlikely(r))
> +               goto io_error;
> +
> +io_error:
> +       mutex_lock(&u->mutex);
> +       b->status = likely(!r) ? B_VALID : B_EMPTY;
> +       while ((bio = bio_list_pop(&b->waiting_bios))) {
> +               if (unlikely(r)) {
> +                       bio->bi_status = errno_to_blk_status(r);
> +                       bio_endio(bio);
> +               } else {
> +                       update_fill_from_buffer(u, b, bio);
> +               }
> +       }
> +
> +       waiting_bios = bio_list_get(&u->waiting_bios);
> +       while (waiting_bios != NULL) {
> +               bio = waiting_bios;
> +               waiting_bios = bio->bi_next;
> +               bio->bi_next = NULL;
> +               update_process_bio(u, bio);
> +       }
> +
> +       mutex_unlock(&u->mutex);
> +}
> +
> +static int update_map(struct dm_target *ti, struct bio *bio)
> +{
> +       struct dm_update *u = ti->private;
> +       sector_t block;
> +       size_t first, last, half;
> +       struct update_entry *e;
> +
> +       if (bio_data_dir(bio) == WRITE)
> +               return DM_MAPIO_KILL;
> +
> +       block = bio->bi_iter.bi_sector >> (u->sb->block_bits - SECTOR_SHIFT);
> +
> +       first = 0;
> +       last = le64_to_cpu(u->sb->dir_n) - 1;
> +       while (first < last) {
> +               sector_t test_block;
> +               half = first / 2 + last / 2 + (first & last & 1);
> +               e = &u->entries[half];
> +               test_block = le32_to_cpu(e->dest_lo) + ((uint64_t)le16_to_cpu(e->dest_hi) << 32);
> +               if (test_block == block)
> +                       goto found;
> +               if (test_block < block) {
> +                       first = half + 1;
> +               } else {
> +                       last = half;
> +               }
> +       }
> +
> +       bio_set_dev(bio, u->system_dev->bdev);
> +       return DM_MAPIO_REMAPPED;
> +
> +found:
> +       bio->bi_private = e;
> +
> +       mutex_lock(&u->mutex);
> +       update_process_bio(u, bio);
> +       mutex_unlock(&u->mutex);
> +
> +       return DM_MAPIO_SUBMITTED;
> +}
> +
> +static void update_status(struct dm_target *ti, status_type_t type, unsigned status_flags, char *result, unsigned maxlen)
> +{
> +       struct dm_update *u = ti->private;
> +       unsigned extra_args;
> +       unsigned sz = 0;
> +
> +       switch (type) {
> +               case STATUSTYPE_INFO:
> +                       DMEMIT("%zu %llu", READ_ONCE(u->bg_index), (unsigned long long)(le64_to_cpu(u->sb->dir_n) - 1));
> +                       break;
> +               case STATUSTYPE_TABLE:
> +                       DMEMIT("%s %s ", u->system_dev->name, u->update_dev->name);
> +
> +                       extra_args = 0;
> +                       if (u->no_bg_update)
> +                               extra_args++;
> +
> +                       DMEMIT("%u", extra_args);
> +                       if (u->no_bg_update)
> +                               DMEMIT(" no_bg_update");
> +                       break;
> +               case STATUSTYPE_IMA:
> +                       DMEMIT_TARGET_NAME_VERSION(ti->type);
> +                       DMEMIT(",update_system_device=%s", u->system_dev->name);
> +                       DMEMIT(",update_update_device=%s", u->update_dev->name);
> +                       DMEMIT(";");
> +                       break;
> +       }
> +}
> +
> +static int update_iterate_devices(struct dm_target *ti, iterate_devices_callout_fn fn, void *data)
> +{
> +       struct dm_update *u = ti->private;
> +
> +       return fn(ti, u->system_dev, 0, ti->len, data);
> +}
> +
> +static void update_background_work(struct work_struct *w)
> +{
> +       struct dm_update *u = container_of(w, struct dm_update, bg_work);
> +       uint64_t src;
> +       size_t front_pad, compressed_length;
> +       sector_t sector, n_sectors;
> +       int r;
> +
> +       if (u->bg_index >= le64_to_cpu(u->sb->dir_n) - 1)
> +               return;
> +
> +       dm_update_get_locations(u, &u->entries[u->bg_index], &src, &sector, &n_sectors, &front_pad, &compressed_length);
> +
> +       r = update_rw(u, false, REQ_OP_READ, sector, n_sectors, u->bg_compressed_chunk);
> +       if (unlikely(r)) {
> +               DMERR("error reading update device (%d), aborting backgroup update", r);
> +               return;
> +       }
> +
> +       r = update_decompress(u, u->bg_compressed_chunk + front_pad, compressed_length, u->bg_decompressed_chunk, le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits);
> +       if (unlikely(r)) {
> +               DMERR("error decompressing update data (%d), aborting backgroup update", r);
> +               return;
> +       }
> +
> +       while (u->bg_index < le64_to_cpu(u->sb->dir_n) - 1) {
> +               uint64_t s, dest;
> +               size_t offset;
> +               struct update_entry *e = &u->entries[u->bg_index];
> +               s = le32_to_cpu(e->src_lo) + ((uint64_t)le16_to_cpu(e->src_hi) << 32);
> +               if (s != src)
> +                       break;
> +
> +               dest = le32_to_cpu(e->dest_lo) + ((uint64_t)le16_to_cpu(e->dest_hi) << 32);
> +               offset = (size_t)le32_to_cpu(e->offset) << u->sb->block_bits;
> +
> +               r = update_rw(u, true, REQ_OP_WRITE, dest << (u->sb->block_bits - SECTOR_SHIFT), 1UL << (u->sb->block_bits - SECTOR_SHIFT), u->bg_decompressed_chunk + offset);
> +               if (unlikely(r)) {
> +                       DMERR("error writing to the system device (%d), aborting backgroup update", r);
> +                       return;
> +               }
> +
> +               if (unlikely(u->bg_index == le64_to_cpu(u->sb->dir_n) - 2)) {
> +                       r = update_rw(u, true, REQ_OP_WRITE | REQ_PREFLUSH, 0, 0, NULL);
> +                       if (unlikely(r)) {
> +                               DMERR("error synchronizing the system device (%d), aborting backgroup update", r);
> +                               return;
> +                       }
> +               }
> +
> +               WRITE_ONCE(u->bg_index, u->bg_index + 1);
> +       }
> +
> +       queue_work(u->bg_wq, &u->bg_work);
> +}
> +
> +static void update_presuspend(struct dm_target *ti)
> +{
> +       struct dm_update *u = ti->private;
> +       cancel_work_sync(&u->bg_work);
> +}
> +
> +static void update_resume(struct dm_target *ti)
> +{
> +       struct dm_update *u = ti->private;
> +       if (u->no_bg_update)
> +               return;
> +       queue_work(u->bg_wq, &u->bg_work);
> +}
> +
> +static void update_dtr(struct dm_target *ti)
> +{
> +       struct dm_update *u = ti->private;
> +       int i;
> +
> +       if (u->bg_wq)
> +               destroy_workqueue(u->bg_wq);
> +       if (u->decompress_wq)
> +               destroy_workqueue(u->decompress_wq);
> +
> +       vfree(u->bg_compressed_chunk);
> +       vfree(u->bg_decompressed_chunk);
> +
> +       for (i = 0; i < N_BUFFERS; i++) {
> +               vfree(u->buffer[i].compressed_chunk);
> +               vfree(u->buffer[i].decompressed_chunk);
> +       }
> +       vfree(u->sb);
> +       vfree(u->entries);
> +       if (u->dm_io)
> +               dm_io_client_destroy(u->dm_io);
> +       if (u->system_dev)
> +               dm_put_device(ti, u->system_dev);
> +       if (u->update_dev)
> +               dm_put_device(ti, u->update_dev);
> +       if (u->cc)
> +               crypto_free_comp(u->cc);
> +
> +       mutex_init(&u->mutex);
> +
> +       kfree(u);
> +}
> +
> +static char *validate_sb(struct dm_update *u)
> +{
> +       struct update_superblock *sb = u->sb;
> +
> +       if (sb->block_bits < SECTOR_SHIFT || sb->block_bits >= 32)
> +               return "Invalid superblock: block_bits";
> +
> +       if (!le32_to_cpu(sb->chunk_blocks))
> +               return "Invalid superblock: chunk_blocks is zero";
> +
> +       if (le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits >> u->sb->block_bits != le32_to_cpu(u->sb->chunk_blocks))
> +               return "Invalid superblock: too large chunk_blocks";
> +
> +       if ((int)(le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits) < 0)
> +               return "Invalid superblock: too large chunk_blocks";
> +
> +       if (le64_to_cpu(sb->dir_n) < 1)
> +               return "Invalid superblock: zero dir_n";
> +
> +       if ((size_t)le64_to_cpu(sb->dir_n) * sizeof(struct update_entry) / sizeof(struct update_entry) != le64_to_cpu(sb->dir_n))
> +               return "Invalid superblock: overflow in dir_n";
> +
> +       return NULL;
> +}
> +
> +static char *validate_metadata(struct dm_update *u)
> +{
> +       struct update_superblock *sb = u->sb;
> +       size_t i;
> +       size_t n = le64_to_cpu(sb->dir_n) - 1;
> +
> +       for (i = 0; i < n; i++) {
> +               struct update_entry *e1 = &u->entries[i];
> +               struct update_entry *e2 = &u->entries[i + 1];
> +               uint64_t dest1, dest2;
> +
> +               if (le32_to_cpu(e1->offset) >= le32_to_cpu(sb->chunk_blocks))
> +                       return "Invalid metadata: offset is too high";
> +
> +               dest1 = le32_to_cpu(e1->dest_lo) + ((uint64_t)le16_to_cpu(e1->dest_hi) << 32);
> +               dest2 = le32_to_cpu(e2->dest_lo) + ((uint64_t)le16_to_cpu(e2->dest_hi) << 32);
> +
> +               if (dest1 >= dest2)
> +                       return "Invalid metadata: destination is not monotonic";
> +       }
> +
> +       return NULL;
> +}
> +
> +static int update_ctr(struct dm_target *ti, unsigned argc, char **argv)
> +{
> +       struct dm_update *u;
> +       struct dm_arg_set as;
> +       const char *string;
> +       unsigned opt_params;
> +       char *err;
> +       int r;
> +       uint64_t compressed_dir_size;
> +       void *compressed_dir = NULL;
> +       size_t dst_len;
> +       int i;
> +       size_t o;
> +       sector_t max_compressed_sectors;
> +
> +       static const struct dm_arg _args[] = {
> +               {0, 1, "Invalid number of feature args"},
> +       };
> +
> +       as.argc = argc;
> +       as.argv = argv;
> +
> +       u = kzalloc(sizeof(struct dm_update), GFP_KERNEL);
> +       if (!u) {
> +               ti->error = "Cannot allocate dm_update structure";
> +               return -ENOMEM;
> +       }
> +
> +       ti->private = u;
> +       u->ti = ti;
> +
> +       mutex_init(&u->mutex);
> +       bio_list_init(&u->waiting_bios);
> +       INIT_WORK(&u->bg_work, update_background_work);
> +
> +       string = dm_shift_arg(&as);
> +       if (!string)
> +               goto bad_arguments;
> +
> +       r = dm_get_device(ti, string, FMODE_READ | FMODE_WRITE, &u->system_dev);
> +       if (r) {
> +               ti->error = "System device lookup failed";
> +               goto bad;
> +       }
> +
> +       string = dm_shift_arg(&as);
> +       if (!string)
> +               goto bad_arguments;
> +
> +       r = dm_get_device(ti, string, FMODE_READ | FMODE_WRITE, &u->update_dev);
> +       if (r) {
> +               ti->error = "Update device lookup failed";
> +               goto bad;
> +       }
> +
> +       r = dm_read_arg_group(_args, &as, &opt_params, &ti->error);
> +       if (r)
> +               goto bad;
> +
> +       while (opt_params) {
> +               string = dm_shift_arg(&as), opt_params--;
> +               if (!strcasecmp(string, "no_bg_update")) {
> +                       u->no_bg_update = true;
> +               } else {
> +                       r = -EINVAL;
> +                       ti->error = "Invalid optional argument";
> +                       goto bad;
> +               }
> +       }
> +
> +       u->update_lbs = bdev_logical_block_size(u->update_dev->bdev);
> +       u->update_lbs_bits = __ffs(u->update_lbs);
> +
> +       u->decompress_wq = alloc_workqueue("dm-update", WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM, 0);
> +       if (!u->decompress_wq) {
> +               ti->error = "Cannot allocate workqueue";
> +               r = -ENOMEM;
> +               goto bad;
> +       }
> +
> +       if (!u->no_bg_update) {
> +               u->bg_wq = alloc_workqueue("dm-update-background", WQ_CPU_INTENSIVE | WQ_UNBOUND, 1);
> +               if (!u->bg_wq) {
> +                       ti->error = "Cannot allocate workqueue";
> +                       r = -ENOMEM;
> +                       goto bad;
> +               }
> +       }
> +
> +       u->dm_io = dm_io_client_create();
> +       if (IS_ERR(u->dm_io)) {
> +               r = PTR_ERR(u->dm_io);
> +               u->dm_io = NULL;
> +               ti->error = "Unable to allocate dm-io client";
> +               goto bad;
> +       }
> +
> +       u->sb = vmalloc(u->update_lbs);
> +       if (!u->sb) {
> +               r = -ENOMEM;
> +               ti->error = "Cannot allocate superblock";
> +               goto bad;
> +       }
> +
> +       r = update_rw(u, false, REQ_OP_READ, 0, u->update_lbs >> SECTOR_SHIFT, u->sb);
> +       if (r) {
> +               ti->error = "Cannot read superblock";
> +               goto bad;
> +       }
> +
> +       if (memcmp(u->sb->magic, UPDATE_MAGIC, 8)) {
> +               r = -EINVAL;
> +               ti->error = "Invalid magic in the superblock";
> +               //printk("%02x %02x %02x %02x %02x %02x %02x %02x\n", u->sb->magic[0], u->sb->magic[1], u->sb->magic[2], u->sb->magic[3], u->sb->magic[4], u->sb->magic[5], u->sb->magic[6], u->sb->magic[7]);
> +               goto bad;
> +       }
> +
> +       if (u->sb->version != UPDATE_VERSION) {
> +               r = -EINVAL;
> +               ti->error = "Invalid version in the superblock";
> +               goto bad;
> +       }
> +
> +       if ((err = validate_sb(u))) {
> +               r = -EINVAL;
> +               ti->error = err;
> +               goto bad;
> +       }
> +
> +       r = dm_set_target_max_io_len(ti, (sector_t)1 << (u->sb->block_bits - SECTOR_SHIFT));
> +       if (r) {
> +               ti->error = "Invalid block size in the superblock";
> +               goto bad;
> +       }
> +
> +       if (!memchr(u->sb->compression, 0, sizeof u->sb->compression)) {
> +               r = -EINVAL;
> +               ti->error = "Invalid compression algorithm in the superblock";
> +               goto bad;
> +       }
> +       if (strcmp(u->sb->compression, "none")) {
> +               u->cc = crypto_alloc_comp(u->sb->compression, 0, 0);
> +               if (!u->cc)
> +                       u->cc = ERR_PTR(-ENOMEM);
> +               if (IS_ERR(u->cc)) {
> +                       r = PTR_ERR(u->cc);
> +                       u->cc = NULL;
> +                       ti->error = "Unsupported compression method";
> +                       goto bad;
> +               }
> +       }
> +
> +       compressed_dir_size = roundup((le64_to_cpu(u->sb->dir_offset) & (u->update_lbs - 1)) + le64_to_cpu(u->sb->dir_compressed_size), u->update_lbs);
> +       if (compressed_dir_size != (size_t)compressed_dir_size) {
> +               r = -EOVERFLOW;
> +               ti->error = "Compressed directory is too large for 32-bit system";
> +               goto bad;
> +       }
> +
> +       compressed_dir = vmalloc(compressed_dir_size);
> +       if (!compressed_dir) {
> +               r = -ENOMEM;
> +               ti->error = "Cannot allocate compressed directory";
> +               goto bad;
> +       }
> +
> +       r = update_rw(u, false, REQ_OP_READ, round_down(le64_to_cpu(u->sb->dir_offset), u->update_lbs) >> SECTOR_SHIFT, compressed_dir_size >> SECTOR_SHIFT, compressed_dir);
> +       if (r) {
> +               ti->error = "Cannot read compressed directory";
> +               goto bad;
> +       }
> +
> +       dst_len = le64_to_cpu(u->sb->dir_n) * sizeof(struct update_entry);
> +       u->entries = vmalloc(dst_len);
> +       if (!u->entries) {
> +               r = -ENOMEM;
> +               ti->error = "Cannot allocate decompressed directory";
> +               goto bad;
> +       }
> +
> +       r = update_decompress(u, compressed_dir + (le64_to_cpu(u->sb->dir_offset) & (u->update_lbs - 1)), le64_to_cpu(u->sb->dir_compressed_size), (void *)u->entries, dst_len);
> +       if (r) {
> +               ti->error = "Cannot decompress directory";
> +               goto bad;
> +       }
> +
> +       if (dst_len != le64_to_cpu(u->sb->dir_n) * sizeof(struct update_entry)) {
> +               r = -EINVAL;
> +               ti->error = "Non-matching length of compressed directory";
> +               goto bad;
> +       }
> +
> +       vfree(compressed_dir);
> +       compressed_dir = NULL;
> +
> +       if ((err = validate_metadata(u))) {
> +               r = -EINVAL;
> +               ti->error = err;
> +               goto bad;
> +       }
> +
> +       o = 0;
> +       max_compressed_sectors = 1;
> +       while (o < le64_to_cpu(u->sb->dir_n) - 1) {
> +               struct update_entry *e = &u->entries[o];
> +               uint64_t src, s;
> +               size_t front_pad, compressed_length;
> +               sector_t sector, n_sectors;
> +               dm_update_get_locations(u, e, &src, &sector, &n_sectors, &front_pad, &compressed_length);
> +               if (n_sectors > max_compressed_sectors)
> +                       max_compressed_sectors = n_sectors;
> +               do {
> +                       o++;
> +                       if (o >= le64_to_cpu(u->sb->dir_n) - 1)
> +                               break;
> +                       e = &u->entries[o];
> +                       s = le32_to_cpu(e->src_lo) + ((uint64_t)le16_to_cpu(e->src_hi) << 32);
> +               } while (s == src);
> +       }
> +
> +       for (i = 0; i < N_BUFFERS; i++) {
> +               struct dm_update_buffer *b = &u->buffer[i];
> +               b->decompressed_chunk = vmalloc(le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits);
> +               if (!b->decompressed_chunk) {
> +                       r = -ENOMEM;
> +                       ti->error = "Cannot allocate buffers";
> +                       goto bad;
> +               }
> +               //memset(b->decompressed_chunk, 0xfe, le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits);
> +               b->compressed_chunk = vmalloc(max_compressed_sectors << SECTOR_SHIFT);
> +               if (!b->compressed_chunk) {
> +                       r = -ENOMEM;
> +                       ti->error = "Cannot allocate buffers";
> +                       goto bad;
> +               }
> +               //memset(b->compressed_chunk, 0xfd, max_compressed_sectors << SECTOR_SHIFT);
> +               bio_list_init(&b->waiting_bios);
> +               INIT_WORK(&b->work, dm_update_buffer_work);
> +               b->u = u;
> +       }
> +
> +       if (!u->no_bg_update) {
> +               u->bg_decompressed_chunk = vmalloc(le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits);
> +               if (!u->bg_decompressed_chunk) {
> +                       r = -ENOMEM;
> +                       ti->error = "Cannot allocate buffers";
> +                       goto bad;
> +               }
> +               u->bg_compressed_chunk = vmalloc(max_compressed_sectors << SECTOR_SHIFT);
> +               if (!u->bg_compressed_chunk) {
> +                       r = -ENOMEM;
> +                       ti->error = "Cannot allocate buffers";
> +                       goto bad;
> +               }
> +       }
> +
> +       return 0;
> +
> +bad_arguments:
> +       ti->error = "Not enough arguments";
> +       r = -EINVAL;
> +bad:
> +       if (compressed_dir)
> +               vfree(compressed_dir);
> +       update_dtr(ti);
> +       return r;
> +}
> +
> +static struct target_type update_target = {
> +       .name                   = "update",
> +       .version                = {1, 0, 0},
> +       .module                 = THIS_MODULE,
> +       .ctr                    = update_ctr,
> +       .dtr                    = update_dtr,
> +       .map                    = update_map,
> +       .status                 = update_status,
> +       .iterate_devices        = update_iterate_devices,
> +       .presuspend             = update_presuspend,
> +       .resume                 = update_resume,
> +};
> +
> +static int __init dm_update_init(void)
> +{
> +       int r;
> +
> +       r = dm_register_target(&update_target);
> +       if (r < 0) {
> +               DMERR("register failed %d", r);
> +               return r;
> +       }
> +
> +       return 0;
> +}
> +
> +static void __exit dm_update_exit(void)
> +{
> +       dm_unregister_target(&update_target);
> +}
> +
> +module_init(dm_update_init);
> +module_exit(dm_update_exit);
> +
> +MODULE_DESCRIPTION(DM_NAME " update target");
> +MODULE_AUTHOR("Mikulas Patocka <dm-devel@redhat.com>");
> +MODULE_LICENSE("GPL");
> Index: linux-2.6/drivers/md/dm-update.h
> ===================================================================
> --- /dev/null
> +++ linux-2.6/drivers/md/dm-update.h
> @@ -0,0 +1,23 @@
> +#define UPDATE_MAGIC   "update\0"
> +#define UPDATE_VERSION 0
> +
> +struct update_superblock {
> +       char magic[8];
> +       uint8_t version;
> +       uint8_t block_bits;
> +       uint16_t pad1;
> +       __le32 chunk_blocks;
> +       char compression[16];
> +       __le64 dir_offset;
> +       __le64 dir_compressed_size;
> +       __le64 dir_n;
> +       __le64 pad2;
> +};
> +
> +struct update_entry {
> +       __le32 dest_lo;
> +       __le16 dest_hi;
> +       __le16 src_hi;
> +       __le32 src_lo;
> +       __le32 offset;
> +};
> Index: linux-2.6/Documentation/admin-guide/device-mapper/update.rst
> ===================================================================
> --- /dev/null
> +++ linux-2.6/Documentation/admin-guide/device-mapper/update.rst
> @@ -0,0 +1,54 @@
> +=============
> +Update target
> +=============
> +
> +Embedded devices typically have a read-only partition that stores the operating
> +system image. The purpose of the dm-update target is to transparently update
> +this partition while the system is running.
> +
> +How to use the dm-update target:
> +1. We have files "old.img" and "new.img" that contain the old and new system
> +   image.
> +
> +2. We calculate the difference between these images with the update-diff
> +   utility - it could be downloaded from
> +   http://people.redhat.com/~mpatocka/dm-update/
> +
> +   ./update-diff old.img new.img -o diff.img -a zstd
> +
> +   This command calculates the difference between old.img and new.img,
> +   compresses it with the "zstd" algorithm and stores it in the file diff.img.

Why not extend dm-snapshot; in that way you can have the existing
kernel COW format + support compression.
I understand this for read-only but if dm-snapshot supports it,
existing users  can have the
compression feature if required.

Also, I am curious what are the other real world use case here apart
from Android ?

> +3. The file diff.img is delivered to the embedded system.
> +
> +4. The embedded system is rebooted.
> +
> +5. On next boot, we load the dm-update target (assume that /dev/sda1 is the
> +   system partition and diff.img is the downloaded difference file):
> +   # losetup /dev/loop0 diff.img
> +   # dmsetup create system --table "0 `blockdev --getsize /dev/sda1` update
> +     /dev/sda1 /dev/loop0 0"
> +
> +6. The update target will present the system image as if the update already
> +   happened - if you read from blocks that are modified, it will transparently
> +   load the data from "diff.img" and decompress it.
> +
> +7. On background, the update target will copy the contents of the file
> +   "diff.img" to the partition /dev/sda1
> +
> +8. When the copying finishes (it can be detected by dmsetup status - if the
> +   first two numbers are equal, the copying finished), you can replace the
> +   update target with a linear target. After replacement, you can delete the
> +   loop device /dev/loop0 and the file diff.img. The partition /dev/sda1 is now
> +   equal to the file "new.img".
> +
> +Constructor parameters:
> +1. the system partition
> +2. the device that contains the difference between old and new image
> +3. the number of optional arguments
> +       no_bg_update
> +               don't perform background update of the system partition
> +
> +Status:
> +1. the number of blocks that were updated so far
> +2. the total number of blocks in the difference file
>

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Alasdair G Kergon Nov. 24, 2021, 12:38 a.m. UTC | #5
On Tue, Nov 23, 2021 at 04:07:21PM -0800, Akilesh Kailash wrote:
> Why not extend dm-snapshot; in that way you can have the existing
> kernel COW format + support compression.

In short, this is quite a focussed requirement and it doesn't seem worth
compromising on the goals by attempting to shoe-horn it into a framework
that was designed to fit around some quite different constraints twenty
years ago.

> I understand this for read-only but if dm-snapshot supports it,
> existing users  can have the
> compression feature if required.

I doubt that anyone wants to develop dm-snapshot any further.  It's only
really good for small short-term snapshots where performance and memory
usage don't matter.  The more-sophisticated thin provisioning approach
replaced it.

> Also, I am curious what are the other real world use case here apart
> from Android ?

There seems to be a general industry movement towards edge, attestation,
trusted boot, nested systems, confidential computing, containers,
<insert your favourite buzzwords> etc. and I think this type of
device-mapper target might play a useful role as one of the low-level
components involved in building up support for some of those scenarios.
(Just as we recently began adding support for attestation.)
 
Alasdair

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Alasdair G Kergon Nov. 24, 2021, 1:02 a.m. UTC | #6
On Wed, Nov 24, 2021 at 12:38:31AM +0000, Alasdair G Kergon wrote:
> There seems to be a general industry movement towards edge, attestation,
> trusted boot, nested systems, confidential computing, containers,
> <insert your favourite buzzwords> etc. and I think this type of
> device-mapper target might play a useful role as one of the low-level
> components involved in building up support for some of those scenarios.
> (Just as we recently began adding support for attestation.)

For example, I think we've already made good progress towards
standardising the industry around dm-verity.  (At least, I keep
on encountering it being used in various different places, rather
than people attempting to develop their own version.)  We extended
that concept with dm-integrity but I'm less sure about how widely
that has been taken up so far.

For edge systems needing unattended remote reliable updates, our
high-level challenge here is: Can we together develop a decent and
efficient solution for OTA updates that the industry will be happy to
standardise around?

Alasdair

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Akilesh Kailash Nov. 24, 2021, 7:24 a.m. UTC | #7
On Tue, Nov 23, 2021 at 5:04 PM Alasdair G Kergon <agk@redhat.com> wrote:
>
> On Wed, Nov 24, 2021 at 12:38:31AM +0000, Alasdair G Kergon wrote:
> > There seems to be a general industry movement towards edge, attestation,
> > trusted boot, nested systems, confidential computing, containers,
> > <insert your favourite buzzwords> etc. and I think this type of
> > device-mapper target might play a useful role as one of the low-level
> > components involved in building up support for some of those scenarios.
> > (Just as we recently began adding support for attestation.)
>
> For example, I think we've already made good progress towards
> standardising the industry around dm-verity.  (At least, I keep
> on encountering it being used in various different places, rather
> than people attempting to develop their own version.)  We extended
> that concept with dm-integrity but I'm less sure about how widely
> that has been taken up so far.
>
> For edge systems needing unattended remote reliable updates, our
> high-level challenge here is: Can we together develop a decent and
> efficient solution for OTA updates that the industry will be happy to
> standardise around?

With OTA, it is challenging to have one general COW format - for instance
what is good for the Android ecosystem may not be useful for the
enterprise world.
For ex: Most of the space savings in Android comes from COPY operation i.e
for an incremental OTA, we would have metadata which states:

COPY BLOCK X -> BLOCK Y

There is no compression involved with these operations. Compression is only
when a block is "replaced". All these are too specific to the Android ecosystem.

I don't want to hijack this conversation here as the current patch is more
specific; however, since you mentioned about the standardization of OTA updates,
I want to bring up the aspect of block device in user space which gives you the
flexibility for users to define COW format as required. There are already couple
of use cases which we are aware off (one being android OTA) and there are
few open source work such as [1] .

[1]  https://github.com/albertofaria/bdus

> Alasdair
>

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka Nov. 24, 2021, 4:10 p.m. UTC | #8
On Tue, 23 Nov 2021, David Anderson wrote:

> In our ecosystem, the OTA generation and on-device application process
> has evolved continually, in every release, for over 10 years now. So
> we think it's unlikely that we'll stop making improvements to it. Our
> current roadmap has other changes in the pipeline too. It's not just
> us trying to eek out diminishing returns. Other parts of the system
> change around us, and the OTA system needs to adapt.
> 
> The performance penalty is something we've been working on, and have
> improved a lot since our first iteration. We're currently
> experimenting with io_uring as well.
> 
> Best,
> 
> -David

Hi

I understand that an update format developed over 10 years will have 
better compression ratio than my format developed in 2 months.

I'd be interested in extending dm-update to handle your your update format 
and possibly add some abstraction, so that it can work with multiple 
formats.

You say that you have "COPY" and "XOR" operations.

How do you search for blocks that are similar, so that the "XOR" method is 
benefical for them? How do you make sure that you don't perform the "XOR" 
operation twice, if there's system crash when performing it?

Could it be possible for you to give us two Android images and a program 
that calculates difference between them? So that we could see how well we 
are performing compared to the existing solution.

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Akilesh Kailash Nov. 24, 2021, 11:44 p.m. UTC | #9
On Wed, Nov 24, 2021 at 8:10 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
>
> On Tue, 23 Nov 2021, David Anderson wrote:
>
> > In our ecosystem, the OTA generation and on-device application process
> > has evolved continually, in every release, for over 10 years now. So
> > we think it's unlikely that we'll stop making improvements to it. Our
> > current roadmap has other changes in the pipeline too. It's not just
> > us trying to eek out diminishing returns. Other parts of the system
> > change around us, and the OTA system needs to adapt.
> >
> > The performance penalty is something we've been working on, and have
> > improved a lot since our first iteration. We're currently
> > experimenting with io_uring as well.
> >
> > Best,
> >
> > -David
>
> Hi
>
> I understand that an update format developed over 10 years will have
> better compression ratio than my format developed in 2 months.
>
> I'd be interested in extending dm-update to handle your your update format
> and possibly add some abstraction, so that it can work with multiple
> formats.
>
> You say that you have "COPY" and "XOR" operations.
>
> How do you search for blocks that are similar, so that the "XOR" method is
> benefical for them? How do you make sure that you don't perform the "XOR"
> operation twice, if there's system crash when performing it?

CC: Kelvin and Tianjie who are familiar with the  OTA generation.

> Could it be possible for you to give us two Android images and a program
> that calculates difference between them? So that we could see how well we
> are performing compared to the existing solution.
>
> Mikulas
>

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
zhangkelvin Nov. 25, 2021, 1:19 a.m. UTC | #10
There are two different formats in Android's OTA infrastructure:

1. The OTA format. This is what we generate from two android images, and
this is what Android devices download when installing an update.
    This format is designed to be as small as possible. Therefore it is not
suitable to serve as an on disk COW format, as disk read performance would
be horrible.
2. The COW format. After the device downloads the OTA package, it will
translate OTA format into COW format. The COW format has larger disk space
footprint,
    but much better disk read performance.

Unfortunately you can't get example COW images easily. As we don't [yet]
have tools to create them on host. The only way to get a COW image is to
apply OTA on device and read data from the android device.

Below are some instructions on how to produce android OTA packages(format
#1)

Android's OTA generator is open source at
https://cs.android.com/android/platform/superproject/+/master:system/update_engine/payload_generator/generate_delta_main.cc?q=generate_delta_main

We have precompiled OTA generator binaries available at
https://ci.android.com/ . To download:
1. Look for column "aosp_cf_x86_64_phone"
2. Click on any green squares
3. Click on "artifacts" tab e.g.
https://ci.android.com/builds/submitted/7945102/aosp_cf_x86_64_phone-userdebug/latest
4. Look for otatools.zip , download it

The same https://ci.android.com/ website also has android images available
for download. To fetch an android image:

1. Look for column "aosp_cf_x86_64_phone"
2. Click on any green squares
3. Click on "artifacts" tab
4. Look for "target_files" e.g.
aosp_cf_x86_64_phone-target_files-7945102.zip . Note, this name will be
different for each build.

To generate an incremental OTA package:

1. Extract otatools.zip, add bin/ subdir to your PATH, add lib/lib64 subdir
to LD_LIBRARY_PATH
2. ota_from_target_files -v -i source_build_target_files.zip
target_build_target_files.zip ota.zip


> How do you search for blocks that are similar, so that the "XOR" method is
> benefical for them?

In short, we use BSDIFF. The core algorithm is explained in
https://www.daemonology.net/papers/bsdiff.pdf . It uses suffix array + some
heuristics to identify similar blocks.

> How do you make sure that you don't perform the "XOR"
> operation twice, if there's system crash when performing it?

That's handled by snapuserd. Akilesh can show you all the details. But
basically:

1. We compute an order which COW operation should be applied, and this
order is guaranteed to be conflict free(an operation earlier in the order
will never overwrite blocks needed by a later operation).
2. The snapuserd has to write checkpoint data to disk so it can properly
resume.

On Wed, Nov 24, 2021 at 3:44 PM Akilesh Kailash <akailash@google.com> wrote:

> On Wed, Nov 24, 2021 at 8:10 AM Mikulas Patocka <mpatocka@redhat.com>
> wrote:
> >
> >
> >
> > On Tue, 23 Nov 2021, David Anderson wrote:
> >
> > > In our ecosystem, the OTA generation and on-device application process
> > > has evolved continually, in every release, for over 10 years now. So
> > > we think it's unlikely that we'll stop making improvements to it. Our
> > > current roadmap has other changes in the pipeline too. It's not just
> > > us trying to eek out diminishing returns. Other parts of the system
> > > change around us, and the OTA system needs to adapt.
> > >
> > > The performance penalty is something we've been working on, and have
> > > improved a lot since our first iteration. We're currently
> > > experimenting with io_uring as well.
> > >
> > > Best,
> > >
> > > -David
> >
> > Hi
> >
> > I understand that an update format developed over 10 years will have
> > better compression ratio than my format developed in 2 months.
> >
> > I'd be interested in extending dm-update to handle your your update
> format
> > and possibly add some abstraction, so that it can work with multiple
> > formats.
> >
> > You say that you have "COPY" and "XOR" operations.
> >
> > How do you search for blocks that are similar, so that the "XOR" method
> is
> > benefical for them? How do you make sure that you don't perform the "XOR"
> > operation twice, if there's system crash when performing it?
>
> CC: Kelvin and Tianjie who are familiar with the  OTA generation.
>
> > Could it be possible for you to give us two Android images and a program
> > that calculates difference between them? So that we could see how well we
> > are performing compared to the existing solution.
> >
> > Mikulas
> >
>
Mikulas Patocka Nov. 29, 2021, 2:04 p.m. UTC | #11
On Tue, 23 Nov 2021, Akilesh Kailash wrote:

> With OTA, it is challenging to have one general COW format - for instance
> what is good for the Android ecosystem may not be useful for the
> enterprise world.
> For ex: Most of the space savings in Android comes from COPY operation i.e
> for an incremental OTA, we would have metadata which states:
> 
> COPY BLOCK X -> BLOCK Y
> 
> There is no compression involved with these operations. Compression is only
> when a block is "replaced". All these are too specific to the Android ecosystem.

Why does Android OTA need the COPY operation? If a file is not changed by 
the update, the file could be placed at the same location and no update of 
the file is needed. If a file is changed, it is improbable that the new 
file will contain permutation of blocks of the old file, so I don't see 
how COPY will help here.

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
zhangkelvin Nov. 29, 2021, 4:51 p.m. UTC | #12
1. OTA does not control placement of files. That’s done by file
systems(ext4)
2. OTA work at block device level. Not file level
3. Often times, a file would have few of its blocks changed/shuffled, and
rest of the blocks retain old content.

On Mon, Nov 29, 2021 at 06:05 Mikulas Patocka <mpatocka@redhat.com> wrote:

>
>
> On Tue, 23 Nov 2021, Akilesh Kailash wrote:
>
> > With OTA, it is challenging to have one general COW format - for instance
> > what is good for the Android ecosystem may not be useful for the
> > enterprise world.
> > For ex: Most of the space savings in Android comes from COPY operation
> i.e
> > for an incremental OTA, we would have metadata which states:
> >
> > COPY BLOCK X -> BLOCK Y
> >
> > There is no compression involved with these operations. Compression is
> only
> > when a block is "replaced". All these are too specific to the Android
> ecosystem.
>
> Why does Android OTA need the COPY operation? If a file is not changed by
> the update, the file could be placed at the same location and no update of
> the file is needed. If a file is changed, it is improbable that the new
> file will contain permutation of blocks of the old file, so I don't see
> how COPY will help here.
>
> Mikulas
>
> --
Sincerely,

Kelvin Zhang
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
diff mbox series

Patch

Index: linux-2.6/drivers/md/Kconfig
===================================================================
--- linux-2.6.orig/drivers/md/Kconfig
+++ linux-2.6/drivers/md/Kconfig
@@ -652,4 +652,14 @@  config DM_AUDIT
 	  Enables audit logging of several security relevant events in the
 	  particular device-mapper targets, especially the integrity target.
 
+config DM_UPDATE
+	tristate "Update target support"
+	depends on BLK_DEV_DM
+	select DM_BUFIO
+	select CRYPTO
+	help
+	  The dm-update target allows transparent updates for embedded devices
+
+	  If unsure, say N.
+
 endif # MD
Index: linux-2.6/drivers/md/Makefile
===================================================================
--- linux-2.6.orig/drivers/md/Makefile
+++ linux-2.6/drivers/md/Makefile
@@ -83,6 +83,7 @@  obj-$(CONFIG_DM_LOG_WRITES)	+= dm-log-wr
 obj-$(CONFIG_DM_INTEGRITY)	+= dm-integrity.o
 obj-$(CONFIG_DM_ZONED)		+= dm-zoned.o
 obj-$(CONFIG_DM_WRITECACHE)	+= dm-writecache.o
+obj-$(CONFIG_DM_UPDATE)		+= dm-update.o
 
 ifeq ($(CONFIG_DM_INIT),y)
 dm-mod-objs			+= dm-init.o
Index: linux-2.6/drivers/md/dm-update.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/md/dm-update.c
@@ -0,0 +1,751 @@ 
+#include <linux/device-mapper.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/dm-io.h>
+#include <linux/crypto.h>
+
+#include "dm-update.h"
+
+#define DM_MSG_PREFIX "update"
+
+#define N_BUFFERS		16
+
+#define B_EMPTY			0
+#define B_LOADING		1
+#define B_VALID			2
+
+struct dm_update_buffer {
+	int status;
+	char *compressed_chunk;
+	char *decompressed_chunk;
+	uint64_t src;
+	struct update_entry *e;
+	struct bio_list waiting_bios;
+	struct work_struct work;
+	struct dm_update *u;
+};
+
+struct dm_update {
+	struct dm_dev *system_dev;
+	struct dm_dev *update_dev;
+	struct dm_target *ti;
+	struct dm_io_client *dm_io;
+	unsigned update_lbs;
+	unsigned char update_lbs_bits;
+	struct update_superblock *sb;
+	struct crypto_comp *cc;
+	struct update_entry *entries;
+
+	struct mutex mutex;
+	struct workqueue_struct *decompress_wq;
+	struct bio_list waiting_bios;
+
+	bool no_bg_update;
+	struct workqueue_struct *bg_wq;
+	struct work_struct bg_work;
+	char *bg_compressed_chunk;
+	char *bg_decompressed_chunk;
+	size_t bg_index;
+
+	unsigned buffer_replacement;
+	struct dm_update_buffer buffer[N_BUFFERS];
+};
+
+static int update_rw(struct dm_update *u, bool system_dev, int req_op, sector_t sector, sector_t n_sectors, void *ptr)
+{
+	struct dm_io_region region;
+	struct dm_io_request req;
+
+	region.bdev = system_dev ? u->system_dev->bdev : u->update_dev->bdev;
+	region.sector = sector;
+	region.count = n_sectors;
+
+	req.bi_op = req_op;
+	req.bi_op_flags = REQ_SYNC;
+	req.mem.type = DM_IO_VMA;
+	req.mem.ptr.vma = ptr;
+	req.client = u->dm_io;
+	req.notify.fn = NULL;
+	req.notify.context = NULL;
+
+	return dm_io(&req, 1, &region, NULL);
+}
+
+static int update_decompress(struct dm_update *u, void *src, size_t src_size, void *dst, size_t dst_size)
+{
+	int r;
+	if (!u->cc) {
+		if (unlikely(src_size > dst_size))
+			return -EOVERFLOW;
+		memcpy(dst, src, src_size);
+	} else {
+		unsigned dst_int;
+		if (unlikely(src_size != (unsigned)src_size) ||
+		    unlikely(dst_size != (unsigned)dst_size))
+			return -EOVERFLOW;
+		dst_int = dst_size;
+		r = crypto_comp_decompress(u->cc, src, src_size, dst, &dst_int);
+		if (unlikely(r))
+			return r;
+	}
+	return 0;
+}
+
+static void update_fill_from_buffer(struct dm_update *u, struct dm_update_buffer *b, struct bio *bio)
+{
+	struct bio_vec bv;
+	struct bvec_iter iter;
+
+	struct update_entry *e = bio->bi_private;
+	size_t data_offset = (size_t)le32_to_cpu(e->offset) << u->sb->block_bits;
+	size_t bio_offset = (bio->bi_iter.bi_sector & ((1 << (u->sb->block_bits - SECTOR_SHIFT)) - 1)) << SECTOR_SHIFT;
+	const char *data = b->decompressed_chunk + data_offset + bio_offset;
+
+	bio_for_each_segment(bv, bio, iter) {
+		char *addr = kmap_local_page(bv.bv_page);
+		memcpy(addr + bv.bv_offset, data, bv.bv_len);
+		flush_dcache_page(bv.bv_page);
+		kunmap_local(addr);
+		data += bv.bv_len;
+	}
+
+	bio_endio(bio);
+}
+
+static void update_process_bio(struct dm_update *u, struct bio *bio)
+{
+	struct update_entry *e;
+	struct dm_update_buffer *b;
+	uint64_t src;
+	int i;
+
+	e = bio->bi_private;
+
+	src = le32_to_cpu(e->src_lo) + ((uint64_t)le16_to_cpu(e->src_hi) << 32);
+	for (i = 0; i < N_BUFFERS; i++) {
+		b = &u->buffer[i];
+		if (b->status == B_EMPTY)
+			continue;
+		if (b->src == src) {
+			if (b->status == B_LOADING) {
+				bio_list_add(&b->waiting_bios, bio);
+			} else {
+				update_fill_from_buffer(u, b, bio);
+			}
+			return;
+		}
+	}
+	for (i = 0; i < N_BUFFERS; i++) {
+		b = &u->buffer[i];
+		if (b->status == B_EMPTY) {
+replace_buffer:
+			bio_list_add(&b->waiting_bios, bio);
+			b->status = B_LOADING;
+			b->src = src;
+			b->e = e;
+			queue_work(u->decompress_wq, &b->work);
+			return;
+		}
+	}
+	for (i = 0; i < N_BUFFERS; i++) {
+		b = &u->buffer[u->buffer_replacement];
+		u->buffer_replacement = (u->buffer_replacement + 1) % N_BUFFERS;
+		if (b->status == B_VALID)
+			goto replace_buffer;
+	}
+	bio_list_add(&u->waiting_bios, bio);
+}
+
+static void dm_update_get_locations(struct dm_update *u, struct update_entry *e, uint64_t *src, sector_t *sector, sector_t *n_sectors, size_t *front_pad, size_t *compressed_length)
+{
+	uint64_t next_src;
+	*src = le32_to_cpu(e->src_lo) + ((uint64_t)le16_to_cpu(e->src_hi) << 32);
+	do {
+		e++;
+		next_src = le32_to_cpu(e->src_lo) + ((uint64_t)le16_to_cpu(e->src_hi) << 32);
+	} while (next_src == *src);
+
+	*compressed_length = next_src - *src;
+	*front_pad = *src & (u->update_lbs - 1);
+	*sector = *src >> u->update_lbs_bits << (u->update_lbs_bits - SECTOR_SHIFT);
+	*n_sectors = round_up(*front_pad + *compressed_length, u->update_lbs) >> SECTOR_SHIFT;
+}
+
+static void dm_update_buffer_work(struct work_struct *w)
+{
+	struct dm_update_buffer *b = container_of(w, struct dm_update_buffer, work);
+	struct dm_update *u = b->u;
+	uint64_t src;
+	size_t front_pad, compressed_length;
+	sector_t sector, n_sectors;
+	struct bio *bio, *waiting_bios;
+	int r;
+
+	dm_update_get_locations(u, b->e, &src, &sector, &n_sectors, &front_pad, &compressed_length);
+
+	r = update_rw(u, false, REQ_OP_READ, sector, n_sectors, b->compressed_chunk);
+	if (unlikely(r))
+		goto io_error;
+
+	r = update_decompress(u, b->compressed_chunk + front_pad, compressed_length, b->decompressed_chunk, le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits);
+	if (unlikely(r))
+		goto io_error;
+
+io_error:
+	mutex_lock(&u->mutex);
+	b->status = likely(!r) ? B_VALID : B_EMPTY;
+	while ((bio = bio_list_pop(&b->waiting_bios))) {
+		if (unlikely(r)) {
+			bio->bi_status = errno_to_blk_status(r);
+			bio_endio(bio);
+		} else {
+			update_fill_from_buffer(u, b, bio);
+		}
+	}
+
+	waiting_bios = bio_list_get(&u->waiting_bios);
+	while (waiting_bios != NULL) {
+		bio = waiting_bios;
+		waiting_bios = bio->bi_next;
+		bio->bi_next = NULL;
+		update_process_bio(u, bio);
+	}
+
+	mutex_unlock(&u->mutex);
+}
+
+static int update_map(struct dm_target *ti, struct bio *bio)
+{
+	struct dm_update *u = ti->private;
+	sector_t block;
+	size_t first, last, half;
+	struct update_entry *e;
+
+	if (bio_data_dir(bio) == WRITE)
+		return DM_MAPIO_KILL;
+
+	block = bio->bi_iter.bi_sector >> (u->sb->block_bits - SECTOR_SHIFT);
+
+	first = 0;
+	last = le64_to_cpu(u->sb->dir_n) - 1;
+	while (first < last) {
+		sector_t test_block;
+		half = first / 2 + last / 2 + (first & last & 1);
+		e = &u->entries[half];
+		test_block = le32_to_cpu(e->dest_lo) + ((uint64_t)le16_to_cpu(e->dest_hi) << 32);
+		if (test_block == block)
+			goto found;
+		if (test_block < block) {
+			first = half + 1;
+		} else {
+			last = half;
+		}
+	}
+
+	bio_set_dev(bio, u->system_dev->bdev);
+	return DM_MAPIO_REMAPPED;
+
+found:
+	bio->bi_private = e;
+
+	mutex_lock(&u->mutex);
+	update_process_bio(u, bio);
+	mutex_unlock(&u->mutex);
+
+	return DM_MAPIO_SUBMITTED;
+}
+
+static void update_status(struct dm_target *ti, status_type_t type, unsigned status_flags, char *result, unsigned maxlen)
+{
+	struct dm_update *u = ti->private;
+	unsigned extra_args;
+	unsigned sz = 0;
+
+	switch (type) {
+		case STATUSTYPE_INFO:
+			DMEMIT("%zu %llu", READ_ONCE(u->bg_index), (unsigned long long)(le64_to_cpu(u->sb->dir_n) - 1));
+			break;
+		case STATUSTYPE_TABLE:
+			DMEMIT("%s %s ", u->system_dev->name, u->update_dev->name);
+
+			extra_args = 0;
+			if (u->no_bg_update)
+				extra_args++;
+
+			DMEMIT("%u", extra_args);
+			if (u->no_bg_update)
+				DMEMIT(" no_bg_update");
+			break;
+		case STATUSTYPE_IMA:
+			DMEMIT_TARGET_NAME_VERSION(ti->type);
+			DMEMIT(",update_system_device=%s", u->system_dev->name);
+			DMEMIT(",update_update_device=%s", u->update_dev->name);
+			DMEMIT(";");
+			break;
+	}
+}
+
+static int update_iterate_devices(struct dm_target *ti, iterate_devices_callout_fn fn, void *data)
+{
+	struct dm_update *u = ti->private;
+
+	return fn(ti, u->system_dev, 0, ti->len, data);
+}
+
+static void update_background_work(struct work_struct *w)
+{
+	struct dm_update *u = container_of(w, struct dm_update, bg_work);
+	uint64_t src;
+	size_t front_pad, compressed_length;
+	sector_t sector, n_sectors;
+	int r;
+
+	if (u->bg_index >= le64_to_cpu(u->sb->dir_n) - 1)
+		return;
+
+	dm_update_get_locations(u, &u->entries[u->bg_index], &src, &sector, &n_sectors, &front_pad, &compressed_length);
+
+	r = update_rw(u, false, REQ_OP_READ, sector, n_sectors, u->bg_compressed_chunk);
+	if (unlikely(r)) {
+		DMERR("error reading update device (%d), aborting backgroup update", r);
+		return;
+	}
+
+	r = update_decompress(u, u->bg_compressed_chunk + front_pad, compressed_length, u->bg_decompressed_chunk, le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits);
+	if (unlikely(r)) {
+		DMERR("error decompressing update data (%d), aborting backgroup update", r);
+		return;
+	}
+
+	while (u->bg_index < le64_to_cpu(u->sb->dir_n) - 1) {
+		uint64_t s, dest;
+		size_t offset;
+		struct update_entry *e = &u->entries[u->bg_index];
+		s = le32_to_cpu(e->src_lo) + ((uint64_t)le16_to_cpu(e->src_hi) << 32);
+		if (s != src)
+			break;
+
+		dest = le32_to_cpu(e->dest_lo) + ((uint64_t)le16_to_cpu(e->dest_hi) << 32);
+		offset = (size_t)le32_to_cpu(e->offset) << u->sb->block_bits;
+
+		r = update_rw(u, true, REQ_OP_WRITE, dest << (u->sb->block_bits - SECTOR_SHIFT), 1UL << (u->sb->block_bits - SECTOR_SHIFT), u->bg_decompressed_chunk + offset);
+		if (unlikely(r)) {
+			DMERR("error writing to the system device (%d), aborting backgroup update", r);
+			return;
+		}
+
+		if (unlikely(u->bg_index == le64_to_cpu(u->sb->dir_n) - 2)) {
+			r = update_rw(u, true, REQ_OP_WRITE | REQ_PREFLUSH, 0, 0, NULL);
+			if (unlikely(r)) {
+				DMERR("error synchronizing the system device (%d), aborting backgroup update", r);
+				return;
+			}
+		}
+
+		WRITE_ONCE(u->bg_index, u->bg_index + 1);
+	}
+
+	queue_work(u->bg_wq, &u->bg_work);
+}
+
+static void update_presuspend(struct dm_target *ti)
+{
+	struct dm_update *u = ti->private;
+	cancel_work_sync(&u->bg_work);
+}
+
+static void update_resume(struct dm_target *ti)
+{
+	struct dm_update *u = ti->private;
+	if (u->no_bg_update)
+		return;
+	queue_work(u->bg_wq, &u->bg_work);
+}
+
+static void update_dtr(struct dm_target *ti)
+{
+	struct dm_update *u = ti->private;
+	int i;
+
+	if (u->bg_wq)
+		destroy_workqueue(u->bg_wq);
+	if (u->decompress_wq)
+		destroy_workqueue(u->decompress_wq);
+
+	vfree(u->bg_compressed_chunk);
+	vfree(u->bg_decompressed_chunk);
+
+	for (i = 0; i < N_BUFFERS; i++) {
+		vfree(u->buffer[i].compressed_chunk);
+		vfree(u->buffer[i].decompressed_chunk);
+	}
+	vfree(u->sb);
+	vfree(u->entries);
+	if (u->dm_io)
+		dm_io_client_destroy(u->dm_io);
+	if (u->system_dev)
+		dm_put_device(ti, u->system_dev);
+	if (u->update_dev)
+		dm_put_device(ti, u->update_dev);
+	if (u->cc)
+		crypto_free_comp(u->cc);
+
+	mutex_init(&u->mutex);
+
+	kfree(u);
+}
+
+static char *validate_sb(struct dm_update *u)
+{
+	struct update_superblock *sb = u->sb;
+
+	if (sb->block_bits < SECTOR_SHIFT || sb->block_bits >= 32)
+		return "Invalid superblock: block_bits";
+
+	if (!le32_to_cpu(sb->chunk_blocks))
+		return "Invalid superblock: chunk_blocks is zero";
+
+	if (le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits >> u->sb->block_bits != le32_to_cpu(u->sb->chunk_blocks))
+		return "Invalid superblock: too large chunk_blocks";
+
+	if ((int)(le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits) < 0)
+		return "Invalid superblock: too large chunk_blocks";
+
+	if (le64_to_cpu(sb->dir_n) < 1)
+		return "Invalid superblock: zero dir_n";
+
+	if ((size_t)le64_to_cpu(sb->dir_n) * sizeof(struct update_entry) / sizeof(struct update_entry) != le64_to_cpu(sb->dir_n))
+		return "Invalid superblock: overflow in dir_n";
+
+	return NULL;
+}
+
+static char *validate_metadata(struct dm_update *u)
+{
+	struct update_superblock *sb = u->sb;
+	size_t i;
+	size_t n = le64_to_cpu(sb->dir_n) - 1;
+
+	for (i = 0; i < n; i++) {
+		struct update_entry *e1 = &u->entries[i];
+		struct update_entry *e2 = &u->entries[i + 1];
+		uint64_t dest1, dest2;
+
+		if (le32_to_cpu(e1->offset) >= le32_to_cpu(sb->chunk_blocks))
+			return "Invalid metadata: offset is too high";
+
+		dest1 = le32_to_cpu(e1->dest_lo) + ((uint64_t)le16_to_cpu(e1->dest_hi) << 32);
+		dest2 = le32_to_cpu(e2->dest_lo) + ((uint64_t)le16_to_cpu(e2->dest_hi) << 32);
+
+		if (dest1 >= dest2)
+			return "Invalid metadata: destination is not monotonic";
+	}
+
+	return NULL;
+}
+
+static int update_ctr(struct dm_target *ti, unsigned argc, char **argv)
+{
+	struct dm_update *u;
+	struct dm_arg_set as;
+	const char *string;
+	unsigned opt_params;
+	char *err;
+	int r;
+	uint64_t compressed_dir_size;
+	void *compressed_dir = NULL;
+	size_t dst_len;
+	int i;
+	size_t o;
+	sector_t max_compressed_sectors;
+
+	static const struct dm_arg _args[] = {
+		{0, 1, "Invalid number of feature args"},
+	};
+
+	as.argc = argc;
+	as.argv = argv;
+
+	u = kzalloc(sizeof(struct dm_update), GFP_KERNEL);
+	if (!u) {
+		ti->error = "Cannot allocate dm_update structure";
+		return -ENOMEM;
+	}
+
+	ti->private = u;
+	u->ti = ti;
+
+	mutex_init(&u->mutex);
+	bio_list_init(&u->waiting_bios);
+	INIT_WORK(&u->bg_work, update_background_work);
+
+	string = dm_shift_arg(&as);
+	if (!string)
+		goto bad_arguments;
+
+	r = dm_get_device(ti, string, FMODE_READ | FMODE_WRITE, &u->system_dev);
+	if (r) {
+		ti->error = "System device lookup failed";
+		goto bad;
+	}
+
+	string = dm_shift_arg(&as);
+	if (!string)
+		goto bad_arguments;
+
+	r = dm_get_device(ti, string, FMODE_READ | FMODE_WRITE, &u->update_dev);
+	if (r) {
+		ti->error = "Update device lookup failed";
+		goto bad;
+	}
+
+	r = dm_read_arg_group(_args, &as, &opt_params, &ti->error);
+	if (r)
+		goto bad;
+
+	while (opt_params) {
+		string = dm_shift_arg(&as), opt_params--;
+		if (!strcasecmp(string, "no_bg_update")) {
+			u->no_bg_update = true;
+		} else {
+			r = -EINVAL;
+			ti->error = "Invalid optional argument";
+			goto bad;
+		}
+	}
+
+	u->update_lbs = bdev_logical_block_size(u->update_dev->bdev);
+	u->update_lbs_bits = __ffs(u->update_lbs);
+
+	u->decompress_wq = alloc_workqueue("dm-update", WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM, 0);
+	if (!u->decompress_wq) {
+		ti->error = "Cannot allocate workqueue";
+		r = -ENOMEM;
+		goto bad;
+	}
+
+	if (!u->no_bg_update) {
+		u->bg_wq = alloc_workqueue("dm-update-background", WQ_CPU_INTENSIVE | WQ_UNBOUND, 1);
+		if (!u->bg_wq) {
+			ti->error = "Cannot allocate workqueue";
+			r = -ENOMEM;
+			goto bad;
+		}
+	}
+
+	u->dm_io = dm_io_client_create();
+	if (IS_ERR(u->dm_io)) {
+		r = PTR_ERR(u->dm_io);
+		u->dm_io = NULL;
+		ti->error = "Unable to allocate dm-io client";
+		goto bad;
+	}
+
+	u->sb = vmalloc(u->update_lbs);
+	if (!u->sb) {
+		r = -ENOMEM;
+		ti->error = "Cannot allocate superblock";
+		goto bad;
+	}
+
+	r = update_rw(u, false, REQ_OP_READ, 0, u->update_lbs >> SECTOR_SHIFT, u->sb);
+	if (r) {
+		ti->error = "Cannot read superblock";
+		goto bad;
+	}
+
+	if (memcmp(u->sb->magic, UPDATE_MAGIC, 8)) {
+		r = -EINVAL;
+		ti->error = "Invalid magic in the superblock";
+		//printk("%02x %02x %02x %02x %02x %02x %02x %02x\n", u->sb->magic[0], u->sb->magic[1], u->sb->magic[2], u->sb->magic[3], u->sb->magic[4], u->sb->magic[5], u->sb->magic[6], u->sb->magic[7]);
+		goto bad;
+	}
+
+	if (u->sb->version != UPDATE_VERSION) {
+		r = -EINVAL;
+		ti->error = "Invalid version in the superblock";
+		goto bad;
+	}
+
+	if ((err = validate_sb(u))) {
+		r = -EINVAL;
+		ti->error = err;
+		goto bad;
+	}
+
+	r = dm_set_target_max_io_len(ti, (sector_t)1 << (u->sb->block_bits - SECTOR_SHIFT));
+	if (r) {
+		ti->error = "Invalid block size in the superblock";
+		goto bad;
+	}
+
+	if (!memchr(u->sb->compression, 0, sizeof u->sb->compression)) {
+		r = -EINVAL;
+		ti->error = "Invalid compression algorithm in the superblock";
+		goto bad;
+	}
+	if (strcmp(u->sb->compression, "none")) {
+		u->cc = crypto_alloc_comp(u->sb->compression, 0, 0);
+		if (!u->cc)
+			u->cc = ERR_PTR(-ENOMEM);
+		if (IS_ERR(u->cc)) {
+			r = PTR_ERR(u->cc);
+			u->cc = NULL;
+			ti->error = "Unsupported compression method";
+			goto bad;
+		}
+	}
+
+	compressed_dir_size = roundup((le64_to_cpu(u->sb->dir_offset) & (u->update_lbs - 1)) + le64_to_cpu(u->sb->dir_compressed_size), u->update_lbs);
+	if (compressed_dir_size != (size_t)compressed_dir_size) {
+		r = -EOVERFLOW;
+		ti->error = "Compressed directory is too large for 32-bit system";
+		goto bad;
+	}
+
+	compressed_dir = vmalloc(compressed_dir_size);
+	if (!compressed_dir) {
+		r = -ENOMEM;
+		ti->error = "Cannot allocate compressed directory";
+		goto bad;
+	}
+
+	r = update_rw(u, false, REQ_OP_READ, round_down(le64_to_cpu(u->sb->dir_offset), u->update_lbs) >> SECTOR_SHIFT, compressed_dir_size >> SECTOR_SHIFT, compressed_dir);
+	if (r) {
+		ti->error = "Cannot read compressed directory";
+		goto bad;
+	}
+
+	dst_len = le64_to_cpu(u->sb->dir_n) * sizeof(struct update_entry);
+	u->entries = vmalloc(dst_len);
+	if (!u->entries) {
+		r = -ENOMEM;
+		ti->error = "Cannot allocate decompressed directory";
+		goto bad;
+	}
+
+	r = update_decompress(u, compressed_dir + (le64_to_cpu(u->sb->dir_offset) & (u->update_lbs - 1)), le64_to_cpu(u->sb->dir_compressed_size), (void *)u->entries, dst_len);
+	if (r) {
+		ti->error = "Cannot decompress directory";
+		goto bad;
+	}
+
+	if (dst_len != le64_to_cpu(u->sb->dir_n) * sizeof(struct update_entry)) {
+		r = -EINVAL;
+		ti->error = "Non-matching length of compressed directory";
+		goto bad;
+	}
+
+	vfree(compressed_dir);
+	compressed_dir = NULL;
+
+	if ((err = validate_metadata(u))) {
+		r = -EINVAL;
+		ti->error = err;
+		goto bad;
+	}
+
+	o = 0;
+	max_compressed_sectors = 1;
+	while (o < le64_to_cpu(u->sb->dir_n) - 1) {
+		struct update_entry *e = &u->entries[o];
+		uint64_t src, s;
+		size_t front_pad, compressed_length;
+		sector_t sector, n_sectors;
+		dm_update_get_locations(u, e, &src, &sector, &n_sectors, &front_pad, &compressed_length);
+		if (n_sectors > max_compressed_sectors)
+			max_compressed_sectors = n_sectors;
+		do {
+			o++;
+			if (o >= le64_to_cpu(u->sb->dir_n) - 1)
+				break;
+			e = &u->entries[o];
+			s = le32_to_cpu(e->src_lo) + ((uint64_t)le16_to_cpu(e->src_hi) << 32);
+		} while (s == src);
+	}
+
+	for (i = 0; i < N_BUFFERS; i++) {
+		struct dm_update_buffer *b = &u->buffer[i];
+		b->decompressed_chunk = vmalloc(le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits);
+		if (!b->decompressed_chunk) {
+			r = -ENOMEM;
+			ti->error = "Cannot allocate buffers";
+			goto bad;
+		}
+		//memset(b->decompressed_chunk, 0xfe, le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits);
+		b->compressed_chunk = vmalloc(max_compressed_sectors << SECTOR_SHIFT);
+		if (!b->compressed_chunk) {
+			r = -ENOMEM;
+			ti->error = "Cannot allocate buffers";
+			goto bad;
+		}
+		//memset(b->compressed_chunk, 0xfd, max_compressed_sectors << SECTOR_SHIFT);
+		bio_list_init(&b->waiting_bios);
+		INIT_WORK(&b->work, dm_update_buffer_work);
+		b->u = u;
+	}
+
+	if (!u->no_bg_update) {
+		u->bg_decompressed_chunk = vmalloc(le32_to_cpu(u->sb->chunk_blocks) << u->sb->block_bits);
+		if (!u->bg_decompressed_chunk) {
+			r = -ENOMEM;
+			ti->error = "Cannot allocate buffers";
+			goto bad;
+		}
+		u->bg_compressed_chunk = vmalloc(max_compressed_sectors << SECTOR_SHIFT);
+		if (!u->bg_compressed_chunk) {
+			r = -ENOMEM;
+			ti->error = "Cannot allocate buffers";
+			goto bad;
+		}
+	}
+
+	return 0;
+
+bad_arguments:
+	ti->error = "Not enough arguments";
+	r = -EINVAL;
+bad:
+	if (compressed_dir)
+		vfree(compressed_dir);
+	update_dtr(ti);
+	return r;
+}
+
+static struct target_type update_target = {
+	.name			= "update",
+	.version		= {1, 0, 0},
+	.module			= THIS_MODULE,
+	.ctr			= update_ctr,
+	.dtr			= update_dtr,
+	.map			= update_map,
+	.status			= update_status,
+	.iterate_devices	= update_iterate_devices,
+	.presuspend		= update_presuspend,
+	.resume			= update_resume,
+};
+
+static int __init dm_update_init(void)
+{
+	int r;
+
+	r = dm_register_target(&update_target);
+	if (r < 0) {
+		DMERR("register failed %d", r);
+		return r;
+	}
+
+	return 0;
+}
+
+static void __exit dm_update_exit(void)
+{
+	dm_unregister_target(&update_target);
+}
+
+module_init(dm_update_init);
+module_exit(dm_update_exit);
+
+MODULE_DESCRIPTION(DM_NAME " update target");
+MODULE_AUTHOR("Mikulas Patocka <dm-devel@redhat.com>");
+MODULE_LICENSE("GPL");
Index: linux-2.6/drivers/md/dm-update.h
===================================================================
--- /dev/null
+++ linux-2.6/drivers/md/dm-update.h
@@ -0,0 +1,23 @@ 
+#define UPDATE_MAGIC	"update\0"
+#define UPDATE_VERSION	0
+
+struct update_superblock {
+	char magic[8];
+	uint8_t version;
+	uint8_t block_bits;
+	uint16_t pad1;
+	__le32 chunk_blocks;
+	char compression[16];
+	__le64 dir_offset;
+	__le64 dir_compressed_size;
+	__le64 dir_n;
+	__le64 pad2;
+};
+
+struct update_entry {
+	__le32 dest_lo;
+	__le16 dest_hi;
+	__le16 src_hi;
+	__le32 src_lo;
+	__le32 offset;
+};
Index: linux-2.6/Documentation/admin-guide/device-mapper/update.rst
===================================================================
--- /dev/null
+++ linux-2.6/Documentation/admin-guide/device-mapper/update.rst
@@ -0,0 +1,54 @@ 
+=============
+Update target
+=============
+
+Embedded devices typically have a read-only partition that stores the operating
+system image. The purpose of the dm-update target is to transparently update
+this partition while the system is running.
+
+How to use the dm-update target:
+1. We have files "old.img" and "new.img" that contain the old and new system
+   image.
+
+2. We calculate the difference between these images with the update-diff
+   utility - it could be downloaded from
+   http://people.redhat.com/~mpatocka/dm-update/
+
+   ./update-diff old.img new.img -o diff.img -a zstd
+
+   This command calculates the difference between old.img and new.img,
+   compresses it with the "zstd" algorithm and stores it in the file diff.img.
+
+3. The file diff.img is delivered to the embedded system.
+
+4. The embedded system is rebooted.
+
+5. On next boot, we load the dm-update target (assume that /dev/sda1 is the
+   system partition and diff.img is the downloaded difference file):
+   # losetup /dev/loop0 diff.img
+   # dmsetup create system --table "0 `blockdev --getsize /dev/sda1` update
+     /dev/sda1 /dev/loop0 0"
+
+6. The update target will present the system image as if the update already
+   happened - if you read from blocks that are modified, it will transparently
+   load the data from "diff.img" and decompress it.
+
+7. On background, the update target will copy the contents of the file
+   "diff.img" to the partition /dev/sda1
+
+8. When the copying finishes (it can be detected by dmsetup status - if the
+   first two numbers are equal, the copying finished), you can replace the
+   update target with a linear target. After replacement, you can delete the
+   loop device /dev/loop0 and the file diff.img. The partition /dev/sda1 is now
+   equal to the file "new.img".
+
+Constructor parameters:
+1. the system partition
+2. the device that contains the difference between old and new image
+3. the number of optional arguments
+	no_bg_update
+		don't perform background update of the system partition
+
+Status:
+1. the number of blocks that were updated so far
+2. the total number of blocks in the difference file