diff mbox

[2,of,2] DM Snapshot: dont insert before existing chunk

Message ID 200909231525.n8NFP87C029957@hydrogen.msp.redhat.com (mailing list archive)
State Rejected, archived
Delegated to: Alasdair Kergon
Headers show

Commit Message

Jonthan Brassow Sept. 23, 2009, 3:25 p.m. UTC
Patch name: dm-snapshot-dont-insert-before-existing-chunk.patch

Don't insert into hash before an existing chunk.

Most inserts are after an existing extent anyway so this code is used
very rarely.

The snapshot merge code always pulls the chunk from the end of the
range.  Without introducing unnecessary complexity, snapshot merge can't
pull the exception from the middle of a range --- i.e. if you have range
1-10 and you merge chunk 5, you'd have to split the range into two.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Comments

Alasdair G Kergon Sept. 24, 2009, 12:48 p.m. UTC | #1
On Wed, Sep 23, 2009 at 10:25:08AM -0500, Jon Brassow wrote:
> Don't insert into hash before an existing chunk.
> Most inserts are after an existing extent anyway so this code is used
> very rarely.
 
Again, I'm not particularly keen on taking this one.  Some systems are very
short of memory and everything we can save matters and we have no statistics
about how "rare" this actually is.

Is it really that difficult to split into two when removing?

Alasdair

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka Sept. 24, 2009, 2:10 p.m. UTC | #2
On Thu, 24 Sep 2009, Alasdair G Kergon wrote:

> On Wed, Sep 23, 2009 at 10:25:08AM -0500, Jon Brassow wrote:
> > Don't insert into hash before an existing chunk.
> > Most inserts are after an existing extent anyway so this code is used
> > very rarely.
>  
> Again, I'm not particularly keen on taking this one.  Some systems are very
> short of memory and everything we can save matters and we have no statistics
> about how "rare" this actually is.
> 
> Is it really that difficult to split into two when removing?

It is code bloat --- it would increase coding/reviewing/testing time (the 
splitting would have to be simulated because it happens rarely).

Also, it could lead to the situation that memory consumption starts 
growing when merging.

Mikulas

> Alasdair
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka Sept. 24, 2009, 2:15 p.m. UTC | #3
On Thu, 24 Sep 2009, Mikulas Patocka wrote:

> On Thu, 24 Sep 2009, Alasdair G Kergon wrote:
> 
> > On Wed, Sep 23, 2009 at 10:25:08AM -0500, Jon Brassow wrote:
> > > Don't insert into hash before an existing chunk.
> > > Most inserts are after an existing extent anyway so this code is used
> > > very rarely.
> >  
> > Again, I'm not particularly keen on taking this one.  Some systems are very
> > short of memory and everything we can save matters and we have no statistics
> > about how "rare" this actually is.
> > 
> > Is it really that difficult to split into two when removing?
> 
> It is code bloat --- it would increase coding/reviewing/testing time (the 
> splitting would have to be simulated because it happens rarely).
> 
> Also, it could lead to the situation that memory consumption starts 
> growing when merging.
> 
> Mikulas

These backmerges happen if someone writes to the logical volume backwards. 
Do you know any common usage scenario that does it?

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mike Snitzer Sept. 24, 2009, 2:21 p.m. UTC | #4
On Thu, Sep 24 2009 at 10:15am -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> On Thu, 24 Sep 2009, Mikulas Patocka wrote:
> 
> > On Thu, 24 Sep 2009, Alasdair G Kergon wrote:
> > 
> > > On Wed, Sep 23, 2009 at 10:25:08AM -0500, Jon Brassow wrote:
> > > > Don't insert into hash before an existing chunk.
> > > > Most inserts are after an existing extent anyway so this code is used
> > > > very rarely.
> > >  
> > > Again, I'm not particularly keen on taking this one.  Some systems are very
> > > short of memory and everything we can save matters and we have no statistics
> > > about how "rare" this actually is.
> > > 
> > > Is it really that difficult to split into two when removing?
> > 
> > It is code bloat --- it would increase coding/reviewing/testing time (the 
> > splitting would have to be simulated because it happens rarely).
> > 
> > Also, it could lead to the situation that memory consumption starts 
> > growing when merging.
> > 
> > Mikulas
> 
> These backmerges happen if someone writes to the logical volume backwards. 
> Do you know any common usage scenario that does it?

I was able to reliably reproduce backmerges simply by taking a snapshot
of a 32G LV and then formatting that origin with ext3.  This resulted in
a ~9% copy-out to the snapshot.

AFAIK mkfs.ext3 does write backwards for updating block groups.  In
practice it clearly does because without disabling the insertion before
other chunks I would reliably BUG in clear_exception (as I shared
earlier today on dm-devel).

Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka Sept. 25, 2009, 3:40 a.m. UTC | #5
On Thu, 24 Sep 2009, Mikulas Patocka wrote:

> On Thu, 24 Sep 2009, Mikulas Patocka wrote:
> 
> > On Thu, 24 Sep 2009, Alasdair G Kergon wrote:
> > 
> > > On Wed, Sep 23, 2009 at 10:25:08AM -0500, Jon Brassow wrote:
> > > > Don't insert into hash before an existing chunk.
> > > > Most inserts are after an existing extent anyway so this code is used
> > > > very rarely.
> > >  
> > > Again, I'm not particularly keen on taking this one.  Some systems are very
> > > short of memory and everything we can save matters and we have no statistics
> > > about how "rare" this actually is.
> > > 
> > > Is it really that difficult to split into two when removing?
> > 
> > It is code bloat --- it would increase coding/reviewing/testing time (the 
> > splitting would have to be simulated because it happens rarely).
> > 
> > Also, it could lead to the situation that memory consumption starts 
> > growing when merging.
> > 
> > Mikulas
> 
> These backmerges happen if someone writes to the logical volume backwards. 
> Do you know any common usage scenario that does it?
> 
> Mikulas

I was wrong about this.

If someone writes backwards, (from chunk 100 to 97), it will be allocated 
100->2, 99->3, 98->4, 97->5 and no back-merge will happen.

The backmerge will happen if the disk driver finishes the writes 
backwards: if someone writes to chunks 100-103, they get reallocated to 
2-5 and copying starts. If the disk driver finishes the copies backwards 
(it first finishes 103->5, then 102->4, then 101->3 and finally 100->2), 
the exceptions will be added in this order to the table and they'll be 
back-merged.

I think that disk shuffling writes is unlikely.

There was already a fix to make kcopyd not reorder writes 
(b673c3a8192e28f13e2050a4b82c1986be92cc15, 2.6.28).

On the other hand, merging could be implemented even without splitting 
(because the merge comes either from the beginning or from the end of the 
range), so we can go with backmerges.

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mike Snitzer Sept. 25, 2009, 3:43 a.m. UTC | #6
On Thu, Sep 24 2009 at  8:48am -0400,
Alasdair G Kergon <agk@redhat.com> wrote:

> On Wed, Sep 23, 2009 at 10:25:08AM -0500, Jon Brassow wrote:
> > Don't insert into hash before an existing chunk.
> > Most inserts are after an existing extent anyway so this code is used
> > very rarely.
>  
> Again, I'm not particularly keen on taking this one.  Some systems are very
> short of memory and everything we can save matters and we have no statistics
> about how "rare" this actually is.
> 
> Is it really that difficult to split into two when removing?

Jon, Mikulas and I examined/discussed this in detail today.  We had a
nice break-through: snapshot-merge no longer needs to avoid inserting
before an existing chunk.  As such snapshot-merge no longer imposes less
efficient packing of the exception cache.

We needed a small change to the snapshot-merge clear_exception() code so
that it increments e->old_chunk and e->new_chunk when it makes its call
to dm_consecutive_chunk_count_dec(), e.g.:

        if (dm_consecutive_chunk_count(e)) {
                if ((de->old_chunk == e->old_chunk) &&
                    (de->new_chunk == dm_chunk_number(e->new_chunk))) {
                        e->old_chunk++;
                        e->new_chunk++;
                }
                dm_consecutive_chunk_count_dec(e);
        }

This is the inverse of what is done in dm_insert_exception():

        /* Insert before an existing chunk? */
        if (new_e->old_chunk == (e->old_chunk - 1) &&
            new_e->new_chunk == (dm_chunk_number(e->new_chunk) - 1)) {
                dm_consecutive_chunk_count_inc(e);
                e->old_chunk--;
                e->new_chunk--;
                dm_free_exception(eh, new_e);
                return;
        }

Updated DM snapshot-merge patches are available here:
http://people.redhat.com/msnitzer/patches/snapshot-merge/kernel_unified/2.6.31/

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
diff mbox

Patch

Index: linux-2.6/drivers/md/dm-snap.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-snap.c
+++ linux-2.6/drivers/md/dm-snap.c
@@ -478,16 +478,6 @@  static void insert_completed_exception(s
 			return;
 		}
 
-		/* Insert before an existing chunk? */
-		if (new_e->old_chunk == (e->old_chunk - 1) &&
-		    new_e->new_chunk == (dm_chunk_number(e->new_chunk) - 1)) {
-			dm_consecutive_chunk_count_inc(e);
-			e->old_chunk--;
-			e->new_chunk--;
-			free_exception(new_e);
-			return;
-		}
-
 		if (new_e->old_chunk > e->old_chunk)
 			break;
 	}