mbox series

[0/3] krbd discard optimizations

Message ID 20190129154729.1031-1-idryomov@gmail.com (mailing list archive)
Headers show
Series krbd discard optimizations | expand

Message

Ilya Dryomov Jan. 29, 2019, 3:47 p.m. UTC
Hello,

This should help the "-o discard" case and improve performance across
the board, although we don't have a solution for querying the object
store backend and passing the appropriate value for alloc_size yet.

Ilya Dryomov (3):
  rbd: get rid of obj_req->obj_request_count
  rbd: handle DISCARD and WRITE_ZEROES separately
  rbd: round off and ignore discards that are too small

 drivers/block/rbd.c | 111 +++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 92 insertions(+), 19 deletions(-)

Comments

Jason Dillaman Jan. 30, 2019, 1:52 p.m. UTC | #1
Should we try to ensure that librbd and krbd have (semi) matching
discard behavior? In librbd, a full object discard on an overlapping
child will truncate the child object to size zero instead of issuing a
delete. With this krbd change, if a child object had previously been
written and now was discarded, the parent data will become visible
again since the child object is deleted. This could be prevented by
issuing a "assert_exists" + "truncate(0)" op to prevent the implicit
zeroing of a full object extent if it was never written and would
prevent the parent data from becoming visible again after a discard.
Obviously the data is still consistent either way but it would be good
to match.

Otherwise, lgtm.

Reviewed-by: Jason Dillaman <dillaman@redhat.com>

On Tue, Jan 29, 2019 at 10:47 AM Ilya Dryomov <idryomov@gmail.com> wrote:
>
> Hello,
>
> This should help the "-o discard" case and improve performance across
> the board, although we don't have a solution for querying the object
> store backend and passing the appropriate value for alloc_size yet.
>
> Ilya Dryomov (3):
>   rbd: get rid of obj_req->obj_request_count
>   rbd: handle DISCARD and WRITE_ZEROES separately
>   rbd: round off and ignore discards that are too small
>
>  drivers/block/rbd.c | 111 +++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 92 insertions(+), 19 deletions(-)
>
> --
> 2.14.4
>
Ilya Dryomov Jan. 30, 2019, 2:41 p.m. UTC | #2
On Wed, Jan 30, 2019 at 2:53 PM Jason Dillaman <jdillama@redhat.com> wrote:
>
> Should we try to ensure that librbd and krbd have (semi) matching
> discard behavior? In librbd, a full object discard on an overlapping
> child will truncate the child object to size zero instead of issuing a
> delete. With this krbd change, if a child object had previously been
> written and now was discarded, the parent data will become visible
> again since the child object is deleted. This could be prevented by

Right, this is what librbd was doing until about a year ago when
I fixed the parent data exposure with a compound create+truncate.

> issuing a "assert_exists" + "truncate(0)" op to prevent the implicit
> zeroing of a full object extent if it was never written and would
> prevent the parent data from becoming visible again after a discard.

truncate on a non-existent object is a no-op, so I'm not sure how
assert_exists+truncate(0) is different from plain truncate(0).

> Obviously the data is still consistent either way but it would be good
> to match.

I thought about this, but decided to drop reverse mapping and overlap
calculation code from rbd_obj_setup_discard() simply because it would
have been semi-matching at best.  It also seems clearer to me: in the
zeroout case we worry about copyups and parent blocks, in the discard
case we don't.  Special casing whole-object discards felt like mixing
them up without a good reason.

Thanks,

                Ilya
Jason Dillaman Jan. 30, 2019, 7:15 p.m. UTC | #3
On Wed, Jan 30, 2019 at 9:41 AM Ilya Dryomov <idryomov@gmail.com> wrote:
>
> On Wed, Jan 30, 2019 at 2:53 PM Jason Dillaman <jdillama@redhat.com> wrote:
> >
> > Should we try to ensure that librbd and krbd have (semi) matching
> > discard behavior? In librbd, a full object discard on an overlapping
> > child will truncate the child object to size zero instead of issuing a
> > delete. With this krbd change, if a child object had previously been
> > written and now was discarded, the parent data will become visible
> > again since the child object is deleted. This could be prevented by
>
> Right, this is what librbd was doing until about a year ago when
> I fixed the parent data exposure with a compound create+truncate.
>
> > issuing a "assert_exists" + "truncate(0)" op to prevent the implicit
> > zeroing of a full object extent if it was never written and would
> > prevent the parent data from becoming visible again after a discard.
>
> truncate on a non-existent object is a no-op, so I'm not sure how
> assert_exists+truncate(0) is different from plain truncate(0).

Thanks, I forgot about that. My intent was to ensure an existing
object would be truncated to zero but not create an object if it
doesn't exist.

> > Obviously the data is still consistent either way but it would be good
> > to match.
>
> I thought about this, but decided to drop reverse mapping and overlap
> calculation code from rbd_obj_setup_discard() simply because it would
> have been semi-matching at best.  It also seems clearer to me: in the
> zeroout case we worry about copyups and parent blocks, in the discard
> case we don't.  Special casing whole-object discards felt like mixing
> them up without a good reason.

I was only minimally concerned about the case where the copy-up had
already occurred so your delete-on-discard would now re-expose the
parent object extent (e.g. parent has all 'A's, child had all 'B's, so
the discard switches the child to all 'A's instead of zeroes).
Effectively, a full object discard will cause your object extent to go
back-in-time.

> Thanks,
>
>                 Ilya


--
Jason
Ilya Dryomov Jan. 30, 2019, 8:42 p.m. UTC | #4
On Wed, Jan 30, 2019 at 8:16 PM Jason Dillaman <jdillama@redhat.com> wrote:
>
> On Wed, Jan 30, 2019 at 9:41 AM Ilya Dryomov <idryomov@gmail.com> wrote:
> >
> > On Wed, Jan 30, 2019 at 2:53 PM Jason Dillaman <jdillama@redhat.com> wrote:
> > >
> > > Should we try to ensure that librbd and krbd have (semi) matching
> > > discard behavior? In librbd, a full object discard on an overlapping
> > > child will truncate the child object to size zero instead of issuing a
> > > delete. With this krbd change, if a child object had previously been
> > > written and now was discarded, the parent data will become visible
> > > again since the child object is deleted. This could be prevented by
> >
> > Right, this is what librbd was doing until about a year ago when
> > I fixed the parent data exposure with a compound create+truncate.
> >
> > > issuing a "assert_exists" + "truncate(0)" op to prevent the implicit
> > > zeroing of a full object extent if it was never written and would
> > > prevent the parent data from becoming visible again after a discard.
> >
> > truncate on a non-existent object is a no-op, so I'm not sure how
> > assert_exists+truncate(0) is different from plain truncate(0).
>
> Thanks, I forgot about that. My intent was to ensure an existing
> object would be truncated to zero but not create an object if it
> doesn't exist.
>
> > > Obviously the data is still consistent either way but it would be good
> > > to match.
> >
> > I thought about this, but decided to drop reverse mapping and overlap
> > calculation code from rbd_obj_setup_discard() simply because it would
> > have been semi-matching at best.  It also seems clearer to me: in the
> > zeroout case we worry about copyups and parent blocks, in the discard
> > case we don't.  Special casing whole-object discards felt like mixing
> > them up without a good reason.
>
> I was only minimally concerned about the case where the copy-up had
> already occurred so your delete-on-discard would now re-expose the
> parent object extent (e.g. parent has all 'A's, child had all 'B's, so
> the discard switches the child to all 'A's instead of zeroes).
> Effectively, a full object discard will cause your object extent to go
> back-in-time.

Well, returning stale (or random) data from the discarded region
technically doesn't violate any specs.  On a second thought, I think
you are right though -- I only dropped create+truncate because I wanted
to avoid having to calculate the reverse mapping for mostly hand wavy
reasons that definitely don't stand against the sneakiness of going
back in time.  I'll put the "delete or create+truncate" logic back
tomorrow.

Thanks,

                Ilya
Ilya Dryomov Jan. 30, 2019, 9:58 p.m. UTC | #5
On Wed, Jan 30, 2019 at 9:42 PM Ilya Dryomov <idryomov@gmail.com> wrote:
>
> On Wed, Jan 30, 2019 at 8:16 PM Jason Dillaman <jdillama@redhat.com> wrote:
> >
> > On Wed, Jan 30, 2019 at 9:41 AM Ilya Dryomov <idryomov@gmail.com> wrote:
> > >
> > > On Wed, Jan 30, 2019 at 2:53 PM Jason Dillaman <jdillama@redhat.com> wrote:
> > > >
> > > > Should we try to ensure that librbd and krbd have (semi) matching
> > > > discard behavior? In librbd, a full object discard on an overlapping
> > > > child will truncate the child object to size zero instead of issuing a
> > > > delete. With this krbd change, if a child object had previously been
> > > > written and now was discarded, the parent data will become visible
> > > > again since the child object is deleted. This could be prevented by
> > >
> > > Right, this is what librbd was doing until about a year ago when
> > > I fixed the parent data exposure with a compound create+truncate.
> > >
> > > > issuing a "assert_exists" + "truncate(0)" op to prevent the implicit
> > > > zeroing of a full object extent if it was never written and would
> > > > prevent the parent data from becoming visible again after a discard.
> > >
> > > truncate on a non-existent object is a no-op, so I'm not sure how
> > > assert_exists+truncate(0) is different from plain truncate(0).
> >
> > Thanks, I forgot about that. My intent was to ensure an existing
> > object would be truncated to zero but not create an object if it
> > doesn't exist.
> >
> > > > Obviously the data is still consistent either way but it would be good
> > > > to match.
> > >
> > > I thought about this, but decided to drop reverse mapping and overlap
> > > calculation code from rbd_obj_setup_discard() simply because it would
> > > have been semi-matching at best.  It also seems clearer to me: in the
> > > zeroout case we worry about copyups and parent blocks, in the discard
> > > case we don't.  Special casing whole-object discards felt like mixing
> > > them up without a good reason.
> >
> > I was only minimally concerned about the case where the copy-up had
> > already occurred so your delete-on-discard would now re-expose the
> > parent object extent (e.g. parent has all 'A's, child had all 'B's, so
> > the discard switches the child to all 'A's instead of zeroes).
> > Effectively, a full object discard will cause your object extent to go
> > back-in-time.
>
> Well, returning stale (or random) data from the discarded region
> technically doesn't violate any specs.  On a second thought, I think
> you are right though -- I only dropped create+truncate because I wanted
> to avoid having to calculate the reverse mapping for mostly hand wavy
> reasons that definitely don't stand against the sneakiness of going
> back in time.  I'll put the "delete or create+truncate" logic back
> tomorrow.

... and of course for ensuring that an existing object is truncated but
not creating an object if it doesn't exist a simple truncate is enough.
A compound create+truncate is only needed for zeroout.

Sorry for the confusion, here is the patch on top:

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 1f5517355af3..9efa7cd54e0c 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1871,6 +1871,7 @@ static int rbd_obj_setup_discard(struct
rbd_obj_request *obj_req)
        struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev;
        u64 off = obj_req->ex.oe_off;
        u64 next_off = obj_req->ex.oe_off + obj_req->ex.oe_len;
+       int ret;

        /*
         * Align the range to alloc_size boundary and punt on discards
@@ -1888,11 +1889,16 @@ static int rbd_obj_setup_discard(struct
rbd_obj_request *obj_req)
                        return 1;
        }

+       /* reverse map the entire object onto the parent */
+       ret = rbd_obj_calc_img_extents(obj_req, true);
+       if (ret)
+               return ret;
+
        obj_req->osd_req = rbd_osd_req_create(obj_req, 1);
        if (!obj_req->osd_req)
                return -ENOMEM;

-       if (rbd_obj_is_entire(obj_req)) {
+       if (rbd_obj_is_entire(obj_req) && !obj_req->num_img_extents) {
                osd_req_op_init(obj_req->osd_req, 0, CEPH_OSD_OP_DELETE, 0);
        } else {
                dout("%s %p %llu~%llu -> %llu~%llu\n", __func__,

Thanks,

                Ilya