diff mbox

[4/8] rbd: move bumping img_request refcount into rbd_obj_request_submit()

Message ID 1474304608-17958-5-git-send-email-idryomov@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ilya Dryomov Sept. 19, 2016, 5:03 p.m. UTC
Commit 0f2d5be792b0 ("rbd: use reference counts for image requests")
added rbd_img_request_get(), which rbd_img_request_fill() calls for
each obj_request added to img_request.  It was an urgent band-aid for
the uglyness that is rbd_img_obj_callback() and none of the error paths
were updated.

Given that this img_request reference is meant to represent an
obj_request that hasn't passed through rbd_img_obj_callback() yet,
proper cleanup in appropriate destructors is a challenge.  However,
noting that if we don't get a chance to call rbd_obj_request_complete(),
there is not going to be a call to rbd_img_obj_callback(), we can move
rbd_img_request_get() into rbd_obj_request_submit() and fixup the two
places that call rbd_obj_request_complete() directly and not through
rbd_obj_request_submit() to temporarily bump img_request, so that
rbd_img_obj_callback() can put as usual.

This takes care of img_request leaks on errors on the submit side.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 drivers/block/rbd.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

Comments

Alex Elder Sept. 25, 2016, 3:56 p.m. UTC | #1
On 09/19/2016 12:03 PM, Ilya Dryomov wrote:
> Commit 0f2d5be792b0 ("rbd: use reference counts for image requests")
> added rbd_img_request_get(), which rbd_img_request_fill() calls for
> each obj_request added to img_request.  It was an urgent band-aid for
> the uglyness that is rbd_img_obj_callback() and none of the error paths
> were updated.

I'm not sure how to improve that function.  Object requests in
an image request need to be completed in sequential order,
because blk_update_request() requires that.  (Or maybe you
were referring to something else you find ugly.)

> Given that this img_request reference is meant to represent an
> obj_request that hasn't passed through rbd_img_obj_callback() yet,

It was, for the purpose of that commit.  But I don't like to think
of reference counting that way.  I usually try to reason about
reference counts as actual counts of pointers referring to objects.
And in that sense, I think it might have been better in the first
place to drop the an object request's reference to the image request
in rbd_img_obj_request_del(), where the img_request pointer gets
nulled out.  And move the call to rbd_img_request_get() for an
object request into rbd_img_obj_request_add(), where the pointer
gets assigned.  I don't know why I didn't do it that way before;
like you said it was sort of an urgently developed change.

I haven't investigated this alternative exhaustively, but if
it works I think it would be a cleaner fix.

> proper cleanup in appropriate destructors is a challenge.  However,
> noting that if we don't get a chance to call rbd_obj_request_complete(),
> there is not going to be a call to rbd_img_obj_callback(), we can move
> rbd_img_request_get() into rbd_obj_request_submit() and fixup the two
> places that call rbd_obj_request_complete() directly and not through
> rbd_obj_request_submit() to temporarily bump img_request, so that
> rbd_img_obj_callback() can put as usual.
> 
> This takes care of img_request leaks on errors on the submit side.

So *this* is the problem you're aiming to solve here...  It would
have been nice for you to call attention to where these leaks were
occurring.  (It seems rbd_img_obj_parent_read_full_callback()
and rbd_img_obj_exists_callback().)

It looks to me like your change is sound, but again, I think it
would be better to make the reference counting match the actual
recording of references to objects as I mentioned before.  I'm
going to ask for your opinion on that before I offer you my
"reviewed by" on this.

					-Alex

And now, some more gratuitous analysis...

We currently (without this patch) get a reference to an image request
for every object request populated in rbd_img_request_fill(), right
after calling rbd_img_obj_request_fill().  We want each of these to
have a matching reference drop, once that object request is completed,
and the object request no longer has any use for what's in the image
request.  Right now that occurs in rbd_img_obj_callback().  But that
function is only called if an image object request gets successfully
submitted.  And in particular, if an error occurs while processing
a layered write--either while doing an existence check for an object
or when doing a read of the parent data--the original image object
requests never get submitted, and therefore their references to the
image don't get dropped properly.  This issue is the subject of your
patch.  All other image object requests appear to get submitted,
and therefore properly release their reference.

There are two other image request references.  A reference taken
before submitting all the object requests in rbd_img_request_submit();
that reference is dropped in that same function after all object
requests are submitted.

The last reference is the reference that is set when the image
request is created.  That reference should be dropped when we are
done with the image request.  If an image request has no callback
function, that reference is dropped in rbd_img_request_complete().
If there is a callback, that callback function must eventually
lead to the image request reference being dropped.  Only two
image request callback functions are ever assigned:
  rbd_img_parent_read_callback()
  rbd_img_parent_read_full_callback()
Both of those functions grab information needed from the image,
then drop the "original" image request reference.

I might have missed something but I think with your fix, or with
the alternative I propose, we'll have all image request references
accounted for.

> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
>  drivers/block/rbd.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index b247200a0f28..d8070bd29fd1 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -1610,11 +1610,17 @@ static bool obj_request_type_valid(enum obj_request_type type)
>  	}
>  }
>  
> +static void rbd_img_obj_callback(struct rbd_obj_request *obj_request);
> +
>  static void rbd_obj_request_submit(struct rbd_obj_request *obj_request)
>  {
>  	struct ceph_osd_request *osd_req = obj_request->osd_req;
>  
>  	dout("%s %p osd_req %p\n", __func__, obj_request, osd_req);
> +	if (obj_request_img_data_test(obj_request)) {
> +		WARN_ON(obj_request->callback != rbd_img_obj_callback);
> +		rbd_img_request_get(obj_request->img_request);
> +	}
>  	ceph_osdc_start_request(osd_req->r_osdc, osd_req, false);
>  }
>  
> @@ -2580,8 +2586,6 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,
>  
>  		rbd_img_obj_request_fill(obj_request, osd_req, op_type, 0);
>  
> -		rbd_img_request_get(img_request);
> -
>  		img_offset += length;
>  		resid -= length;
>  	}
> @@ -2715,10 +2719,9 @@ rbd_img_obj_parent_read_full_callback(struct rbd_img_request *img_request)
>  	return;
>  
>  out_err:
> -	/* Record the error code and complete the request */
> -
>  	orig_request->result = img_result;
>  	orig_request->xferred = 0;
> +	rbd_img_request_get(orig_request->img_request);
>  	obj_request_done_set(orig_request);
>  	rbd_obj_request_complete(orig_request);
>  }
> @@ -2873,6 +2876,7 @@ static void rbd_img_obj_exists_callback(struct rbd_obj_request *obj_request)
>  fail_orig_request:
>  	orig_request->result = result;
>  	orig_request->xferred = 0;
> +	rbd_img_request_get(orig_request->img_request);
>  	obj_request_done_set(orig_request);
>  	rbd_obj_request_complete(orig_request);
>  }
> 

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ilya Dryomov Sept. 26, 2016, 2:38 p.m. UTC | #2
On Sun, Sep 25, 2016 at 5:56 PM, Alex Elder <elder@linaro.org> wrote:
> On 09/19/2016 12:03 PM, Ilya Dryomov wrote:
>> Commit 0f2d5be792b0 ("rbd: use reference counts for image requests")
>> added rbd_img_request_get(), which rbd_img_request_fill() calls for
>> each obj_request added to img_request.  It was an urgent band-aid for
>> the uglyness that is rbd_img_obj_callback() and none of the error paths
>> were updated.
>
> I'm not sure how to improve that function.  Object requests in
> an image request need to be completed in sequential order,
> because blk_update_request() requires that.  (Or maybe you
> were referring to something else you find ugly.)

Quoting 0f2d5be792b0 commit message:

"There is a race here, however.  The instant an object request is
marked "done" it can be provided (by a thread handling completion of
one of its predecessor operations) to rbd_img_obj_end_request(),
which (for the last request) can then lead to the image request
getting torn down.  And this can happen *before* that object has
itself entered rbd_img_obj_end_request().  As a result, once it
*does* enter that function, the image request (and even the object
request itself) may have been freed and become invalid."

"... we don't want rbd_img_request_complete() to necessarily
result in the image request being destroyed, because it may
get called before we've finished processing on all of its
object requests."

This "I'm called for obj_request1, but I'll also complete obj_request2
along with the entire img_request if I get a chance, before I'm called
for obj_request2" behaviour is *really* *really* sneaky.

I think a simple atomic, decremented for each obj_request completion
would do.  When it hits 0, we say "OK, this img_request is finished"
and call rbd_img_obj_end_request() for each obj_request in order.  I'm
not sure we are getting anything from calling blk_update_request() in
this fashion, at the expense of hard to comprehend buggy code.

>
>> Given that this img_request reference is meant to represent an
>> obj_request that hasn't passed through rbd_img_obj_callback() yet,
>
> It was, for the purpose of that commit.  But I don't like to think
> of reference counting that way.  I usually try to reason about
> reference counts as actual counts of pointers referring to objects.
> And in that sense, I think it might have been better in the first
> place to drop the an object request's reference to the image request
> in rbd_img_obj_request_del(), where the img_request pointer gets
> nulled out.  And move the call to rbd_img_request_get() for an
> object request into rbd_img_obj_request_add(), where the pointer
> gets assigned.  I don't know why I didn't do it that way before;
> like you said it was sort of an urgently developed change.
>
> I haven't investigated this alternative exhaustively, but if
> it works I think it would be a cleaner fix.

That won't work, exactly because this reference is not a normal
reference, but a band-aid for what I quoted.  rbd_img_obj_request_del()
is called from rbd_img_request_put() when refcount hits 0, which it
never will because of at least one object request holding it.  For this
to work, object requests have to be deleted explicitly.

>
>> proper cleanup in appropriate destructors is a challenge.  However,
>> noting that if we don't get a chance to call rbd_obj_request_complete(),
>> there is not going to be a call to rbd_img_obj_callback(), we can move
>> rbd_img_request_get() into rbd_obj_request_submit() and fixup the two
>> places that call rbd_obj_request_complete() directly and not through
>> rbd_obj_request_submit() to temporarily bump img_request, so that
>> rbd_img_obj_callback() can put as usual.
>>
>> This takes care of img_request leaks on errors on the submit side.
>
> So *this* is the problem you're aiming to solve here...  It would
> have been nice for you to call attention to where these leaks were
> occurring.  (It seems rbd_img_obj_parent_read_full_callback()
> and rbd_img_obj_exists_callback().)

By "on the submit side", I meant everywhere where we error out before
rbd_obj_request_submit() on the original request.

>
> It looks to me like your change is sound, but again, I think it
> would be better to make the reference counting match the actual
> recording of references to objects as I mentioned before.  I'm
> going to ask for your opinion on that before I offer you my
> "reviewed by" on this.

The aim of this patch was to fix up error paths while preserving the
semantics of this kref.  Any other improvements can only come with the
full re-write of the completion code - it's too fragile.

>
>                                         -Alex
>
> And now, some more gratuitous analysis...
>
> We currently (without this patch) get a reference to an image request
> for every object request populated in rbd_img_request_fill(), right
> after calling rbd_img_obj_request_fill().  We want each of these to
> have a matching reference drop, once that object request is completed,
> and the object request no longer has any use for what's in the image
> request.  Right now that occurs in rbd_img_obj_callback().  But that
> function is only called if an image object request gets successfully
> submitted.  And in particular, if an error occurs while processing
> a layered write--either while doing an existence check for an object
> or when doing a read of the parent data--the original image object
> requests never get submitted, and therefore their references to the
> image don't get dropped properly.  This issue is the subject of your
> patch.  All other image object requests appear to get submitted,
> and therefore properly release their reference.

Correct.  There is also the issue of two-obj_request img_requests,
where the first obj_request gets submitted while the second errors out.
This patch captures it, but there may well be other issues there -
I haven't gone down that rabbit hole yet...

Thanks,

                Ilya
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Elder Sept. 28, 2016, 12:22 a.m. UTC | #3
On 09/26/2016 09:38 AM, Ilya Dryomov wrote:
> On Sun, Sep 25, 2016 at 5:56 PM, Alex Elder <elder@linaro.org> wrote:
>> On 09/19/2016 12:03 PM, Ilya Dryomov wrote:
>>> Commit 0f2d5be792b0 ("rbd: use reference counts for image requests")
>>> added rbd_img_request_get(), which rbd_img_request_fill() calls for
>>> each obj_request added to img_request.  It was an urgent band-aid for
>>> the uglyness that is rbd_img_obj_callback() and none of the error paths
>>> were updated.
>>
>> I'm not sure how to improve that function.  Object requests in
>> an image request need to be completed in sequential order,
>> because blk_update_request() requires that.  (Or maybe you
>> were referring to something else you find ugly.)
> 
> Quoting 0f2d5be792b0 commit message:
> 
> "There is a race here, however.  The instant an object request is
> marked "done" it can be provided (by a thread handling completion of
> one of its predecessor operations) to rbd_img_obj_end_request(),
> which (for the last request) can then lead to the image request
> getting torn down.  And this can happen *before* that object has
> itself entered rbd_img_obj_end_request().  As a result, once it
> *does* enter that function, the image request (and even the object
> request itself) may have been freed and become invalid."
> 
> "... we don't want rbd_img_request_complete() to necessarily
> result in the image request being destroyed, because it may
> get called before we've finished processing on all of its
> object requests."
> 
> This "I'm called for obj_request1, but I'll also complete obj_request2
> along with the entire img_request if I get a chance, before I'm called
> for obj_request2" behaviour is *really* *really* sneaky.

OK, I understand that.  (And I think you meant "...before I'm
called for obj_request1".)

> I think a simple atomic, decremented for each obj_request completion
> would do.  When it hits 0, we say "OK, this img_request is finished"
> and call rbd_img_obj_end_request() for each obj_request in order.  I'm
> not sure we are getting anything from calling blk_update_request() in
> this fashion, at the expense of hard to comprehend buggy code.

OK, now you're talking about something different, which is to
change how we handle image object request completions.
Practically speaking, I think you're right that there would
be very little lost by handling an image request's object
request completions all at once rather than as quickly as
possible.

I think the get of the img_request in the error paths looks
a little funny.  It'll now be encapsulated in another function,
so maybe you can add a small comment there to say why you need
to do that--to match what the submit path does (or to match
the reference dropped in the completion path).

I have no argument with what you're doing, and I unfortunately
feel less confident (maybe 10% less) in my walk-through of
all the affected code than I normally am on your patches...
But I'll call that good enough.  I only have small blocks of
time right now and I don't want to delay things.

Reviewed-by: Alex Elder <elder@linaro.org>

>>> Given that this img_request reference is meant to represent an
>>> obj_request that hasn't passed through rbd_img_obj_callback() yet,
>>
>> It was, for the purpose of that commit.  But I don't like to think
>> of reference counting that way.  I usually try to reason about
>> reference counts as actual counts of pointers referring to objects.
>> And in that sense, I think it might have been better in the first
>> place to drop the an object request's reference to the image request
>> in rbd_img_obj_request_del(), where the img_request pointer gets
>> nulled out.  And move the call to rbd_img_request_get() for an
>> object request into rbd_img_obj_request_add(), where the pointer
>> gets assigned.  I don't know why I didn't do it that way before;
>> like you said it was sort of an urgently developed change.
>>
>> I haven't investigated this alternative exhaustively, but if
>> it works I think it would be a cleaner fix.
> 
> That won't work, exactly because this reference is not a normal
> reference, but a band-aid for what I quoted.  rbd_img_obj_request_del()
> is called from rbd_img_request_put() when refcount hits 0, which it
> never will because of at least one object request holding it.  For this
> to work, object requests have to be deleted explicitly.
> 
>>
>>> proper cleanup in appropriate destructors is a challenge.  However,
>>> noting that if we don't get a chance to call rbd_obj_request_complete(),
>>> there is not going to be a call to rbd_img_obj_callback(), we can move
>>> rbd_img_request_get() into rbd_obj_request_submit() and fixup the two
>>> places that call rbd_obj_request_complete() directly and not through
>>> rbd_obj_request_submit() to temporarily bump img_request, so that
>>> rbd_img_obj_callback() can put as usual.
>>>
>>> This takes care of img_request leaks on errors on the submit side.
>>
>> So *this* is the problem you're aiming to solve here...  It would
>> have been nice for you to call attention to where these leaks were
>> occurring.  (It seems rbd_img_obj_parent_read_full_callback()
>> and rbd_img_obj_exists_callback().)
> 
> By "on the submit side", I meant everywhere where we error out before
> rbd_obj_request_submit() on the original request.
> 
>>
>> It looks to me like your change is sound, but again, I think it
>> would be better to make the reference counting match the actual
>> recording of references to objects as I mentioned before.  I'm
>> going to ask for your opinion on that before I offer you my
>> "reviewed by" on this.
> 
> The aim of this patch was to fix up error paths while preserving the
> semantics of this kref.  Any other improvements can only come with the
> full re-write of the completion code - it's too fragile.
> 
>>
>>                                         -Alex
>>
>> And now, some more gratuitous analysis...
>>
>> We currently (without this patch) get a reference to an image request
>> for every object request populated in rbd_img_request_fill(), right
>> after calling rbd_img_obj_request_fill().  We want each of these to
>> have a matching reference drop, once that object request is completed,
>> and the object request no longer has any use for what's in the image
>> request.  Right now that occurs in rbd_img_obj_callback().  But that
>> function is only called if an image object request gets successfully
>> submitted.  And in particular, if an error occurs while processing
>> a layered write--either while doing an existence check for an object
>> or when doing a read of the parent data--the original image object
>> requests never get submitted, and therefore their references to the
>> image don't get dropped properly.  This issue is the subject of your
>> patch.  All other image object requests appear to get submitted,
>> and therefore properly release their reference.
> 
> Correct.  There is also the issue of two-obj_request img_requests,
> where the first obj_request gets submitted while the second errors out.
> This patch captures it, but there may well be other issues there -
> I haven't gone down that rabbit hole yet...
> 
> Thanks,
> 
>                 Ilya
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ilya Dryomov Sept. 29, 2016, 3:15 p.m. UTC | #4
On Wed, Sep 28, 2016 at 2:22 AM, Alex Elder <elder@ieee.org> wrote:
> On 09/26/2016 09:38 AM, Ilya Dryomov wrote:
>> On Sun, Sep 25, 2016 at 5:56 PM, Alex Elder <elder@linaro.org> wrote:
>>> On 09/19/2016 12:03 PM, Ilya Dryomov wrote:
>>>> Commit 0f2d5be792b0 ("rbd: use reference counts for image requests")
>>>> added rbd_img_request_get(), which rbd_img_request_fill() calls for
>>>> each obj_request added to img_request.  It was an urgent band-aid for
>>>> the uglyness that is rbd_img_obj_callback() and none of the error paths
>>>> were updated.
>>>
>>> I'm not sure how to improve that function.  Object requests in
>>> an image request need to be completed in sequential order,
>>> because blk_update_request() requires that.  (Or maybe you
>>> were referring to something else you find ugly.)
>>
>> Quoting 0f2d5be792b0 commit message:
>>
>> "There is a race here, however.  The instant an object request is
>> marked "done" it can be provided (by a thread handling completion of
>> one of its predecessor operations) to rbd_img_obj_end_request(),
>> which (for the last request) can then lead to the image request
>> getting torn down.  And this can happen *before* that object has
>> itself entered rbd_img_obj_end_request().  As a result, once it
>> *does* enter that function, the image request (and even the object
>> request itself) may have been freed and become invalid."
>>
>> "... we don't want rbd_img_request_complete() to necessarily
>> result in the image request being destroyed, because it may
>> get called before we've finished processing on all of its
>> object requests."
>>
>> This "I'm called for obj_request1, but I'll also complete obj_request2
>> along with the entire img_request if I get a chance, before I'm called
>> for obj_request2" behaviour is *really* *really* sneaky.
>
> OK, I understand that.  (And I think you meant "...before I'm
> called for obj_request1".)
>
>> I think a simple atomic, decremented for each obj_request completion
>> would do.  When it hits 0, we say "OK, this img_request is finished"
>> and call rbd_img_obj_end_request() for each obj_request in order.  I'm
>> not sure we are getting anything from calling blk_update_request() in
>> this fashion, at the expense of hard to comprehend buggy code.
>
> OK, now you're talking about something different, which is to
> change how we handle image object request completions.

Yeah, how we handle image object request completions is the root of the
problem here.  It's the reason we have rbd_img_request_get/put() in the
first place.

> Practically speaking, I think you're right that there would
> be very little lost by handling an image request's object
> request completions all at once rather than as quickly as
> possible.
>
> I think the get of the img_request in the error paths looks
> a little funny.  It'll now be encapsulated in another function,
> so maybe you can add a small comment there to say why you need
> to do that--to match what the submit path does (or to match
> the reference dropped in the completion path).

I've just re-pushed with the helper.

>
> I have no argument with what you're doing, and I unfortunately
> feel less confident (maybe 10% less) in my walk-through of
> all the affected code than I normally am on your patches...
> But I'll call that good enough.  I only have small blocks of
> time right now and I don't want to delay things.
>
> Reviewed-by: Alex Elder <elder@linaro.org>

Thanks for the review!

                Ilya
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index b247200a0f28..d8070bd29fd1 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1610,11 +1610,17 @@  static bool obj_request_type_valid(enum obj_request_type type)
 	}
 }
 
+static void rbd_img_obj_callback(struct rbd_obj_request *obj_request);
+
 static void rbd_obj_request_submit(struct rbd_obj_request *obj_request)
 {
 	struct ceph_osd_request *osd_req = obj_request->osd_req;
 
 	dout("%s %p osd_req %p\n", __func__, obj_request, osd_req);
+	if (obj_request_img_data_test(obj_request)) {
+		WARN_ON(obj_request->callback != rbd_img_obj_callback);
+		rbd_img_request_get(obj_request->img_request);
+	}
 	ceph_osdc_start_request(osd_req->r_osdc, osd_req, false);
 }
 
@@ -2580,8 +2586,6 @@  static int rbd_img_request_fill(struct rbd_img_request *img_request,
 
 		rbd_img_obj_request_fill(obj_request, osd_req, op_type, 0);
 
-		rbd_img_request_get(img_request);
-
 		img_offset += length;
 		resid -= length;
 	}
@@ -2715,10 +2719,9 @@  rbd_img_obj_parent_read_full_callback(struct rbd_img_request *img_request)
 	return;
 
 out_err:
-	/* Record the error code and complete the request */
-
 	orig_request->result = img_result;
 	orig_request->xferred = 0;
+	rbd_img_request_get(orig_request->img_request);
 	obj_request_done_set(orig_request);
 	rbd_obj_request_complete(orig_request);
 }
@@ -2873,6 +2876,7 @@  static void rbd_img_obj_exists_callback(struct rbd_obj_request *obj_request)
 fail_orig_request:
 	orig_request->result = result;
 	orig_request->xferred = 0;
+	rbd_img_request_get(orig_request->img_request);
 	obj_request_done_set(orig_request);
 	rbd_obj_request_complete(orig_request);
 }