
Issues with exclusive-lock code on testing

Message ID CAOi1vP-+LaGx5q4_MkW7w0gShDzA7fEgAFmJOONvDxL3Owjf5g@mail.gmail.com (mailing list archive)
State New, archived

Commit Message

Ilya Dryomov March 2, 2016, 11:26 a.m. UTC
Hi Doug,

The cause of that memory corruption is a premature (duplicate, too)
call to rbd_obj_request_complete() in the !object-map DELETE case.
You've got:

    <dispatch>
    rbd_osd_req_callback
      rbd_osd_delete_callback
        rbd_osd_discard_callback
        rbd_obj_request_complete
          <complete obj_request->completion>
                                                <waiter is woken up>
                                                ...
                                                rbd_obj_request_put
                                                <obj_request is gone>
    <do more things with obj_request> <- !!!
    rbd_obj_request_complete
      <complete obj_request->completion>

I also spotted two memory leaks on the NOTIFY_COMPLETE path in
__do_event().  The event one is trivial, the page vector one I have
a question about.  The data item is allocated in alloc_msg() and the
actual buffer is then passed into __do_event() and eventually into
rbd_send_async_notify(), but not further up the stack.  Is anything
going to use it?  If not, we should remove it entirely.

Another thing that caught my eye is your diff adds a bunch of
ceph_get_snap_context() calls on header.snapc with no corresponding
puts.  My understanding is the ones around rbd_image_request_fill() are
there to work around the fact that rbd_queue_workfn() isn't used, but
the one in rbd_obj_delete_sync() is immediately followed by
ceph_osdc_build_request() which bumps snapc and so is almost certainly
a leak.

The attached patch fixes the use-after-free and plugs those leaks.
With it applied your test loop runs fine for me - no crashes or out of
memory problems.

Thanks,

                Ilya

Comments

Douglas Fuller March 2, 2016, 1:59 p.m. UTC | #1
> On Mar 2, 2016, at 6:26 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
> 
> Hi Doug,
> 
> The cause of that memory corruption is a premature (duplicate, too)
> call to rbd_obj_request_complete() in the !object-map DELETE case.
> You've got:
> 
>    <dispatch>
>    rbd_osd_req_callback
>      rbd_osd_delete_callback
>        rbd_osd_discard_callback
>        rbd_obj_request_complete
>          <complete obj_request->completion>
>                                                <waiter is woken up>
>                                                ...
>                                                rbd_obj_request_put
>                                                <obj_request is gone>
>    <do more things with obj_request> <- !!!
>    rbd_obj_request_complete
>      <complete obj_request->completion>

Ah, yes, that is left over from before when I used the delete callback explicitly (assigning it to r_callback) instead of routing it through rbd_osd_req_callback like everything else. That likely explains it, thanks.

> I also spotted two memory leaks on the NOTIFY_COMPLETE path in
> __do_event().  The event one is trivial,

We cannot free the event here because CEPH_WATCH_EVENT_NOTIFY_COMPLETE fills values needed by the caller that may be waiting on it. The waiter has to call ceph_osdc_cancel_event when done with it.

> the page vector one I have
> a question about.  The data item is allocated in alloc_msg() and the
> actual buffer is then passed into __do_event() and eventually into
> rbd_send_async_notify(), but not further up the stack.  Is anything
> going to use it?  If not, we should remove it entirely.

The incoming message may contain the data item. In the case of CEPH_WATCH_EVENT_NOTIFY_COMPLETE, rbd does not expect any data from the OSD, only the return code. The protocol allows it to be present, though, so osd_client should handle this case.

> Another thing that caught my eye is your diff adds a bunch of
> ceph_get_snap_context() calls on header.snapc with no corresponding
> puts.  My understanding is the ones around rbd_image_request_fill() are
> there to work around the fact that rbd_queue_workfn() isn't used,

Yes. I think I left a comment there, but I’ll add one if I did not.

> but
> the one in rbd_obj_delete_sync() is immediately followed by
> ceph_osdc_build_request() which bumps snapc and so is almost certainly
> a leak.

Yes. I thought you promised not to look at the garbage/junk commits, where that line is.

> The attached patch fixes the use-after-free and plugs those leaks.
> With it applied your test loop runs fine for me - no crashes or out of
> memory problems.
> 
> Thanks,
> 
>                Ilya
> <doug-testing.diff>

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ilya Dryomov March 2, 2016, 2:10 p.m. UTC | #2
On Wed, Mar 2, 2016 at 2:59 PM, Douglas Fuller <dfuller@redhat.com> wrote:
>
>> On Mar 2, 2016, at 6:26 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
>>
>> Hi Doug,
>>
>> The cause of that memory corruption is a premature (duplicate, too)
>> call to rbd_obj_request_complete() in the !object-map DELETE case.
>> You've got:
>>
>>    <dispatch>
>>    rbd_osd_req_callback
>>      rbd_osd_delete_callback
>>        rbd_osd_discard_callback
>>        rbd_obj_request_complete
>>          <complete obj_request->completion>
>>                                                <waiter is woken up>
>>                                                ...
>>                                                rbd_obj_request_put
>>                                                <obj_request is gone>
>>    <do more things with obj_request> <- !!!
>>    rbd_obj_request_complete
>>      <complete obj_request->completion>
>
> Ah, yes, that is left over from before when I used the delete callback explicitly (assigning it to r_callback) instead of routing it through rbd_osd_req_callback like everything else. That likely explains it, thanks.
>
>> I also spotted two memory leaks on the NOTIFY_COMPLETE path in
>> __do_event().  The event one is trivial,
>
> We cannot free the event here because CEPH_WATCH_EVENT_NOTIFY_COMPLETE fills values needed by the caller that may be waiting on it. The waiter has to call ceph_osdc_cancel_event when done with it.

My ceph_osdc_put_event() doesn't free the event, it just puts a ref
that was acquired above with get_event().

>
>> the page vector one I have
>> a question about.  The data item is allocated in alloc_msg() and the
>> actual buffer is then passed into __do_event() and eventually into
>> rbd_send_async_notify(), but not further up the stack.  Is anything
>> going to use it?  If not, we should remove it entirely.
>
> The incoming message may contain the data item. In the case of CEPH_WATCH_EVENT_NOTIFY_COMPLETE, rbd does not expect any data from the OSD, only the return code. The protocol allows it to be present, though, so osd_client should handle this case.

If there won't be any users of it on the kernel client side, there's no
point in allocating the data item, extracting that data and propagating
it half way up, only to discard the data and free the data item.

>
>> Another thing that caught my eye is your diff adds a bunch of
>> ceph_get_snap_context() calls on header.snapc with no corresponding
>> puts.  My understanding is the ones around rbd_image_request_fill() are
>> there to work around the fact that rbd_queue_workfn() isn't used,
>
> Yes. I think I left a comment there, but I’ll add one if I did not.
>
>> but
>> the one in rbd_obj_delete_sync() is immediately followed by
>> ceph_osdc_build_request() which bumps snapc and so is almost certainly
>> a leak.
>
> Yes. I thought you promised not to look at the garbage/junk commits, where that line is.

I looked at the entire diff - it was blowing past 1G of memory, after
all ;)

Thanks,

                Ilya
Douglas Fuller March 2, 2016, 2:56 p.m. UTC | #3
> On Mar 2, 2016, at 9:10 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
> 
> On Wed, Mar 2, 2016 at 2:59 PM, Douglas Fuller <dfuller@redhat.com> wrote:
>> 
>>> On Mar 2, 2016, at 6:26 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
>>> 
>> 
>>> I also spotted two memory leaks on the NOTIFY_COMPLETE path in
>>> __do_event().  The event one is trivial,
>> 
>> We cannot free the event here because CEPH_WATCH_EVENT_NOTIFY_COMPLETE fills values needed by the caller that may be waiting on it. The waiter has to call ceph_osdc_cancel_event when done with it.
> 
> My ceph_osdc_put_event() doesn't free the event, it just puts a ref
> that was acquired above with get_event().

That ref should be put by a call to ceph_osdc_cancel_event in that particular case. It looks like the event ref counting may need attention, though. I’ll look further.

>>> the page vector one I have
>>> a question about.  The data item is allocated in alloc_msg() and the
>>> actual buffer is then passed into __do_event() and eventually into
>>> rbd_send_async_notify(), but not further up the stack.  Is anything
>>> going to use it?  If not, we should remove it entirely.
>> 
>> The incoming message may contain the data item. In the case of CEPH_WATCH_EVENT_NOTIFY_COMPLETE, rbd does not expect any data from the OSD, only the return code. The protocol allows it to be present, though, so osd_client should handle this case.
> 
> If there won't be any users of it on the kernel client side, there's no
> point in allocating the data item, extracting that data and propagating
> it half way up, only to discard the data and free the data item.

alloc_msg() will have to allocate the data item anyway, because CEPH_MSG_WATCH_NOTIFY might contain one. We only know that krbd is not expecting one if the notify opcode is CEPH_WATCH_EVENT_NOTIFY_COMPLETE, but we do not decode that until later.

We could do as you suggest in your patch and free the data item there. It should be empty anyway for these cases, so this will not save memory. We could shrink the ceph_osd_event struct by a pointer and a size_t, though.

>> Yes. I thought you promised not to look at the garbage/junk commits, where that line is.
> 
> I looked at the entire diff - it was blowing past 1G of memory, after
> all ;)

Your test was, or the diff was? ;)

Ilya Dryomov March 2, 2016, 3:20 p.m. UTC | #4
On Wed, Mar 2, 2016 at 3:56 PM, Douglas Fuller <dfuller@redhat.com> wrote:
>
>> On Mar 2, 2016, at 9:10 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
>>
>> On Wed, Mar 2, 2016 at 2:59 PM, Douglas Fuller <dfuller@redhat.com> wrote:
>>>
>>>> On Mar 2, 2016, at 6:26 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
>>>>
>>>
>>>> I also spotted two memory leaks on the NOTIFY_COMPLETE path in
>>>> __do_event().  The event one is trivial,
>>>
>>> We cannot free the event here because CEPH_WATCH_EVENT_NOTIFY_COMPLETE fills values needed by the caller that may be waiting on it. The waiter has to call ceph_osdc_cancel_event when done with it.
>>
>> My ceph_osdc_put_event() doesn't free the event, it just puts a ref
>> that was acquired above with get_event().
>
> That ref should be put by a call to ceph_osdc_cancel_event in that particular case. It looks like the event ref counting may need attention, though. I’ll look further.

The final ref *is* put by a call to ceph_osdc_cancel_event().  This is
the second ref, acquired at the top of __do_event(), and it needs to be
put somewhere.  In NOTIFY/DISCONNECT cases it's put in do_event_work(),
in the COMPLETE case it needs to be put in __do_event().  Am I missing
something?

>
>>>> the page vector one I have
>>>> a question about.  The data item is allocated in alloc_msg() and the
>>>> actual buffer is then passed into __do_event() and eventually into
>>>> rbd_send_async_notify(), but not further up the stack.  Is anything
>>>> going to use it?  If not, we should remove it entirely.
>>>
>>> The incoming message may contain the data item. In the case of CEPH_WATCH_EVENT_NOTIFY_COMPLETE, rbd does not expect any data from the OSD, only the return code. The protocol allows it to be present, though, so osd_client should handle this case.
>>
>> If there won't be any users of it on the kernel client side, there's no
>> point in allocating the data item, extracting that data and propagating
>> it half way up, only to discard the data and free the data item.
>
> alloc_msg() will have to allocate the data item anyway, because CEPH_MSG_WATCH_NOTIFY might contain one. We only know that krbd is not expecting one if the notify opcode is CEPH_WATCH_EVENT_NOTIFY_COMPLETE, but we do not decode that until later.

Who is going to be the user in the !NOTIFY_COMPLETE case?

>
> We could do as you suggest in your patch and free the data item there. It should be empty anyway for these cases, so this will not save memory. We could shrink the ceph_osd_event struct by a pointer and a size_t, though.

If you are saying we can drop notify_data and notify_data_len from the
struct, how would you propagate the data in the !NOTIFY_COMPLETE case?

Right now the whole thing is unused and I don't see any kind of switch
on the notify opcode in the code:

    alloc_msg():
        if type == CEPH_MSG_WATCH_NOTIFY:
            if data_len > 0:
                alloc page vector, create a data item, add it to the msg

    handle_watch_notify():
        pages = data_item->pages;
        len = data_item->length;
        pass pages and len into __do_event()

    __do_event():
        assign pages and len to notify.notify_data and notify.notify_data_len,
        both of which are unused

    destroy the message

Thanks,

                Ilya
Douglas Fuller March 2, 2016, 3:33 p.m. UTC | #5
Sorry, I should have looked more closely at this. I forgot that the data comes inside the message when opcode == CEPH_WATCH_EVENT_NOTIFY, and in a data item when opcode == CEPH_WATCH_EVENT_NOTIFY_COMPLETE. Yes, the data item isn’t used by rbd.

>> alloc_msg() will have to allocate the data item anyway, because CEPH_MSG_WATCH_NOTIFY might contain one. We only know that krbd is not expecting one if the notify opcode is CEPH_WATCH_EVENT_NOTIFY_COMPLETE, but we do not decode that until later.
> 
> Who is going to be the user in the !NOTIFY_COMPLETE case?
> 
>> 
>> We could do as you suggest in your patch and free the data item there. It should be empty anyway for these cases, so this will not save memory. We could shrink the ceph_osd_event struct by a pointer and a size_t, though.
> 
> If you are saying we can drop notify_data and notify_data_len from the
> struct, how would you propagate the data in the !NOTIFY_COMPLETE case?
> 
> Right now the whole thing is unused and I don't see any kind of switch
> on the notify opcode in the code:
> 
>    alloc_msg():
>        if type == CEPH_MSG_WATCH_NOTIFY:
>            if data_len > 0:
>                alloc page vector, create a data item, add it to the msg
> 
>    handle_watch_notify():
>        pages = data_item->pages;
>        len = data_item->length;
>        pass pages and len into __do_event()
> 
>    __do_event():
>        assign pages and len to notify.notify_data and notify.notify_data_len,
>        both of which are unused
> 
>    destroy the message
> 
> Thanks,
> 
>                Ilya


Patch

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 92c354256055..c0198b6ca605 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1930,14 +1930,14 @@  static void rbd_osd_delete_callback(struct rbd_obj_request *obj_request)
 	u8 current_state;
 
 	if (!obj_request->img_request) {
-		rbd_osd_complete_delete(obj_request);
+		rbd_osd_discard_callback(obj_request);
 		return;
 	}
 
 	rbd_dev = obj_request->img_request->rbd_dev;
 
 	if (!rbd_use_object_map(rbd_dev)) {
-		rbd_osd_complete_delete(obj_request);
+		rbd_osd_discard_callback(obj_request);
 		return;
 	}
 
@@ -3632,10 +3632,13 @@  static int rbd_send_async_notify(struct rbd_device *rbd_dev,
 	}
 
 	completed = ceph_osdc_wait_event(osdc, notify_event);
-	if (!completed)
+	if (!completed) {
 		ret = -ETIMEDOUT;
-	else
+	} else {
 		ret = notify_event->notify.return_code;
+		ceph_release_page_vector(notify_event->notify.notify_data,
+		    calc_pages_for(0, notify_event->notify.notify_data_len));
+	}
 
 cancel_event:
 	ceph_osdc_cancel_event(notify_event);
@@ -4828,7 +4831,6 @@  static int rbd_obj_delete_sync(struct rbd_device *rbd_dev,
 
 	//obj_request->osd_req->r_priv = obj_request;
 
-	ceph_get_snap_context(rbd_dev->header.snapc);
 	osd_req_op_init(obj_request->osd_req, 0, CEPH_OSD_OP_DELETE, 0);
 	rbd_osd_req_format_snap_write(obj_request, rbd_dev->header.snapc);
 
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 8316a304af63..12841c5a09c7 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -2942,6 +2942,7 @@  static void __do_event(struct ceph_osd_client *osdc, u8 opcode,
 					event->osd_req = NULL;
 				}
 				complete_all(&event->notify.complete);
+				ceph_osdc_put_event(event);
 			}
 			break;
 		default: