
[RFC] osd: Add local_connection to fast_dispatch in func _send_boot.

Message ID 6AA21C22F0A5DA478922644AD2EC308C887C60@SHSMSX101.ccr.corp.intel.com (mailing list archive)
State New, archived

Commit Message

Ma, Jianpeng July 14, 2014, 3:17 a.m. UTC
When doing an EC read, I hit a bug that occurred 100% of the time. The messages are:
2014-07-14 10:03:07.318681 7f7654f6e700 -1 osd/OSD.cc: In function
'virtual void OSD::ms_fast_dispatch(Message*)' thread 7f7654f6e700 time
2014-07-14 10:03:07.316782 osd/OSD.cc: 5019: FAILED assert(session)

 ceph version 0.82-585-g79f3f67 (79f3f6749122ce2944baa70541949d7ca75525e6)
 1: (OSD::ms_fast_dispatch(Message*)+0x286) [0x6544b6]
 2: (DispatchQueue::fast_dispatch(Message*)+0x56) [0xb059d6]
 3: (DispatchQueue::run_local_delivery()+0x6b) [0xb08e0b]
 4: (DispatchQueue::LocalDeliveryThread::entry()+0xd) [0xa4a5fd]
 5: (()+0x8182) [0x7f7665670182]
 6: (clone()+0x6d) [0x7f7663a1130d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Commit 69fc6b2b66 enabled fast_dispatch on local connections: the
local_connection is added to fast dispatch in func init_local_connection.
But if no fast dispatcher is registered yet, the local connection cannot be added.

If there is no cluster addr in ceph.conf, the local_connection is added
to fast dispatch in func _send_boot because cluster_addr is empty.
But if a cluster addr is configured, the local_connection is never added to fast dispatch.

ECSubRead is sent to the OSD itself via func send_message_osd_cluster, so it
triggers this bug.

I am not sure about hb_back/front_server_messenger, but they are handled in
_send_boot like cluster_messenger, so I modified them as well.

Signed-off-by: Ma Jianpeng <jianpeng.ma@intel.com>
---
 src/osd/OSD.cc | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

--
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Gregory Farnum July 16, 2014, 7:40 p.m. UTC | #1
I'm looking at this and getting a little confused. Can you provide a
log of the crash occurring? (preferably with debug_ms=20,
debug_osd=20)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Sun, Jul 13, 2014 at 8:17 PM, Ma, Jianpeng <jianpeng.ma@intel.com> wrote:
> [snip: original message and patch quoted in full]
Gregory Farnum July 18, 2014, 8:55 p.m. UTC | #2
Hrm, I'd really like to see the startup sequence. I see the crash
occurring, but I don't understand how it's happening — we test this
pretty extensively so there must be something about your testing
configuration that is different than ours. Can you provide that part
of the log, and maybe a little more description of what you think the
problem is?

In particular, we *always* call init_local_connection when the
messenger starts, so every messenger who is allowed to receive EC
messages should have the local connection set up before they get one.
I don't really see how supplying the local connection as a new one in
_send_boot *should* be fixing that, and it's not the place to do so
(although I guess it's doing *something*, I just can't figure out
what).
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Wed, Jul 16, 2014 at 5:17 PM, Ma, Jianpeng <jianpeng.ma@intel.com> wrote:
> Hi Greg,
>    The attachment is the log.
>
> Thanks!
>
> -----Original Message-----
> From: Gregory Farnum [mailto:greg@inktank.com]
> Sent: Thursday, July 17, 2014 3:41 AM
> To: Ma, Jianpeng
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: [RFC][PATCH] osd: Add local_connection to fast_dispatch in func _send_boot.
>
> I'm looking at this and getting a little confused. Can you provide a log of the crash occurring? (preferably with debug_ms=20,
> debug_osd=20)
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Sun, Jul 13, 2014 at 8:17 PM, Ma, Jianpeng <jianpeng.ma@intel.com> wrote:
>> [snip: original message and patch quoted in full]
Ma, Jianpeng July 21, 2014, 6:33 a.m. UTC | #3
> Hrm, I'd really like to see the startup sequence. I see the crash
> occurring, but I don't understand how it's happening — we test this
> pretty extensively so there must be something about your testing
> configuration that is different than ours. Can you provide that part
> of the log, and maybe a little more description of what you think the
> problem is?

If ceph.conf contains a "cluster addr", the bug always occurs. When there
is no "cluster addr" in ceph.conf, the local connection is added to fast
dispatch in func _send_boot via cluster_messenger->set_addr_unknowns().

> In particular, we *always* call init_local_connection when the
> messenger starts, so every messenger who is allowed to receive EC
> messages should have the local connection set up before they get one.

Yes, init_local_connection is called, but the local connection can only be
hooked into fast dispatch after the OSD registers itself as a dispatcher.
In OSD::init:

        cluster_messenger->add_dispatcher_head(this);

Only after this can the local connection be added to fast dispatch: what
matters is that a fast dispatcher of the correct type is registered, not
the cluster addr. When a Messenger is allocated its type is set, and only
after add_dispatcher_head/tail can the local connection be added to
dispatch. Maybe adding ms_deliver_handle_fast_connect(local_connection.get())
in SimpleMessenger::ready would be better.

Jianpeng Ma

> [snip: remainder of quoted thread]
Gregory Farnum July 21, 2014, 8:54 p.m. UTC | #4
On Sun, Jul 20, 2014 at 11:33 PM, Ma, Jianpeng <jianpeng.ma@intel.com> wrote:
>> Hrm, I'd really like to see the startup sequence. I see the crash occurring, but I
>> don't understand how it's happening — we test this pretty extensively so there
>> must be something about your testing configuration that is different than ours.
>> Can you provide that part of the log, and maybe a little more description of
>> what you think the problem is?
>
> If ceph.conf contains a "cluster addr", the bug always occurs.
> When there is no "cluster addr" in ceph.conf, the local connection is added to fast dispatch in func _send_boot via cluster_messenger->set_addr_unknowns().
>
>>
>> In particular, we *always* call init_local_connection when the messenger
>> starts, so every messenger who is allowed to receive EC messages should have
>> the local connection set up before they get one.
> Yes, init_local_connection is called, but the local connection can only be hooked into fast dispatch after the OSD registers itself as a dispatcher.
> In OSD::init:
>         cluster_messenger->add_dispatcher_head(this);
> Only after this can the local connection be added to fast dispatch: what matters is that a fast dispatcher of the correct type is registered, not the cluster addr.
> When a Messenger is allocated its type is set, and only after add_dispatcher_head/tail can the local connection be added to dispatch.
> Maybe adding ms_deliver_handle_fast_connect(local_connection.get()) in SimpleMessenger::ready would be better.

Ooooookay, I see the problem now. I pulled the patch (with some
wording changes) into master at commit
9061988ec7eaa922e2b303d9eece86e7c8ee0fa1. I've also created a ticket
to clean up the local dispatch Connection setup at
http://tracker.ceph.com/issues/8892.
Thanks!
-Greg

Patch

diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index 52a3839..75b294b 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -3852,29 +3852,37 @@  void OSD::_send_boot()
 {
   dout(10) << "_send_boot" << dendl;
   entity_addr_t cluster_addr = cluster_messenger->get_myaddr();
+  Connection *local_connection = cluster_messenger->get_loopback_connection().get();
   if (cluster_addr.is_blank_ip()) {
     int port = cluster_addr.get_port();
     cluster_addr = client_messenger->get_myaddr();
     cluster_addr.set_port(port);
     cluster_messenger->set_addr_unknowns(cluster_addr);
     dout(10) << " assuming cluster_addr ip matches client_addr" << dendl;
-  }
+  } else if (local_connection->get_priv() == NULL)
+      cluster_messenger->ms_deliver_handle_fast_connect(local_connection);
+
   entity_addr_t hb_back_addr = hb_back_server_messenger->get_myaddr();
+  local_connection = hb_back_server_messenger->get_loopback_connection().get();
   if (hb_back_addr.is_blank_ip()) {
     int port = hb_back_addr.get_port();
     hb_back_addr = cluster_addr;
     hb_back_addr.set_port(port);
     hb_back_server_messenger->set_addr_unknowns(hb_back_addr);
     dout(10) << " assuming hb_back_addr ip matches cluster_addr" << dendl;
-  }
+  } else if (local_connection->get_priv() == NULL)
+      hb_back_server_messenger->ms_deliver_handle_fast_connect(local_connection);
+
   entity_addr_t hb_front_addr = hb_front_server_messenger->get_myaddr();
+  local_connection = hb_front_server_messenger->get_loopback_connection().get();
   if (hb_front_addr.is_blank_ip()) {
     int port = hb_front_addr.get_port();
     hb_front_addr = client_messenger->get_myaddr();
     hb_front_addr.set_port(port);
     hb_front_server_messenger->set_addr_unknowns(hb_front_addr);
     dout(10) << " assuming hb_front_addr ip matches client_addr" << dendl;
-  }
+  } else if (local_connection->get_priv() == NULL)
+      hb_front_server_messenger->ms_deliver_handle_fast_connect(local_connection);

   MOSDBoot *mboot = new MOSDBoot(superblock, service.get_boot_epoch(),
                                  hb_back_addr, hb_front_addr, cluster_addr);