Message ID | 6AA21C22F0A5DA478922644AD2EC308C887C60@SHSMSX101.ccr.corp.intel.com (mailing list archive) |
---|---|
State | New, archived |
I'm looking at this and getting a little confused. Can you provide a
log of the crash occurring? (preferably with debug_ms=20, debug_osd=20)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Sun, Jul 13, 2014 at 8:17 PM, Ma, Jianpeng <jianpeng.ma@intel.com> wrote:
> When doing an EC read, I hit a bug that occurs 100% of the time. The messages are:
> 2014-07-14 10:03:07.318681 7f7654f6e700 -1 osd/OSD.cc: In function
> 'virtual void OSD::ms_fast_dispatch(Message*)' thread 7f7654f6e700 time
> 2014-07-14 10:03:07.316782 osd/OSD.cc: 5019: FAILED assert(session)
>
>  ceph version 0.82-585-g79f3f67 (79f3f6749122ce2944baa70541949d7ca75525e6)
>  1: (OSD::ms_fast_dispatch(Message*)+0x286) [0x6544b6]
>  2: (DispatchQueue::fast_dispatch(Message*)+0x56) [0xb059d6]
>  3: (DispatchQueue::run_local_delivery()+0x6b) [0xb08e0b]
>  4: (DispatchQueue::LocalDeliveryThread::entry()+0xd) [0xa4a5fd]
>  5: (()+0x8182) [0x7f7665670182]
>  6: (clone()+0x6d) [0x7f7663a1130d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> Commit 69fc6b2b66 enabled fast_dispatch on local connections; it adds
> local_connection to fast dispatch in init_local_connection.
> But if there is no fast dispatcher yet, the local connection can't be added.
>
> If there is no cluster addr in ceph.conf, local_connection is added
> to fast dispatch in _send_boot because cluster_addr is empty.
> But if a cluster addr is set, local_connection is never added to fast dispatch.
>
> An ECSubRead is sent by the OSD to itself via send_message_osd_cluster, so
> it triggers this bug.
>
> I don't know about hb_back/front_server_messenger, but they are handled in
> _send_boot like cluster_messenger, so I also modified those.
>
> Signed-off-by: Ma Jianpeng <jianpeng.ma@intel.com>
> ---
>  src/osd/OSD.cc | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
> index 52a3839..75b294b 100644
> --- a/src/osd/OSD.cc
> +++ b/src/osd/OSD.cc
> @@ -3852,29 +3852,37 @@ void OSD::_send_boot()
>  {
>    dout(10) << "_send_boot" << dendl;
>    entity_addr_t cluster_addr = cluster_messenger->get_myaddr();
> +  Connection *local_connection = cluster_messenger->get_loopback_connection().get();
>    if (cluster_addr.is_blank_ip()) {
>      int port = cluster_addr.get_port();
>      cluster_addr = client_messenger->get_myaddr();
>      cluster_addr.set_port(port);
>      cluster_messenger->set_addr_unknowns(cluster_addr);
>      dout(10) << " assuming cluster_addr ip matches client_addr" << dendl;
> -  }
> +  } else if (local_connection->get_priv() == NULL)
> +      cluster_messenger->ms_deliver_handle_fast_connect(local_connection);
> +
>    entity_addr_t hb_back_addr = hb_back_server_messenger->get_myaddr();
> +  local_connection = hb_back_server_messenger->get_loopback_connection().get();
>    if (hb_back_addr.is_blank_ip()) {
>      int port = hb_back_addr.get_port();
>      hb_back_addr = cluster_addr;
>      hb_back_addr.set_port(port);
>      hb_back_server_messenger->set_addr_unknowns(hb_back_addr);
>      dout(10) << " assuming hb_back_addr ip matches cluster_addr" << dendl;
> -  }
> +  } else if (local_connection->get_priv() == NULL)
> +      hb_back_server_messenger->ms_deliver_handle_fast_connect(local_connection);
> +
>    entity_addr_t hb_front_addr = hb_front_server_messenger->get_myaddr();
> +  local_connection = hb_front_server_messenger->get_loopback_connection().get();
>    if (hb_front_addr.is_blank_ip()) {
>      int port = hb_front_addr.get_port();
>      hb_front_addr = client_messenger->get_myaddr();
>      hb_front_addr.set_port(port);
>      hb_front_server_messenger->set_addr_unknowns(hb_front_addr);
>      dout(10) << " assuming hb_front_addr ip matches client_addr" << dendl;
> -  }
> +  } else if (local_connection->get_priv() == NULL)
> +      hb_front_server_messenger->ms_deliver_handle_fast_connect(local_connection);
>
>    MOSDBoot *mboot = new MOSDBoot(superblock, service.get_boot_epoch(),
>                                   hb_back_addr, hb_front_addr, cluster_addr);
> --
> 1.9.1
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hrm, I'd really like to see the startup sequence. I see the crash
occurring, but I don't understand how it's happening; we test this
pretty extensively so there must be something about your testing
configuration that is different than ours. Can you provide that part
of the log, and maybe a little more description of what you think the
problem is?

In particular, we *always* call init_local_connection when the
messenger starts, so every messenger who is allowed to receive EC
messages should have the local connection set up before they get one.
I don't really see how supplying the local connection as a new one in
_send_boot *should* be fixing that, and it's not the place to do so
(although I guess it's doing *something*, I just can't figure out what).
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Wed, Jul 16, 2014 at 5:17 PM, Ma, Jianpeng <jianpeng.ma@intel.com> wrote:
> Hi Greg,
>    The attachment is the log.
>
> Thanks!
>
> -----Original Message-----
> From: Gregory Farnum [mailto:greg@inktank.com]
> Sent: Thursday, July 17, 2014 3:41 AM
> To: Ma, Jianpeng
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: [RFC][PATCH] osd: Add local_connection to fast_dispatch in func _send_boot.
>
> I'm looking at this and getting a little confused. Can you provide a
> log of the crash occurring? (preferably with debug_ms=20, debug_osd=20)
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
> Hrm, I'd really like to see the startup sequence. I see the crash occurring, but I
> don't understand how it's happening; we test this pretty extensively so there
> must be something about your testing configuration that is different than ours.
> Can you provide that part of the log, and maybe a little more description of
> what you think the problem is?

If ceph.conf contains "cluster addr", the bug always occurs.
Without "cluster addr" in ceph.conf, the local connection is added to fast dispatch in _send_boot via cluster_messenger->set_addr_unknowns.

> In particular, we *always* call init_local_connection when the messenger
> starts, so every messenger who is allowed to receive EC messages should have
> the local connection set up before they get one.

Yes, you call init_local_connection. But the local_connection can only be added to dispatch once the OSD has been added to the messenger.
In OSD::init:
    cluster_messenger->add_dispatcher_head(this);
Only after this can the local_connection be added to dispatch.
Because if local_connection has the correct type, it can be added to dispatch and doesn't care about the cluster addr.
When a Messenger is allocated, it sets the type, and only after add_dispatcher_head/tail can the local connection be added to dispatch.
Maybe adding ms_deliver_handle_fast_connect(local_connection.get()) in SimpleMessenger::ready is better.

Jianpeng Ma

> I don't really see how supplying the local connection as a new one in
> _send_boot *should* be fixing that, and it's not the place to do so (although I
> guess it's doing *something*, I just can't figure out what).
>
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
On Sun, Jul 20, 2014 at 11:33 PM, Ma, Jianpeng <jianpeng.ma@intel.com> wrote:
>> Hrm, I'd really like to see the startup sequence. I see the crash occurring, but I
>> don't understand how it's happening; we test this pretty extensively so there
>> must be something about your testing configuration that is different than ours.
>> Can you provide that part of the log, and maybe a little more description of
>> what you think the problem is?
>
> If ceph.conf contains "cluster addr", the bug always occurs.
> Without "cluster addr" in ceph.conf, the local connection is added to fast dispatch in _send_boot via cluster_messenger->set_addr_unknowns.
>
>> In particular, we *always* call init_local_connection when the messenger
>> starts, so every messenger who is allowed to receive EC messages should have
>> the local connection set up before they get one.
>
> Yes, you call init_local_connection. But the local_connection can only be added to dispatch once the OSD has been added to the messenger.
> In OSD::init:
>     cluster_messenger->add_dispatcher_head(this);
> Only after this can the local_connection be added to dispatch.
> Because if local_connection has the correct type, it can be added to dispatch and doesn't care about the cluster addr.
> When a Messenger is allocated, it sets the type, and only after add_dispatcher_head/tail can the local connection be added to dispatch.
> Maybe adding ms_deliver_handle_fast_connect(local_connection.get()) in SimpleMessenger::ready is better.

Ooooookay, I see the problem now. I pulled the patch (with some wording
changes) into master at commit 9061988ec7eaa922e2b303d9eece86e7c8ee0fa1.
I've also created a ticket to clean up the local dispatch Connection
setup at http://tracker.ceph.com/issues/8892.

Thanks!
-Greg
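The ordering problem discussed in this thread can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `ToySession`, `ToyConnection`, `ToyDispatcher`, `ToyMessenger`, and `loopback_usable_after_startup` are hypothetical names invented for illustration and are not Ceph's Messenger, Connection, or Session classes. The model assumes, as described above, that the loopback connection is initialized before any dispatcher is registered, and that delivering the fast connect again from a `ready()` step (the alternative suggested in the thread) would attach the missing session.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy stand-ins only; none of these are the real Ceph classes.
struct ToySession {};

struct ToyConnection {
  std::shared_ptr<ToySession> priv;  // models what get_priv() would return
};

struct ToyDispatcher {
  // Models the fast-connect hook that attaches a session to a connection.
  void handle_fast_connect(ToyConnection& con) {
    if (!con.priv) con.priv = std::make_shared<ToySession>();
  }
  // Models the condition behind the failed assert(session) in the report.
  bool can_fast_dispatch(const ToyConnection& con) const {
    return con.priv != nullptr;
  }
};

class ToyMessenger {
 public:
  explicit ToyMessenger(bool reconnect_in_ready)
      : reconnect_in_ready_(reconnect_in_ready) {
    // Runs at startup, before any dispatcher has been registered,
    // so nobody is there to attach a session.
    init_local_connection();
  }

  void add_dispatcher_head(ToyDispatcher* d) { dispatchers_.push_back(d); }

  // Hypothetical "ready" step: optionally deliver the fast connect again
  // now that dispatchers exist.
  void ready() {
    if (reconnect_in_ready_) init_local_connection();
  }

  ToyConnection& loopback() { return local_; }

 private:
  void init_local_connection() {
    for (ToyDispatcher* d : dispatchers_) d->handle_fast_connect(local_);
  }

  bool reconnect_in_ready_;
  ToyConnection local_;
  std::vector<ToyDispatcher*> dispatchers_;
};

// Returns true if a message sent to ourselves could be fast-dispatched
// after the toy startup sequence has finished.
bool loopback_usable_after_startup(bool reconnect_in_ready) {
  ToyDispatcher osd;
  ToyMessenger msgr(reconnect_in_ready);
  msgr.add_dispatcher_head(&osd);
  msgr.ready();
  return osd.can_fast_dispatch(msgr.loopback());
}
```

In this model, `loopback_usable_after_startup(false)` reports that the loopback connection has no session, which corresponds to the assert in the backtrace, while `loopback_usable_after_startup(true)` reports a usable connection. Whether the real messenger should do this in its ready path is the cleanup question left to the tracker ticket.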
diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index 52a3839..75b294b 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -3852,29 +3852,37 @@ void OSD::_send_boot()
 {
   dout(10) << "_send_boot" << dendl;
   entity_addr_t cluster_addr = cluster_messenger->get_myaddr();
+  Connection *local_connection = cluster_messenger->get_loopback_connection().get();
   if (cluster_addr.is_blank_ip()) {
     int port = cluster_addr.get_port();
     cluster_addr = client_messenger->get_myaddr();
     cluster_addr.set_port(port);
     cluster_messenger->set_addr_unknowns(cluster_addr);
     dout(10) << " assuming cluster_addr ip matches client_addr" << dendl;
-  }
+  } else if (local_connection->get_priv() == NULL)
+      cluster_messenger->ms_deliver_handle_fast_connect(local_connection);
+
   entity_addr_t hb_back_addr = hb_back_server_messenger->get_myaddr();
+  local_connection = hb_back_server_messenger->get_loopback_connection().get();
   if (hb_back_addr.is_blank_ip()) {
     int port = hb_back_addr.get_port();
     hb_back_addr = cluster_addr;
     hb_back_addr.set_port(port);
     hb_back_server_messenger->set_addr_unknowns(hb_back_addr);
     dout(10) << " assuming hb_back_addr ip matches cluster_addr" << dendl;
-  }
+  } else if (local_connection->get_priv() == NULL)
+      hb_back_server_messenger->ms_deliver_handle_fast_connect(local_connection);
+
   entity_addr_t hb_front_addr = hb_front_server_messenger->get_myaddr();
+  local_connection = hb_front_server_messenger->get_loopback_connection().get();
   if (hb_front_addr.is_blank_ip()) {
     int port = hb_front_addr.get_port();
     hb_front_addr = client_messenger->get_myaddr();
     hb_front_addr.set_port(port);
     hb_front_server_messenger->set_addr_unknowns(hb_front_addr);
     dout(10) << " assuming hb_front_addr ip matches client_addr" << dendl;
-  }
+  } else if (local_connection->get_priv() == NULL)
+      hb_front_server_messenger->ms_deliver_handle_fast_connect(local_connection);
 
   MOSDBoot *mboot = new MOSDBoot(superblock, service.get_boot_epoch(),
                                  hb_back_addr, hb_front_addr, cluster_addr);
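The shape of the guard that the patch adds to each of the three branches in _send_boot can be looked at in isolation. The sketch below is a toy model, not Ceph code: `ToyLoopback`, `ToyMessenger`, `settle_address`, and `fast_connects_after` are hypothetical names, an empty string stands in for a blank IP, and the model assumes (following the description in this thread) that filling in a blank address already results in the fast connect being delivered for the loopback connection.

```cpp
#include <cassert>
#include <string>

// Toy loopback connection: tracks whether a session is attached and how
// many times the fast-connect hook has run. Hypothetical, not Ceph's type.
struct ToyLoopback {
  bool has_priv = false;
  int fast_connects = 0;
};

struct ToyMessenger {
  std::string ip;  // empty string models a blank IP address
  ToyLoopback loopback;

  void deliver_fast_connect() {
    loopback.has_priv = true;
    ++loopback.fast_connects;
  }

  // Assumption taken from the thread: filling in a blank address also
  // ends up delivering the fast connect for the loopback connection.
  void set_addr_unknowns(const std::string& fallback_ip) {
    ip = fallback_ip;
    deliver_fast_connect();
  }
};

// Mirrors the shape of one branch of the patched _send_boot():
// blank address takes the old path, otherwise connect only if no session.
void settle_address(ToyMessenger& m, const std::string& fallback_ip) {
  if (m.ip.empty()) {
    m.set_addr_unknowns(fallback_ip);
  } else if (!m.loopback.has_priv) {
    m.deliver_fast_connect();
  }
}

// Runs settle_address() `calls` times and reports how often the hook fired.
int fast_connects_after(const std::string& configured_ip, int calls) {
  ToyMessenger m;
  m.ip = configured_ip;
  for (int i = 0; i < calls; ++i) settle_address(m, "192.0.2.1");
  return m.loopback.fast_connects;
}
```

In this model the hook fires exactly once whether or not an address was configured, and repeated calls do not attach a second session. That is the property the `get_priv() == NULL` check appears intended to provide, although the real code paths may differ from this simplification.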
When doing an EC read, I hit a bug that occurs 100% of the time. The messages are:
2014-07-14 10:03:07.318681 7f7654f6e700 -1 osd/OSD.cc: In function
'virtual void OSD::ms_fast_dispatch(Message*)' thread 7f7654f6e700 time
2014-07-14 10:03:07.316782 osd/OSD.cc: 5019: FAILED assert(session)

 ceph version 0.82-585-g79f3f67 (79f3f6749122ce2944baa70541949d7ca75525e6)
 1: (OSD::ms_fast_dispatch(Message*)+0x286) [0x6544b6]
 2: (DispatchQueue::fast_dispatch(Message*)+0x56) [0xb059d6]
 3: (DispatchQueue::run_local_delivery()+0x6b) [0xb08e0b]
 4: (DispatchQueue::LocalDeliveryThread::entry()+0xd) [0xa4a5fd]
 5: (()+0x8182) [0x7f7665670182]
 6: (clone()+0x6d) [0x7f7663a1130d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Commit 69fc6b2b66 enabled fast_dispatch on local connections; it adds
local_connection to fast dispatch in init_local_connection.
But if there is no fast dispatcher yet, the local connection can't be added.

If there is no cluster addr in ceph.conf, local_connection is added
to fast dispatch in _send_boot because cluster_addr is empty.
But if a cluster addr is set, local_connection is never added to fast dispatch.

An ECSubRead is sent by the OSD to itself via send_message_osd_cluster, so
it triggers this bug.

I don't know about hb_back/front_server_messenger, but they are handled in
_send_boot like cluster_messenger, so I also modified those.

Signed-off-by: Ma Jianpeng <jianpeng.ma@intel.com>
---
 src/osd/OSD.cc | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

--
1.9.1