Message ID | alpine.DEB.2.11.1801171856230.24931@piezo.novalocal (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Sage, thanks again but this results in: 0> 2018-01-17 21:43:09.949980 7f29c9fff700 -1 /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread 7f29c9fff700 time 2018-01-17 21:43:09.946989 /build/ceph/src/osd/PG.cc: 3395: FAILED assert(r == 0) ceph version 12.2.2-94-g92923ef (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x5609d3a182a2] 2: (PG::update_snap_map(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, ObjectStore::Transaction&)+0x257) [0x5609d3517d07] 3: (PG::append_log(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, ObjectStore::Transaction&, bool)+0x538) [0x5609d353e018] 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t> const&, eversion_t const&, eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x5609d3632e14] 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) [0x5609d373e572] 6: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) [0x5609d37445b4] 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x5609d3669fc0] 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x77b) [0x5609d35d62ab] 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) [0x5609d3462bc7] 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x5609d36daa07] 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x108c) [0x5609d3491d1c] 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) [0x5609d3a1df2d] 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5609d3a1fef0] 14: (()+0x8064) [0x7f2a25b6b064] 15: (clone()+0x6d)
[0x7f2a24c5f62d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. Stefan Am 17.01.2018 um 19:56 schrieb Sage Weil: > On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >> >> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>> Hi Sage, >>>> >>>> this gives me another crash while that pg is recovering: >>>> >>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>> PrimaryLogPG::on_l >>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>> me 2018-01-17 19:25:09.322287 >>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>> recovery_info.ss.clone_snaps.end()) >>> >>> Is this a cache tiering pool? >> >> no normal 3 replica but the pg is degraded: >> ceph pg dump | grep 3.80e >> >> 3.80e 1709 0 1579 0 0 6183674368 >> 10014 10014 active+undersized+degraded+remapped+backfill_wait >> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >> 50 [39] 39 907737'69776430 2018-01-14 >> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 > > Hrm, no real clues on teh root cause then. Something like this will work > around the current assert: > > diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc > index d42f3a401b..0f76134f74 100644 > --- a/src/osd/PrimaryLogPG.cc > +++ b/src/osd/PrimaryLogPG.cc > @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( > set<snapid_t> snaps; > dout(20) << " snapset " << recovery_info.ss << dendl; > auto p = recovery_info.ss.clone_snaps.find(hoid.snap); > - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
> - snaps.insert(p->second.begin(), p->second.end()); > + if (p == recovery_info.ss.clone_snaps.end()) { > + derr << __func__ << " no clone_snaps for " << hoid << dendl; > + } else { > + snaps.insert(p->second.begin(), p->second.end()); > + } > dout(20) << " snaps " << snaps << dendl; > snap_mapper.add_oid( > recovery_info.soid, > > >> >> Stefan >> >>> >>> s >>>> >>>> ceph version 12.2.2-94-g92923ef >>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>> const*)+0x102) [0x55addb5eb1f2] >>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>> const&, std::shared_ptr<ObjectContext>, bool, >>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>> [0x55addb30748f] >>>> 5: >>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>> [0x55addb317531] >>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>> [0x55addb23cf10] >>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>> [0x55addb035bc7] >>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>> const&)+0x57) [0x55addb2ad957] >>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>> [0x55addb5f0e7d] >>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>> 13: (()+0x8064) [0x7f4955b68064] >>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>> NOTE: a copy of the executable, or `objdump -rdS
<executable>` is >>>> needed to interpret this. >>>> >>>> Greets, >>>> Stefan >>>> >>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>> Hi, >>>>>> >>>>>> i there any chance to fix this instead of removing manually all the clones? >>>>> >>>>> I believe you can avoid the immediate problem and get the PG up by >>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>> list mapping. >>>>> >>>>> The problem is you'll probably still a stray snapid -> object mapping, so >>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>> state that won't trim (although from a quick look at the code it won't >>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>> happens. >>>>> >>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>> enabled for qa. This isn't something that needs to take down the OSD! >>>>> >>>>> sage >>>>> >>>>> >>>>> > >>>>> >>>>>> Stefan >>>>>> >>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>> Hello, >>>>>>>> >>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>> being down. 
>>>>>>>> >>>>>>>> All of them fail with: >>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>> >>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>> [0x561f9fb76f3b] >>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>> [0x561f9fc314b2] >>>>>>>> 7: >>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>> [0x561f9fc374f4] >>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>> [0x561f9fb5cf10] >>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>> 
ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>> [0x561f9f955bc7] >>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>> [0x561f9ff10e6d] >>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>> needed to interpret this. >>>>>>> >>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>> >>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>> >>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>> any experience with this but you'd probably have to export the PG >>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>> out of it. I see options to remove both an object and >>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>> them myself. 
>>>>>>> -Greg >>>>>>> >>>>>> -- >>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>> the body of a message to majordomo@vger.kernel.org >>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, OK, as this one is also just an assert in case a snap cannot be removed, I will comment out that assert as well. Hopefully that's it. Greets, Stefan Am 17.01.2018 um 21:45 schrieb Stefan Priebe - Profihost AG: > Hi Sage, > > thanks again but this results in: > 0> 2018-01-17 21:43:09.949980 7f29c9fff700 -1 > /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const > std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread 7f29c9fff700 time > 2018-01-17 21:43:09.946989 > /build/ceph/src/osd/PG.cc: 3395: FAILED assert(r == 0) > > ceph version 12.2.2-94-g92923ef > (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x102) [0x5609d3a182a2] > 2: (PG::update_snap_map(std::vector<pg_log_entry_t, > std::allocator<pg_log_entry_t> > const&, > ObjectStore::Transaction&)+0x257) [0x5609d3517d07] > 3: (PG::append_log(std::vector<pg_log_entry_t, > std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > ObjectStore::Transaction&, bool)+0x538) [0x5609d353e018] > 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t> const&, eversion_t const&, eversion_t const&, bool, > ObjectStore::Transaction&)+0x64) [0x5609d3632e14] > 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > [0x5609d373e572] > 6: > (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > [0x5609d37445b4] > 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > [0x5609d3669fc0] > 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > ThreadPool::TPHandle&)+0x77b) [0x5609d35d62ab] > 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > [0x5609d3462bc7] > 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > const&)+0x57) [0x5609d36daa07] > 11:
(OSD::ShardedOpWQ::_process(unsigned int, > ceph::heartbeat_handle_d*)+0x108c) [0x5609d3491d1c] > 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > [0x5609d3a1df2d] > 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5609d3a1fef0] > 14: (()+0x8064) [0x7f2a25b6b064] > 15: (clone()+0x6d) [0x7f2a24c5f62d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is > needed to interpret this. > > > Stefan > > Am 17.01.2018 um 19:56 schrieb Sage Weil: >> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>> >>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>> Hi Sage, >>>>> >>>>> this gives me another crash while that pg is recovering: >>>>> >>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>> PrimaryLogPG::on_l >>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>> me 2018-01-17 19:25:09.322287 >>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>> recovery_info.ss.clone_snaps.end()) >>>> >>>> Is this a cache tiering pool? >>> >>> no normal 3 replica but the pg is degraded: >>> ceph pg dump | grep 3.80e >>> >>> 3.80e 1709 0 1579 0 0 6183674368 >>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>> 50 [39] 39 907737'69776430 2018-01-14 >>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >> >> Hrm, no real clues on teh root cause then. 
Something like this will work >> around the current assert: >> >> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >> index d42f3a401b..0f76134f74 100644 >> --- a/src/osd/PrimaryLogPG.cc >> +++ b/src/osd/PrimaryLogPG.cc >> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >> set<snapid_t> snaps; >> dout(20) << " snapset " << recovery_info.ss << dendl; >> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? >> - snaps.insert(p->second.begin(), p->second.end()); >> + if (p == recovery_info.ss.clone_snaps.end()) { >> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >> + } else { >> + snaps.insert(p->second.begin(), p->second.end()); >> + } >> dout(20) << " snaps " << snaps << dendl; >> snap_mapper.add_oid( >> recovery_info.soid, >> >> >>> >>> Stefan >>> >>>> >>>> s >>>>> >>>>> ceph version 12.2.2-94-g92923ef >>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>> const*)+0x102) [0x55addb5eb1f2] >>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>> [0x55addb30748f] >>>>> 5: >>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>> [0x55addb317531] >>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>> [0x55addb23cf10] >>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>
[0x55addb035bc7] >>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>> const&)+0x57) [0x55addb2ad957] >>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>> [0x55addb5f0e7d] >>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>> needed to interpret this. >>>>> >>>>> Greets, >>>>> Stefan >>>>> >>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>> Hi, >>>>>>> >>>>>>> i there any chance to fix this instead of removing manually all the clones? >>>>>> >>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>> list mapping. >>>>>> >>>>>> The problem is you'll probably still a stray snapid -> object mapping, so >>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>> state that won't trim (although from a quick look at the code it won't >>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>> happens. >>>>>> >>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>> enabled for qa. This isn't something that needs to take down the OSD! >>>>>> >>>>>> sage >>>>>> >>>>>> >>>>>> > >>>>>> >>>>>>> Stefan >>>>>>> >>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>> being down. 
>>>>>>>>> >>>>>>>>> All of them fail with: >>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>> >>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>> [0x561f9fb76f3b] >>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>> [0x561f9fc314b2] >>>>>>>>> 7: >>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>> [0x561f9fc374f4] >>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>> [0x561f9fb5cf10] >>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, 
>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>> [0x561f9f955bc7] >>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>> [0x561f9ff10e6d] >>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>> needed to interpret this. >>>>>>>> >>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>> >>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>>> >>>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>>> any experience with this but you'd probably have to export the PG >>>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>>> out of it. I see options to remove both an object and >>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>>> them myself. 
>>>>>>>> -Greg
Hi Sage, if I set nobackfill, the cluster stays up even though some pgs have only one replica. The last one I'm struggling with is this one: 0> 2018-01-17 22:58:20.855380 7f8c73bff700 -1 *** Caught signal (Segmentation fault) ** in thread 7f8c73bff700 thread_name:tp_osd_tp ceph version 12.2.2-94-g92923ef (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) 1: (()+0xa43ebc) [0x56180c34cebc] 2: (()+0xf890) [0x7f8cbfc3e890] 3: (std::_Rb_tree_iterator<snapid_t> std::_Rb_tree<snapid_t, snapid_t, std::_Identity<snapid_t>, std::less<snapid_t>, std::allocator<snapid_t> >::_M_insert_unique_<snapid_t&>(std::_Rb_tree_const_iterator<snapid_t>, snapid_t&)+0x40) [0x56180bee4720] 4: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo const&, std::shared_ptr<ObjectContext>, bool, ObjectStore::Transaction*)+0x114e) [0x56180bf3a72e] 5: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x56180c0ac2cd] 6: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) [0x56180c0ac56f] 7: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) [0x56180c0bc611] 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x56180bfe1ff0] 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x77b) [0x56180bf4e2db] 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) [0x56180bddabc7] 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x56180c052a37] 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x108c) [0x56180be09d1c] 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) [0x56180c395f3d] 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x56180c397f00] 15: (()+0x8064) [0x7f8cbfc37064] 16: (clone()+0x6d) [0x7f8cbed2b62d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret
this. Greets, Stefan Am 17.01.2018 um 19:56 schrieb Sage Weil: > On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >> >> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>> Hi Sage, >>>> >>>> this gives me another crash while that pg is recovering: >>>> >>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>> PrimaryLogPG::on_l >>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>> me 2018-01-17 19:25:09.322287 >>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>> recovery_info.ss.clone_snaps.end()) >>> >>> Is this a cache tiering pool? >> >> no normal 3 replica but the pg is degraded: >> ceph pg dump | grep 3.80e >> >> 3.80e 1709 0 1579 0 0 6183674368 >> 10014 10014 active+undersized+degraded+remapped+backfill_wait >> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >> 50 [39] 39 907737'69776430 2018-01-14 >> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 > > Hrm, no real clues on teh root cause then. Something like this will work > around the current assert: > > diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc > index d42f3a401b..0f76134f74 100644 > --- a/src/osd/PrimaryLogPG.cc > +++ b/src/osd/PrimaryLogPG.cc > @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( > set<snapid_t> snaps; > dout(20) << " snapset " << recovery_info.ss << dendl; > auto p = recovery_info.ss.clone_snaps.find(hoid.snap); > - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
> - snaps.insert(p->second.begin(), p->second.end()); > + if (p == recovery_info.ss.clone_snaps.end()) { > + derr << __func__ << " no clone_snaps for " << hoid << dendl; > + } else { > + snaps.insert(p->second.begin(), p->second.end()); > + } > dout(20) << " snaps " << snaps << dendl; > snap_mapper.add_oid( > recovery_info.soid, > > >> >> Stefan >> >>> >>> s >>>> >>>> ceph version 12.2.2-94-g92923ef >>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>> const*)+0x102) [0x55addb5eb1f2] >>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>> const&, std::shared_ptr<ObjectContext>, bool, >>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>> [0x55addb30748f] >>>> 5: >>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>> [0x55addb317531] >>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>> [0x55addb23cf10] >>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>> [0x55addb035bc7] >>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>> const&)+0x57) [0x55addb2ad957] >>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>> [0x55addb5f0e7d] >>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>> 13: (()+0x8064) [0x7f4955b68064] >>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>> NOTE: a copy of the executable, or `objdump -rdS
<executable>` is >>>> needed to interpret this. >>>> >>>> Greets, >>>> Stefan >>>> >>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>> Hi, >>>>>> >>>>>> i there any chance to fix this instead of removing manually all the clones? >>>>> >>>>> I believe you can avoid the immediate problem and get the PG up by >>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>> list mapping. >>>>> >>>>> The problem is you'll probably still a stray snapid -> object mapping, so >>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>> state that won't trim (although from a quick look at the code it won't >>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>> happens. >>>>> >>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>> enabled for qa. This isn't something that needs to take down the OSD! >>>>> >>>>> sage >>>>> >>>>> >>>>> > >>>>> >>>>>> Stefan >>>>>> >>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>> Hello, >>>>>>>> >>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>> being down. 
>>>>>>>> >>>>>>>> All of them fail with: >>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>> >>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>> [0x561f9fb76f3b] >>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>> [0x561f9fc314b2] >>>>>>>> 7: >>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>> [0x561f9fc374f4] >>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>> [0x561f9fb5cf10] >>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>> 
ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>> [0x561f9f955bc7] >>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>> [0x561f9ff10e6d] >>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>> needed to interpret this. >>>>>>> >>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>> >>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>> >>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>> any experience with this but you'd probably have to export the PG >>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>> out of it. I see options to remove both an object and >>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>> them myself. 
>>>>>>> -Greg
>>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
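Greg's description of the failure mode — `add_oid()` finding snapid/object entries it expects to be absent — can be sketched with a toy model. This is a hypothetical, heavily simplified stand-in (the name `ToyMapper` and its members are inventions for illustration, not the real SnapMapper API), but it shows why a stale forward entry makes the lookup return something other than -ENOENT (-2) and trips the assert:

```cpp
// Hypothetical, simplified model of SnapMapper's two key families.
// The real SnapMapper stores both families in an omap via MapCacher.
#include <cassert>
#include <map>
#include <set>
#include <string>

struct ToyMapper {
  // forward map: object -> set of snap ids
  std::map<std::string, std::set<unsigned>> obj_to_snaps;
  // reverse map: snap id -> objects containing that snap
  std::map<unsigned, std::set<std::string>> snap_to_objs;

  // Returns 0 on success, -17 (EEXIST) if a stale forward entry is present.
  // The real add_oid() instead does assert(r == -2 /* -ENOENT */) on the
  // lookup result, which is the FAILED assert in the trace above.
  int add_oid(const std::string& oid, const std::set<unsigned>& snaps) {
    if (obj_to_snaps.count(oid))
      return -17;
    obj_to_snaps[oid] = snaps;
    for (unsigned s : snaps)
      snap_to_objs[s].insert(oid);
    return 0;
  }
};
```

The point of the double bookkeeping is that snap trimming walks the reverse map; if the two maps disagree (as they apparently do on this PG), any strict consistency assert becomes fatal.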
Hi,

it also crashes in get_snaps() (marked with HERE):

int SnapMapper::get_snaps(
  const hobject_t &oid,
  object_snaps *out)
{
  assert(check(oid));
  set<string> keys;
  map<string, bufferlist> got;
  keys.insert(to_object_key(oid));
  int r = backend.get_keys(keys, &got);
  if (r < 0)
    return r;
  if (got.empty())
    return -ENOENT;
  if (out) {
    bufferlist::iterator bp = got.begin()->second.begin();
    ::decode(*out, bp);
    dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
    assert(!out->snaps.empty()); ########### HERE ###########
  } else {
    dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
  }
  return 0;
}

is it safe to comment out that assert?

Stefan

On 17.01.2018 19:56, Sage Weil wrote:
> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>
>> On 17.01.2018 19:48, Sage Weil wrote:
>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi Sage,
>>>>
>>>> this gives me another crash while that pg is recovering:
>>>>
>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>> PrimaryLogPG::on_local_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700
>>>> time 2018-01-17 19:25:09.322287
>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>> recovery_info.ss.clone_snaps.end())
>>>
>>> Is this a cache tiering pool?
>>
>> no, normal 3-replica, but the pg is degraded:
>> ceph pg dump | grep 3.80e
>>
>> 3.80e 1709 0 1579 0 0 6183674368 10014 10014
>> active+undersized+degraded+remapped+backfill_wait
>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] 50
>> [39] 39 907737'69776430 2018-01-14 22:19:54.110864
>> 907737'69776430 2018-01-14 22:19:54.110864
>
> Hrm, no real clues on the root cause then.
Something like this will work
> around the current assert:
>
> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
> index d42f3a401b..0f76134f74 100644
> --- a/src/osd/PrimaryLogPG.cc
> +++ b/src/osd/PrimaryLogPG.cc
> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>      set<snapid_t> snaps;
>      dout(20) << " snapset " << recovery_info.ss << dendl;
>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
> -    snaps.insert(p->second.begin(), p->second.end());
> +    if (p == recovery_info.ss.clone_snaps.end()) {
> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
> +    } else {
> +      snaps.insert(p->second.begin(), p->second.end());
> +    }
>      dout(20) << " snaps " << snaps << dendl;
>      snap_mapper.add_oid(
>        recovery_info.soid,
>
>> Stefan
>>
>>> s
>>>>
>>>> ceph version 12.2.2-94-g92923ef (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55addb5eb1f2]
>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo const&, std::shared_ptr<ObjectContext>, bool, ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) [0x55addb30748f]
>>>> 5: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) [0x55addb317531]
>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x55addb23cf10]
>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) [0x55addb035bc7]
>>>> 9:
(PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x55addb2ad957]
>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) [0x55addb5f0e7d]
>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>> 13: (()+0x8064) [0x7f4955b68064]
>>>> 14: (clone()+0x6d) [0x7f4954c5c62d]
>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>
>>>> Greets,
>>>> Stefan
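The guard in the workaround patch above can be checked in isolation. The following is a self-contained sketch under stated assumptions: simplified types, and the hypothetical helper name `collect_clone_snaps` standing in for the inline code in `on_local_recover()`. The intent, per the patch's own message text, is to warn when the clone is missing from `clone_snaps` and otherwise collect its snaps instead of aborting the OSD:

```cpp
// Sketch of the intended workaround branch: warn on a missing clone
// instead of asserting. Types simplified; not the actual Ceph code.
#include <iostream>
#include <map>
#include <set>
#include <vector>

using snapid_t = unsigned;

std::set<snapid_t> collect_clone_snaps(
    const std::map<snapid_t, std::vector<snapid_t>>& clone_snaps,
    snapid_t clone) {
  std::set<snapid_t> snaps;
  auto p = clone_snaps.find(clone);
  if (p == clone_snaps.end()) {
    // was: assert(p != clone_snaps.end()) -- log and continue instead
    std::cerr << "on_local_recover: no clone_snaps for clone " << clone << "\n";
  } else {
    snaps.insert(p->second.begin(), p->second.end());
  }
  return snaps;
}
```

A missing clone then yields an empty snap set for `snap_mapper.add_oid()` rather than a crashed OSD, which is the "relax asserts in production" behavior Sage's ticket describes.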
Hi,

OK, the cluster has now fully recovered; the only problem is that I can't enable scrubs, as they fail with:

0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 /build/ceph/src/osd/SnapMapper.cc: In function 'int SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time 2018-01-18 13:00:54.840396
/build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())

ceph version 12.2.2-94-g92923ef (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x5561ce28d1f2]
2: (SnapMapper::get_snaps(hobject_t const&, SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7697]
4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) [0x5561cdcd7bc7]
9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x5561cdf4f957]
10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) [0x5561ce292e7d]
12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
13: (()+0x8064) [0x7fbd70d86064]
14: (clone()+0x6d) [0x7fbd6fe7a62d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Greets,
Stefan
On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> Hi,
>
> it also crashes in (marked with HERE):
>
> int SnapMapper::get_snaps(
>   const hobject_t &oid,
>   object_snaps *out)
> {
>   assert(check(oid));
>   set<string> keys;
>   map<string, bufferlist> got;
>   keys.insert(to_object_key(oid));
>   int r = backend.get_keys(keys, &got);
>   if (r < 0)
>     return r;
>   if (got.empty())
>     return -ENOENT;
>   if (out) {
>     bufferlist::iterator bp = got.begin()->second.begin();
>     ::decode(*out, bp);
>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>     assert(!out->snaps.empty()); ########### HERE ###########
>   } else {
>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>   }
>   return 0;
> }
>
> is it safe to comment out that assert?

I think so. All of these asserts are just about ensuring the snapmapper state is consistent, and its only purpose is to find snaps to trim. Since your snapmapper metadata is clearly not consistent, there isn't a whole lot of risk here.

You might want to set nosnaptrim for the time being so you don't get a surprise if trimming kicks in.

The next step is probably going to be to do a scrub and see what it finds, repairs, or can't repair.
sage > Stefan > Am 17.01.2018 um 19:56 schrieb Sage Weil: > > On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >> > >> Am 17.01.2018 um 19:48 schrieb Sage Weil: > >>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>> Hi Sage, > >>>> > >>>> this gives me another crash while that pg is recovering: > >>>> > >>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 > >>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void > >>>> PrimaryLogPG::on_l > >>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, > >>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti > >>>> me 2018-01-17 19:25:09.322287 > >>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != > >>>> recovery_info.ss.clone_snaps.end()) > >>> > >>> Is this a cache tiering pool? > >> > >> no normal 3 replica but the pg is degraded: > >> ceph pg dump | grep 3.80e > >> > >> 3.80e 1709 0 1579 0 0 6183674368 > >> 10014 10014 active+undersized+degraded+remapped+backfill_wait > >> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] > >> 50 [39] 39 907737'69776430 2018-01-14 > >> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 > > > > Hrm, no real clues on teh root cause then. Something like this will work > > around the current assert: > > > > diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc > > index d42f3a401b..0f76134f74 100644 > > --- a/src/osd/PrimaryLogPG.cc > > +++ b/src/osd/PrimaryLogPG.cc > > @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( > > set<snapid_t> snaps; > > dout(20) << " snapset " << recovery_info.ss << dendl; > > auto p = recovery_info.ss.clone_snaps.find(hoid.snap); > > - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
> > - snaps.insert(p->second.begin(), p->second.end()); > > + if (p != recovery_info.ss.clone_snaps.end()) { > > + derr << __func__ << " no clone_snaps for " << hoid << dendl; > > + } else { > > + snaps.insert(p->second.begin(), p->second.end()); > > + } > > dout(20) << " snaps " << snaps << dendl; > > snap_mapper.add_oid( > > recovery_info.soid, > > > > > >> > >> Stefan > >> > >>> > >>> s > >>>> > >>>> ceph version 12.2.2-94-g92923ef > >>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>> const*)+0x102) [0x55addb5eb1f2] > >>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo > >>>> const&, std::shared_ptr<ObjectContext>, bool, > >>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] > >>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, > >>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] > >>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) > >>>> [0x55addb30748f] > >>>> 5: > >>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) > >>>> [0x55addb317531] > >>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>> [0x55addb23cf10] > >>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] > >>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>> [0x55addb035bc7] > >>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>> const&)+0x57) [0x55addb2ad957] > >>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] > >>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>> [0x55addb5f0e7d] > >>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] > >>>> 13: (()+0x8064) [0x7f4955b68064] > >>>> 14: 
(clone()+0x6d) [0x7f4954c5c62d] > >>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>> needed to interpret this. > >>>> > >>>> Greets, > >>>> Stefan > >>>> > >>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: > >>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>> Hi, > >>>>>> > >>>>>> i there any chance to fix this instead of removing manually all the clones? > >>>>> > >>>>> I believe you can avoid the immediate problem and get the PG up by > >>>>> commenting out the assert. set_snaps() will overwrite the object->snap > >>>>> list mapping. > >>>>> > >>>>> The problem is you'll probably still a stray snapid -> object mapping, so > >>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error > >>>>> state that won't trim (although from a quick look at the code it won't > >>>>> crash). I'd probably remove the assert and deal with that if/when it > >>>>> happens. > >>>>> > >>>>> I'm adding a ticket to relax these asserts for production but keep them > >>>>> enabled for qa. This isn't something that needs to take down the OSD! > >>>>> > >>>>> sage > >>>>> > >>>>> > >>>>> > > >>>>> > >>>>>> Stefan > >>>>>> > >>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: > >>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG > >>>>>>> <s.priebe@profihost.ag> wrote: > >>>>>>>> Hello, > >>>>>>>> > >>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds > >>>>>>>> being down. 
> >>>>>>>> > >>>>>>>> All of them fail with: > >>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 > >>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void > >>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, > >>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' > >>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 > >>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) > >>>>>>>> > >>>>>>>> ceph version 12.2.2-93-gd6da8d7 > >>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) > >>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>> const*)+0x102) [0x561f9ff0b1e2] > >>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, > >>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, > >>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) > >>>>>>>> [0x561f9fb76f3b] > >>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] > >>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, > >>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > >>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] > >>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] > >>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >>>>>>>> [0x561f9fc314b2] > >>>>>>>> 7: > >>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >>>>>>>> [0x561f9fc374f4] > >>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>> [0x561f9fb5cf10] > >>>>>>>> 9: 
(PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] > >>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>> [0x561f9f955bc7] > >>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>> const&)+0x57) [0x561f9fbcd947] > >>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] > >>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>> [0x561f9ff10e6d] > >>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] > >>>>>>>> 15: (()+0x8064) [0x7f949afcb064] > >>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] > >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>> needed to interpret this. > >>>>>>> > >>>>>>> By the time it gets there, something else has gone wrong. The OSD is > >>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that > >>>>>>> there are already entries (which it thinks there shouldn't be). > >>>>>>> > >>>>>>> You'll need to post more of a log, along with background, if anybody's > >>>>>>> going to diagnose it: is there cache tiering on the cluster? What is > >>>>>>> this pool used for? Were there other errors on this PG in the past? > >>>>>>> > >>>>>>> I also notice a separate email about deleting the data; I don't have > >>>>>>> any experience with this but you'd probably have to export the PG > >>>>>>> using ceph-objectstore-tool and then find a way to delete the object > >>>>>>> out of it. I see options to remove both an object and > >>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of > >>>>>>> them myself. 
> >>>>>>> -Greg -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
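The add_oid() failure discussed in this thread (FAILED assert(r == -2), i.e. a snapid/object mapping already exists where none was expected) and the suggested workaround of letting set_snaps() overwrite the stale mapping can be sketched with a self-contained toy model. This is not Ceph's SnapMapper: the in-memory map, the names, and the -2 (-ENOENT) handling are illustrative only.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <set>
#include <string>

// Hypothetical toy stand-in for the object -> snaps mapping kept by
// SnapMapper; the real class persists this through MapCacher/omap.
struct ToySnapMapper {
  std::map<std::string, std::set<unsigned>> obj_to_snaps;

  // Strict variant, mirroring the failing assert(r == -2): adding an oid
  // that already has a mapping is treated as fatal corruption.
  void add_oid_strict(const std::string& oid, const std::set<unsigned>& snaps) {
    int r = obj_to_snaps.count(oid) ? 0 : -2;  // -2 == -ENOENT ("no entry")
    assert(r == -2);  // aborts (crashes the OSD) if a stale mapping exists
    obj_to_snaps[oid] = snaps;
  }

  // Relaxed variant of the same check: log the inconsistency and let the
  // set_snaps()-style write overwrite the stale object -> snaps entry.
  void add_oid_relaxed(const std::string& oid, const std::set<unsigned>& snaps) {
    if (obj_to_snaps.count(oid)) {
      std::cerr << "add_oid: stale mapping for " << oid << ", overwriting\n";
    }
    obj_to_snaps[oid] = snaps;  // overwrite instead of aborting
  }
};
```

The relaxed variant trades a crash for a log line, which is the trade-off Sage describes: the snapmapper state is only used to find snaps to trim, so overwriting is low-risk.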
Hi Sage, Am 18.01.2018 um 14:16 schrieb Sage Weil: > On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >> Hi, >> >> it also crashes in (marked with HERE): >> >> int SnapMapper::get_snaps( >> const hobject_t &oid, >> object_snaps *out) >> { >> assert(check(oid)); >> set<string> keys; >> map<string, bufferlist> got; >> keys.insert(to_object_key(oid)); >> int r = backend.get_keys(keys, &got); >> if (r < 0) >> return r; >> if (got.empty()) >> return -ENOENT; >> if (out) { >> bufferlist::iterator bp = got.begin()->second.begin(); >> ::decode(*out, bp); >> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >> assert(!out->snaps.empty()); ########### HERE ########### >> } else { >> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >> } >> return 0; >> } >> >> is it safe to comment out that assert? > > I think so. All of these asserts are just about ensuring the snapmapper > state is consistent, and its only purpose is to find snaps to trim. > Since your snapmapper metadata is clearly not consistent, there isn't a > whole lot of risk here. You might want to set nosnaptrim for the time > being so you don't get a surprise if trimming kicks in. The next step is > probably going to be to do a scrub and see what it finds, repairs, or > can't repair. Snap trimming works fine; I already trimmed and removed some snaps. I was able to get the cluster into a state where all is backfilled. 
But enabling / doing deep-scrubs results in this one: 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 /build/ceph/src/osd/SnapMapper.cc: In function 'int SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time 2018-01-18 13:00:54.840396 /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) ceph version 12.2.2-94-g92923ef (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x5561ce28d1f2] 2: (SnapMapper::get_snaps(hobject_t const&, SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7697] 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) [0x5561cdcd7bc7] 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x5561cdf4f957] 10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) [0x5561ce292e7d] 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] 13: (()+0x8064) [0x7fbd70d86064] 14: (clone()+0x6d) [0x7fbd6fe7a62d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 
Greets, Stefan > sage > > >> Stefan >> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>> >>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>> Hi Sage, >>>>>> >>>>>> this gives me another crash while that pg is recovering: >>>>>> >>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>> PrimaryLogPG::on_l >>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>> me 2018-01-17 19:25:09.322287 >>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>> recovery_info.ss.clone_snaps.end()) >>>>> >>>>> Is this a cache tiering pool? >>>> >>>> no normal 3 replica but the pg is degraded: >>>> ceph pg dump | grep 3.80e >>>> >>>> 3.80e 1709 0 1579 0 0 6183674368 >>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>> 50 [39] 39 907737'69776430 2018-01-14 >>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>> >>> Hrm, no real clues on teh root cause then. Something like this will work >>> around the current assert: >>> >>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>> index d42f3a401b..0f76134f74 100644 >>> --- a/src/osd/PrimaryLogPG.cc >>> +++ b/src/osd/PrimaryLogPG.cc >>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>> set<snapid_t> snaps; >>> dout(20) << " snapset " << recovery_info.ss << dendl; >>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
>>> - snaps.insert(p->second.begin(), p->second.end()); >>> + if (p == recovery_info.ss.clone_snaps.end()) { >>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>> + } else { >>> + snaps.insert(p->second.begin(), p->second.end()); >>> + } >>> dout(20) << " snaps " << snaps << dendl; >>> snap_mapper.add_oid( >>> recovery_info.soid, >>> >>> >>>> >>>> Stefan >>>> >>>>> >>>>> s >>>>>> >>>>>> ceph version 12.2.2-94-g92923ef >>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>> [0x55addb30748f] >>>>>> 5: >>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>>> [0x55addb317531] >>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>> [0x55addb23cf10] >>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>> [0x55addb035bc7] >>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>> const&)+0x57) [0x55addb2ad957] >>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>> [0x55addb5f0e7d] >>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>> 14: 
(clone()+0x6d) [0x7f4954c5c62d] >>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>> needed to interpret this. >>>>>> >>>>>> Greets, >>>>>> Stefan >>>>>> >>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> i there any chance to fix this instead of removing manually all the clones? >>>>>>> >>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>> list mapping. >>>>>>> >>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so >>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>> happens. >>>>>>> >>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>> enabled for qa. This isn't something that needs to take down the OSD! >>>>>>> >>>>>>> sage >>>>>>> >>>>>>> >>>>>>> > >>>>>>> >>>>>>>> Stefan >>>>>>>> >>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>>>> Hello, >>>>>>>>>> >>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>> being down. 
>>>>>>>>>> >>>>>>>>>> All of them fail with: >>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>> >>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>> [0x561f9fb76f3b] >>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>> 7: >>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>> 9: 
(PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>> needed to interpret this. >>>>>>>>> >>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>> >>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>>>> >>>>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>>>> any experience with this but you'd probably have to export the PG >>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>>>> out of it. I see options to remove both an object and >>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>>>> them myself. 
>>>>>>>>> -Greg -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > Hi Sage, > > Am 18.01.2018 um 14:16 schrieb Sage Weil: > > On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >> Hi, > >> > >> it also crashes in (marked with HERE): > >> > >> int SnapMapper::get_snaps( > >> const hobject_t &oid, > >> object_snaps *out) > >> { > >> assert(check(oid)); > >> set<string> keys; > >> map<string, bufferlist> got; > >> keys.insert(to_object_key(oid)); > >> int r = backend.get_keys(keys, &got); > >> if (r < 0) > >> return r; > >> if (got.empty()) > >> return -ENOENT; > >> if (out) { > >> bufferlist::iterator bp = got.begin()->second.begin(); > >> ::decode(*out, bp); > >> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; > >> assert(!out->snaps.empty()); ########### HERE ########### > >> } else { > >> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; > >> } > >> return 0; > >> } > >> > >> is it save to comment that assert? > > > > I think so. All of these asserts are just about ensuring the snapmapper > > state is consistent, and it's only purpose is to find snaps to trim. > > Since your snapmapper metadata is clearly not consistent, there isn't a > > whole lot of risk here. You might want to set nosnaptrim for the time > > being so you don't get a surprise if trimming kicks in. The next step is > > probably going to be to do a scrub and see what it finds, repairs, or > > can't repair. > > snap trimming works fine i already trimmed and removed some snaps. > > I was able to get the cluster into a state where all is backfilled. 
But > enabling / doing deep-scrubs results into this one: > 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 > /build/ceph/src/osd/SnapMapper.cc: In function 'int > SnapMapper::get_snaps(const h > object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time > 2018-01-18 13:00:54.840396 > /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) I think if you switch that assert to a warning it will repair... sage > > ceph version 12.2.2-94-g92923ef > (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x102) [0x5561ce28d1f2] > 2: (SnapMapper::get_snaps(hobject_t const&, > SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] > 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, > std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 > 697] > 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] > 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] > 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, > ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] > 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] > 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > [0x5561cdcd7bc7] > 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > const&)+0x57) [0x5561cdf4f957] > 10: (OSD::ShardedOpWQ::_process(unsigned int, > ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] > 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > [0x5561ce292e7d] > 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] > 13: (()+0x8064) [0x7fbd70d86064] > 14: (clone()+0x6d) [0x7fbd6fe7a62d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is > needed to interpret this. 
> > > Greets, > Stefan > > > sage > > > > > >> Stefan > >> Am 17.01.2018 um 19:56 schrieb Sage Weil: > >>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>> > >>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: > >>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>> Hi Sage, > >>>>>> > >>>>>> this gives me another crash while that pg is recovering: > >>>>>> > >>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 > >>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void > >>>>>> PrimaryLogPG::on_l > >>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, > >>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti > >>>>>> me 2018-01-17 19:25:09.322287 > >>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != > >>>>>> recovery_info.ss.clone_snaps.end()) > >>>>> > >>>>> Is this a cache tiering pool? > >>>> > >>>> no normal 3 replica but the pg is degraded: > >>>> ceph pg dump | grep 3.80e > >>>> > >>>> 3.80e 1709 0 1579 0 0 6183674368 > >>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait > >>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] > >>>> 50 [39] 39 907737'69776430 2018-01-14 > >>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 > >>> > >>> Hrm, no real clues on teh root cause then. Something like this will work > >>> around the current assert: > >>> > >>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc > >>> index d42f3a401b..0f76134f74 100644 > >>> --- a/src/osd/PrimaryLogPG.cc > >>> +++ b/src/osd/PrimaryLogPG.cc > >>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( > >>> set<snapid_t> snaps; > >>> dout(20) << " snapset " << recovery_info.ss << dendl; > >>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); > >>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
> >>> - snaps.insert(p->second.begin(), p->second.end()); > >>> + if (p == recovery_info.ss.clone_snaps.end()) { > >>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; > >>> + } else { > >>> + snaps.insert(p->second.begin(), p->second.end()); > >>> + } > >>> dout(20) << " snaps " << snaps << dendl; > >>> snap_mapper.add_oid( > >>> recovery_info.soid, > >>> > >>> > >>>> > >>>> Stefan > >>>> > >>>>> > >>>>> s > >>>>>> > >>>>>> ceph version 12.2.2-94-g92923ef > >>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>> const*)+0x102) [0x55addb5eb1f2] > >>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo > >>>>>> const&, std::shared_ptr<ObjectContext>, bool, > >>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] > >>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, > >>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] > >>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) > >>>>>> [0x55addb30748f] > >>>>>> 5: > >>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) > >>>>>> [0x55addb317531] > >>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>> [0x55addb23cf10] > >>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] > >>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>> [0x55addb035bc7] > >>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>> const&)+0x57) [0x55addb2ad957] > >>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] > >>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>> [0x55addb5f0e7d] > >>>>>> 12: 
(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] > >>>>>> 13: (()+0x8064) [0x7f4955b68064] > >>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] > >>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>> needed to interpret this. > >>>>>> > >>>>>> Greets, > >>>>>> Stefan > >>>>>> > >>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: > >>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> i there any chance to fix this instead of removing manually all the clones? > >>>>>>> > >>>>>>> I believe you can avoid the immediate problem and get the PG up by > >>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap > >>>>>>> list mapping. > >>>>>>> > >>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so > >>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error > >>>>>>> state that won't trim (although from a quick look at the code it won't > >>>>>>> crash). I'd probably remove the assert and deal with that if/when it > >>>>>>> happens. > >>>>>>> > >>>>>>> I'm adding a ticket to relax these asserts for production but keep them > >>>>>>> enabled for qa. This isn't something that needs to take down the OSD! > >>>>>>> > >>>>>>> sage > >>>>>>> > >>>>>>> > >>>>>>> > > >>>>>>> > >>>>>>>> Stefan > >>>>>>>> > >>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: > >>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG > >>>>>>>>> <s.priebe@profihost.ag> wrote: > >>>>>>>>>> Hello, > >>>>>>>>>> > >>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds > >>>>>>>>>> being down. 
> >>>>>>>>>> > >>>>>>>>>> All of them fail with: > >>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 > >>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void > >>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, > >>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' > >>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 > >>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) > >>>>>>>>>> > >>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 > >>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) > >>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] > >>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, > >>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, > >>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) > >>>>>>>>>> [0x561f9fb76f3b] > >>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] > >>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, > >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > >>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] > >>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] > >>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >>>>>>>>>> [0x561f9fc314b2] > >>>>>>>>>> 7: > >>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >>>>>>>>>> [0x561f9fc374f4] > >>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) 
> >>>>>>>>>> [0x561f9fb5cf10] > >>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] > >>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>> [0x561f9f955bc7] > >>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>> const&)+0x57) [0x561f9fbcd947] > >>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] > >>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>> [0x561f9ff10e6d] > >>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] > >>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] > >>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] > >>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>> needed to interpret this. > >>>>>>>>> > >>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is > >>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that > >>>>>>>>> there are already entries (which it thinks there shouldn't be). > >>>>>>>>> > >>>>>>>>> You'll need to post more of a log, along with background, if anybody's > >>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is > >>>>>>>>> this pool used for? Were there other errors on this PG in the past? > >>>>>>>>> > >>>>>>>>> I also notice a separate email about deleting the data; I don't have > >>>>>>>>> any experience with this but you'd probably have to export the PG > >>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object > >>>>>>>>> out of it. I see options to remove both an object and > >>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of > >>>>>>>>> them myself. 
> >>>>>>>>> -Greg -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
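Sage's suggestion, turning the empty-snaps assert in get_snaps() into a warning plus an error return so a scrub can record the inconsistency instead of killing the OSD, looks roughly like this toy sketch. A plain std::map stands in for the omap-backed store; the function name and the choice of -ENOENT for the empty case are illustrative assumptions, not Ceph's actual code.

```cpp
#include <cassert>
#include <cerrno>
#include <iostream>
#include <map>
#include <set>
#include <string>

// Toy stand-in for SnapMapper::get_snaps with the assert relaxed:
// an empty decoded snap set is reported as an error instead of aborting.
int get_snaps_relaxed(const std::map<std::string, std::set<unsigned>>& store,
                      const std::string& oid, std::set<unsigned>* out) {
  auto it = store.find(oid);
  if (it == store.end())
    return -ENOENT;  // no mapping at all, as in the real function
  if (it->second.empty()) {
    // was: assert(!out->snaps.empty()) -- now a warning plus error return,
    // so the caller (e.g. a scrub pass) can flag the object and continue
    std::cerr << "get_snaps: empty snap set for " << oid << "\n";
    return -ENOENT;
  }
  if (out)
    *out = it->second;
  return 0;
}
```

The design point is that a consistency check whose only consumer is snap trimming does not need to be fatal: surfacing the error to the caller lets scrub/repair do its job.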
On 1/18/2018 5:24 PM, Sage Weil wrote: > On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >> Hi Sage, >> >> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>> Hi, >>>> >>>> it also crashes in (marked with HERE): >>>> >>>> int SnapMapper::get_snaps( >>>> const hobject_t &oid, >>>> object_snaps *out) >>>> { >>>> assert(check(oid)); >>>> set<string> keys; >>>> map<string, bufferlist> got; >>>> keys.insert(to_object_key(oid)); >>>> int r = backend.get_keys(keys, &got); >>>> if (r < 0) >>>> return r; >>>> if (got.empty()) >>>> return -ENOENT; >>>> if (out) { >>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>> ::decode(*out, bp); >>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>> assert(!out->snaps.empty()); ########### HERE ########### >>>> } else { >>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>> } >>>> return 0; >>>> } >>>> >>>> is it save to comment that assert? >>> I think so. All of these asserts are just about ensuring the snapmapper >>> state is consistent, and it's only purpose is to find snaps to trim. >>> Since your snapmapper metadata is clearly not consistent, there isn't a >>> whole lot of risk here. You might want to set nosnaptrim for the time >>> being so you don't get a surprise if trimming kicks in. The next step is >>> probably going to be to do a scrub and see what it finds, repairs, or >>> can't repair. >> snap trimming works fine i already trimmed and removed some snaps. >> >> I was able to get the cluster into a state where all is backfilled. 
But >> enabling / doing deep-scrubs results into this one: >> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >> /build/ceph/src/osd/SnapMapper.cc: In function 'int >> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >> 2018-01-18 13:00:54.840396 >> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) > I think if you switch that We observed a similar issue with that assert as well. Removing the assert partially resolves the issue. Another segmentation fault was observed in PrimaryLogPG::find_object_context() when accessing an empty snapshot list on object removal (specifically p->second.back() in the code below). ... auto p = obc->ssc->snapset.clone_snaps.find(soid.snap); assert(p != obc->ssc->snapset.clone_snaps.end()); first = p->second.back(); ^^^^^^ This indicates (and is confirmed by our manual omap/attrs verification) that both the SnapMapper's and the per-object snapshot lists are empty. Our current workaround was to check p->second.empty() and return an error from the function. This still prevents object removal (a cache tier flush in our case) but fixes the frequent OSD crashes for now. > assert to a warning it will repair...
Removing the assert partially helps. > > sage > >> ceph version 12.2.2-94-g92923ef >> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >> const*)+0x102) [0x5561ce28d1f2] >> 2: (SnapMapper::get_snaps(hobject_t const&, >> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7697] >> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] >> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >> [0x5561cdcd7bc7] >> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >> const&)+0x57) [0x5561cdf4f957] >> 10: (OSD::ShardedOpWQ::_process(unsigned int, >> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >> [0x5561ce292e7d] >> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >> 13: (()+0x8064) [0x7fbd70d86064] >> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >> needed to interpret this.
>> >> >> Greets, >> Stefan >> >>> sage >>> >>> >>>> Stefan >>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>> Hi Sage, >>>>>>>> >>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>> >>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>> PrimaryLogPG::on_l >>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>> recovery_info.ss.clone_snaps.end()) >>>>>>> Is this a cache tiering pool? >>>>>> no normal 3 replica but the pg is degraded: >>>>>> ceph pg dump | grep 3.80e >>>>>> >>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>> Hrm, no real clues on teh root cause then. Something like this will work >>>>> around the current assert: >>>>> >>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>>>> index d42f3a401b..0f76134f74 100644 >>>>> --- a/src/osd/PrimaryLogPG.cc >>>>> +++ b/src/osd/PrimaryLogPG.cc >>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>>>> set<snapid_t> snaps; >>>>> dout(20) << " snapset " << recovery_info.ss << dendl; >>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
>>>>> - snaps.insert(p->second.begin(), p->second.end()); >>>>> + if (p == recovery_info.ss.clone_snaps.end()) { >>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>>>> + } else { >>>>> + snaps.insert(p->second.begin(), p->second.end()); >>>>> + } >>>>> dout(20) << " snaps " << snaps << dendl; >>>>> snap_mapper.add_oid( >>>>> recovery_info.soid, >>>>> >>>>> >>>>>> Stefan >>>>>> >>>>>>> s >>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>>>> [0x55addb30748f] >>>>>>>> 5: >>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>>>>> [0x55addb317531] >>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>> [0x55addb23cf10] >>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>> [0x55addb035bc7] >>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>> [0x55addb5f0e7d] >>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10)
[0x55addb5f2e40] >>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>> needed to interpret this. >>>>>>>> >>>>>>>> Greets, >>>>>>>> Stefan >>>>>>>> >>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> i there any chance to fix this instead of removing manually all the clones? >>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>> list mapping. >>>>>>>>> >>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so >>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>>>> happens. >>>>>>>>> >>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! >>>>>>>>> >>>>>>>>> sage >>>>>>>>> >>>>>>>>> >>>>>>>>> > >>>>>>>>> >>>>>>>>>> Stefan >>>>>>>>>> >>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>>>>>> Hello, >>>>>>>>>>>> >>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>> being down. 
>>>>>>>>>>>> >>>>>>>>>>>> All of them fail with: >>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>> >>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>> [0x561f9fb76f3b] >>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>> 7: >>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) 
>>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>> needed to interpret this. >>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>>>> >>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>>>>>> >>>>>>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>>>>>> any experience with this but you'd probably have to export the PG >>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>>>>>> out of it. I see options to remove both an object and >>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>>>>>> them myself. 
>>>>>>>>>>> -Greg -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
OK, hopefully the last one. If I unset nodeep-scrub it immediately starts around 19-22 deep scrubs, which then results in massive slow requests and even OSDs being marked down. Is there any way to limit this?

Greets,
Stefan

Am 18.01.2018 um 15:24 schrieb Sage Weil:
> I think if you switch that assert to a warning it will repair...
>
> sage
> [...]
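On limiting the scrub burst after clearing nodeep-scrub: the scrub throttles are per-OSD, so a cluster-wide burst of ~20 scrubs can occur even with defaults once many PGs are overdue. The usual knobs in luminous are sketched below; the option names and values should be verified against your exact release's documentation.

```ini
# ceph.conf, [osd] section -- a sketch, not tuned recommendations
osd max scrubs = 1             ; concurrent scrubs per OSD (default 1)
osd scrub sleep = 0.1          ; sleep between scrub chunks, throttles disk I/O
osd scrub load threshold = 0.5 ; skip new scrubs when load is above this
osd scrub begin hour = 1       ; confine scheduled scrubs to a quiet window
osd scrub end hour = 6
```

These can also be injected at runtime, e.g. `ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'`, which avoids restarting OSDs while the cluster is recovering.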
Hmm, it doesn't seem to have anything to do with deep scrubs and disk I/O; even plain scrubs produce the following log and signal (there were no slow requests or other impact, it just got aborted after the monitor marked it down): -20> 2018-01-18 23:05:45.717224 7fb955fff700 0 -- 10.255.0.88:6801/18362 >> 10.255.0.92:6813/25732 conn(0x7fb8f5d90000 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=48 cs=1 l=0).handle_connect_reply connect got RESETSESSION -19> 2018-01-18 23:05:45.717289 7fb955fff700 0 -- 10.255.0.88:6801/18362 >> 10.255.0.86:6813/29823 conn(0x7fb91e668000 :6801 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=105 cs=1 l=0).handle_connect_reply connect got RESETSESSION -18> 2018-01-18 23:05:48.021476 7fb92a3f6700 0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.16 down, but it is still running -17> 2018-01-18 23:05:48.021484 7fb92a3f6700 0 log_channel(cluster) log [DBG] : map e924379 wrongly marked me down at e924378 0> 2018-01-18 23:07:53.251128 7fb90cbff700 -1 *** Caught signal (Aborted) ** in thread 7fb90cbff700 thread_name:tp_osd_tp ceph version 12.2.2-94-g92923ef (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) 1: (()+0xa4433c) [0x55fa1e2c033c] 2: (()+0xf890) [0x7fb959c00890] 3: (pthread_cond_wait()+0xbf) [0x7fb959bfd04f] 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, Context*)+0x254) [0x55fa1e07fd54] 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*)+0x5c) [0x55fa1ddaeb2c] 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x55fa1de00fa7] 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x22f) [0x55fa1de01c7f] 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x624) [0x55fa1de025a4] 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x8b6) [0x55fa1dec1766] 10:
(OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) [0x55fa1dd4dbc7] 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x55fa1dfc5eb7] 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x108c) [0x55fa1dd7cd1c] 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) [0x55fa1e3093bd] 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55fa1e30b380] 15: (()+0x8064) [0x7fb959bf9064] 16: (clone()+0x6d) [0x7fb958ced62d]

Greets,
Stefan

Am 18.01.2018 um 15:24 schrieb Sage Weil:
> I think if you switch that assert to a warning it will repair...
>
> sage
> [...]
Something like this will work
>>>>> around the current assert:
>>>>>
>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>> index d42f3a401b..0f76134f74 100644
>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>      set<snapid_t> snaps;
>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>> -    assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn?
>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>> +    if (p == recovery_info.ss.clone_snaps.end()) {
>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>> +    } else {
>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>> +    }
>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>      snap_mapper.add_oid(
>>>>>        recovery_info.soid,
>>>>>
>>>>>>
>>>>>> Stefan
>>>>>>
>>>>>>>
>>>>>>> s
>>>>>>>>
>>>>>>>> ceph version 12.2.2-94-g92923ef
>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>> [0x55addb30748f]
>>>>>>>> 5:
>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>> [0x55addb317531]
>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>> [0x55addb23cf10]
>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x77b)
[0x55addb1a91eb] >>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>> [0x55addb035bc7] >>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>> [0x55addb5f0e7d] >>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>> needed to interpret this. >>>>>>>> >>>>>>>> Greets, >>>>>>>> Stefan >>>>>>>> >>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> i there any chance to fix this instead of removing manually all the clones? >>>>>>>>> >>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>> list mapping. >>>>>>>>> >>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so >>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>>>> happens. >>>>>>>>> >>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! 
>>>>>>>>> >>>>>>>>> sage >>>>>>>>> >>>>>>>>> >>>>>>>>> > >>>>>>>>> >>>>>>>>>> Stefan >>>>>>>>>> >>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>>>>>> Hello, >>>>>>>>>>>> >>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>> being down. >>>>>>>>>>>> >>>>>>>>>>>> All of them fail with: >>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>> >>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>> [0x561f9fb76f3b] >>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, 
>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>> 7: >>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>> needed to interpret this. >>>>>>>>>>> >>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>>>> >>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? 
>>>>>>>>>>>
>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>> them myself.
>>>>>>>>>>> -Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Sage, any ideas for this one, seen while doing a deep scrub:

-1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
/build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)

ceph version 12.2.2-94-g92923ef (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55c0100d0372]
2: (PG::update_snap_map(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, ObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
3: (PG::append_log(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t> const&, eversion_t const&, eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) [0x55c00fdf6642]
6: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) [0x55c00fdfc684]
7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x55c00fd22030]
8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>,
ThreadPool::TPHandle&)+0x3f7) [0x55c00fb1abc7] 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x55c00fd92ad7] 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) [0x55c0100d5ffd] 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] 14: (()+0x8064) [0x7fbd284fb064] 15: (clone()+0x6d) [0x7fbd275ef62d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. Stefan Am 18.01.2018 um 15:24 schrieb Sage Weil: > On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >> Hi Sage, >> >> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>> Hi, >>>> >>>> it also crashes in (marked with HERE): >>>> >>>> int SnapMapper::get_snaps( >>>> const hobject_t &oid, >>>> object_snaps *out) >>>> { >>>> assert(check(oid)); >>>> set<string> keys; >>>> map<string, bufferlist> got; >>>> keys.insert(to_object_key(oid)); >>>> int r = backend.get_keys(keys, &got); >>>> if (r < 0) >>>> return r; >>>> if (got.empty()) >>>> return -ENOENT; >>>> if (out) { >>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>> ::decode(*out, bp); >>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>> assert(!out->snaps.empty()); ########### HERE ########### >>>> } else { >>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>> } >>>> return 0; >>>> } >>>> >>>> is it save to comment that assert? >>> >>> I think so. All of these asserts are just about ensuring the snapmapper >>> state is consistent, and it's only purpose is to find snaps to trim. >>> Since your snapmapper metadata is clearly not consistent, there isn't a >>> whole lot of risk here. You might want to set nosnaptrim for the time >>> being so you don't get a surprise if trimming kicks in. 
The next step is >>> probably going to be to do a scrub and see what it finds, repairs, or >>> can't repair. >> >> snap trimming works fine i already trimmed and removed some snaps. >> >> I was able to get the cluster into a state where all is backfilled. But >> enabling / doing deep-scrubs results into this one: >> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >> /build/ceph/src/osd/SnapMapper.cc: In function 'int >> SnapMapper::get_snaps(const h >> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >> 2018-01-18 13:00:54.840396 >> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) > > I think if you switch that assert to a warning it will repair... > > sage > >> >> ceph version 12.2.2-94-g92923ef >> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >> const*)+0x102) [0x5561ce28d1f2] >> 2: (SnapMapper::get_snaps(hobject_t const&, >> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >> 697] >> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] >> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >> [0x5561cdcd7bc7] >> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >> const&)+0x57) [0x5561cdf4f957] >> 10: (OSD::ShardedOpWQ::_process(unsigned int, >> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >> 
[0x5561ce292e7d] >> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >> 13: (()+0x8064) [0x7fbd70d86064] >> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >> needed to interpret this. >> >> >> Greets, >> Stefan >> >>> sage >>> >>> >>>> Stefan >>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>> >>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>> Hi Sage, >>>>>>>> >>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>> >>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>> PrimaryLogPG::on_l >>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>> recovery_info.ss.clone_snaps.end()) >>>>>>> >>>>>>> Is this a cache tiering pool? >>>>>> >>>>>> no normal 3 replica but the pg is degraded: >>>>>> ceph pg dump | grep 3.80e >>>>>> >>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>> >>>>> Hrm, no real clues on teh root cause then. 
Something like this will work
>>>>> around the current assert:
>>>>>
>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>> index d42f3a401b..0f76134f74 100644
>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>      set<snapid_t> snaps;
>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>> -    assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn?
>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>> +    if (p == recovery_info.ss.clone_snaps.end()) {
>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>> +    } else {
>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>> +    }
>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>      snap_mapper.add_oid(
>>>>>        recovery_info.soid,
>>>>>
>>>>>>
>>>>>> Stefan
>>>>>>
>>>>>>>
>>>>>>> s
>>>>>>>>
>>>>>>>> ceph version 12.2.2-94-g92923ef
>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>> [0x55addb30748f]
>>>>>>>> 5:
>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>> [0x55addb317531]
>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>> [0x55addb23cf10]
>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x77b)
[0x55addb1a91eb] >>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>> [0x55addb035bc7] >>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>> [0x55addb5f0e7d] >>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>> needed to interpret this. >>>>>>>> >>>>>>>> Greets, >>>>>>>> Stefan >>>>>>>> >>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> i there any chance to fix this instead of removing manually all the clones? >>>>>>>>> >>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>> list mapping. >>>>>>>>> >>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so >>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>>>> happens. >>>>>>>>> >>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! 
>>>>>>>>> >>>>>>>>> sage >>>>>>>>> >>>>>>>>> >>>>>>>>> > >>>>>>>>> >>>>>>>>>> Stefan >>>>>>>>>> >>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>>>>>> Hello, >>>>>>>>>>>> >>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>> being down. >>>>>>>>>>>> >>>>>>>>>>>> All of them fail with: >>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>> >>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>> [0x561f9fb76f3b] >>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, 
>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>> 7: >>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>> needed to interpret this. >>>>>>>>>>> >>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>>>> >>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? 
>>>>>>>>>>>
>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>> them myself.
>>>>>>>>>>> -Greg
I'm guessing it is coming from

  object_snaps out;
  int r = get_snaps(oid, &out);
  if (r < 0)
    return r;

in SnapMapper::update_snaps(). I would add a debug statement to confirm that, and possibly also add debug lines to the various return points in get_snaps() so you can see why get_snaps is returning an error. I'm guessing the way to proceed is to warn but ignore the error, but it's hard to say without seeing what the inconsistency there is and whether we can tolerate it...

sage

On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
> Hi Sage,
>
> any ideas for this one, seen while doing a deep scrub:
>
> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
>
> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
>
> ceph version 12.2.2-94-g92923ef
> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x102) [0x55c0100d0372]
> 2: (PG::update_snap_map(std::vector<pg_log_entry_t,
> std::allocator<pg_log_entry_t> > const&,
> ObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
> 3: (PG::append_log(std::vector<pg_log_entry_t,
> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> std::allocator<pg_log_entry_t> > const&,
>
boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] > 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > [0x55c00fdf6642] > 6: > (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > [0x55c00fdfc684] > 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > [0x55c00fd22030] > 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] > 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > [0x55c00fb1abc7] > 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > const&)+0x57) [0x55c00fd92ad7] > 11: (OSD::ShardedOpWQ::_process(unsigned int, > ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] > 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > [0x55c0100d5ffd] > 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] > 14: (()+0x8064) [0x7fbd284fb064] > 15: (clone()+0x6d) [0x7fbd275ef62d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is > needed to interpret this. 
> > > Stefan > > Am 18.01.2018 um 15:24 schrieb Sage Weil: > > On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >> Hi Sage, > >> > >> Am 18.01.2018 um 14:16 schrieb Sage Weil: > >>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>> Hi, > >>>> > >>>> it also crashes in (marked with HERE): > >>>> > >>>> int SnapMapper::get_snaps( > >>>> const hobject_t &oid, > >>>> object_snaps *out) > >>>> { > >>>> assert(check(oid)); > >>>> set<string> keys; > >>>> map<string, bufferlist> got; > >>>> keys.insert(to_object_key(oid)); > >>>> int r = backend.get_keys(keys, &got); > >>>> if (r < 0) > >>>> return r; > >>>> if (got.empty()) > >>>> return -ENOENT; > >>>> if (out) { > >>>> bufferlist::iterator bp = got.begin()->second.begin(); > >>>> ::decode(*out, bp); > >>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; > >>>> assert(!out->snaps.empty()); ########### HERE ########### > >>>> } else { > >>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; > >>>> } > >>>> return 0; > >>>> } > >>>> > >>>> is it safe to comment out that assert? > >>> > >>> I think so. All of these asserts are just about ensuring the snapmapper > >>> state is consistent, and its only purpose is to find snaps to trim. > >>> Since your snapmapper metadata is clearly not consistent, there isn't a > >>> whole lot of risk here. You might want to set nosnaptrim for the time > >>> being so you don't get a surprise if trimming kicks in. The next step is > >>> probably going to be to do a scrub and see what it finds, repairs, or > >>> can't repair. > >> > >> snap trimming works fine; I already trimmed and removed some snaps. > >> > >> I was able to get the cluster into a state where all is backfilled.
But > >> enabling / doing deep-scrubs results into this one: > >> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 > >> /build/ceph/src/osd/SnapMapper.cc: In function 'int > >> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time > >> 2018-01-18 13:00:54.840396 > >> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) > > > > I think if you switch that assert to a warning it will repair... > > > > sage > > > >> > >> ceph version 12.2.2-94-g92923ef > >> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >> const*)+0x102) [0x5561ce28d1f2] > >> 2: (SnapMapper::get_snaps(hobject_t const&, > >> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] > >> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, > >> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7697] > >> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] > >> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > >> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] > >> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, > >> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] > >> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] > >> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >> [0x5561cdcd7bc7] > >> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >> const&)+0x57) [0x5561cdf4f957] > >> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] > >> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >> [0x5561ce292e7d] > >> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] > >> 13: (()+0x8064) [0x7fbd70d86064] > >> 14: (clone()+0x6d) [0x7fbd6fe7a62d] > >>
NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >> needed to interpret this. > >> > >> > >> Greets, > >> Stefan > >> > >>> sage > >>> > >>> > >>>> Stefan > >>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: > >>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>> > >>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: > >>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>> Hi Sage, > >>>>>>>> > >>>>>>>> this gives me another crash while that pg is recovering: > >>>>>>>> > >>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 > >>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void > >>>>>>>> PrimaryLogPG::on_local_recover(const hobject_t&, const ObjectRecoveryInfo&, > >>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 time 2018-01-17 19:25:09.322287 > >>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != > >>>>>>>> recovery_info.ss.clone_snaps.end()) > >>>>>>> > >>>>>>> Is this a cache tiering pool? > >>>>>> > >>>>>> no, a normal 3-replica pool, but the pg is degraded: > >>>>>> ceph pg dump | grep 3.80e > >>>>>> > >>>>>> 3.80e 1709 0 1579 0 0 6183674368 > >>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait > >>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] > >>>>>> 50 [39] 39 907737'69776430 2018-01-14 > >>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 > >>>>> > >>>>> Hrm, no real clues on the root cause then.
Something like this will work > >>>>> around the current assert: > >>>>> > >>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc > >>>>> index d42f3a401b..0f76134f74 100644 > >>>>> --- a/src/osd/PrimaryLogPG.cc > >>>>> +++ b/src/osd/PrimaryLogPG.cc > >>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( > >>>>> set<snapid_t> snaps; > >>>>> dout(20) << " snapset " << recovery_info.ss << dendl; > >>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); > >>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? > >>>>> - snaps.insert(p->second.begin(), p->second.end()); > >>>>> + if (p == recovery_info.ss.clone_snaps.end()) { > >>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; > >>>>> + } else { > >>>>> + snaps.insert(p->second.begin(), p->second.end()); > >>>>> + } > >>>>> dout(20) << " snaps " << snaps << dendl; > >>>>> snap_mapper.add_oid( > >>>>> recovery_info.soid, > >>>>> > >>>>> > >>>>>> > >>>>>> Stefan > >>>>>> > >>>>>>> > >>>>>>> s > >>>>>>>> > >>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>> const*)+0x102) [0x55addb5eb1f2] > >>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo > >>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, > >>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] > >>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, > >>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] > >>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) > >>>>>>>> [0x55addb30748f] > >>>>>>>> 5: > >>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) > >>>>>>>> [0x55addb317531] > >>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>> [0x55addb23cf10] > >>>>>>>> 7:
(PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] > >>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>> [0x55addb035bc7] > >>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>> const&)+0x57) [0x55addb2ad957] > >>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] > >>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>> [0x55addb5f0e7d] > >>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] > >>>>>>>> 13: (()+0x8064) [0x7f4955b68064] > >>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] > >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>> needed to interpret this. > >>>>>>>> > >>>>>>>> Greets, > >>>>>>>> Stefan > >>>>>>>> > >>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: > >>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>> Hi, > >>>>>>>>>> > >>>>>>>>>> is there any chance to fix this instead of manually removing all the clones? > >>>>>>>>> > >>>>>>>>> I believe you can avoid the immediate problem and get the PG up by > >>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap > >>>>>>>>> list mapping. > >>>>>>>>> > >>>>>>>>> The problem is you'll probably still have a stray snapid -> object mapping, so > >>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error > >>>>>>>>> state that won't trim (although from a quick look at the code it won't > >>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it > >>>>>>>>> happens. > >>>>>>>>> > >>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them > >>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD!
> >>>>>>>>> > >>>>>>>>> sage > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > > >>>>>>>>> > >>>>>>>>>> Stefan > >>>>>>>>>> > >>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: > >>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG > >>>>>>>>>>> <s.priebe@profihost.ag> wrote: > >>>>>>>>>>>> Hello, > >>>>>>>>>>>> > >>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds > >>>>>>>>>>>> being down. > >>>>>>>>>>>> > >>>>>>>>>>>> All of them fail with: > >>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 > >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void > >>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, > >>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' > >>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 > >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) > >>>>>>>>>>>> > >>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 > >>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) > >>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] > >>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, > >>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, > >>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) > >>>>>>>>>>>> [0x561f9fb76f3b] > >>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] > >>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, > >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > >>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] > >>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > 
>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] > >>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >>>>>>>>>>>> [0x561f9fc314b2] > >>>>>>>>>>>> 7: > >>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >>>>>>>>>>>> [0x561f9fc374f4] > >>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>> [0x561f9fb5cf10] > >>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] > >>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>> [0x561f9f955bc7] > >>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] > >>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] > >>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>> [0x561f9ff10e6d] > >>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] > >>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] > >>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] > >>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>> > >>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is > >>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that > >>>>>>>>>>> there are already entries (which it thinks there shouldn't be). > >>>>>>>>>>> > >>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's > >>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? 
What is > >>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? > >>>>>>>>>>> > >>>>>>>>>>> I also notice a separate email about deleting the data; I don't have > >>>>>>>>>>> any experience with this but you'd probably have to export the PG > >>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object > >>>>>>>>>>> out of it. I see options to remove both an object and > >>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of > >>>>>>>>>>> them myself. > >>>>>>>>>>> -Greg
-- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
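[Editor's note] Sage's two suggestions above — log at each return point in get_snaps(), then warn-but-ignore the error in update_snaps() — can be sketched outside the OSD. The following is a hypothetical stand-in that uses a plain std::map in place of the real MapCacher backend; the names and the -ENOENT code mirror the thread, but this is not the actual Ceph code:

```cpp
#include <cstdio>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the SnapMapper key/value backend.
using Backend = std::map<std::string, std::set<int>>;

// Every early return identifies itself in the log, mirroring the debug
// lines Sage suggests adding to SnapMapper::get_snaps().
int get_snaps_debug(const Backend& backend, const std::string& oid,
                    std::set<int>* out) {
  auto it = backend.find(oid);
  if (it == backend.end()) {
    std::fprintf(stderr, "get_snaps %s: no key in backend, returning -ENOENT\n",
                 oid.c_str());
    return -2;  // -ENOENT, as in the real get_snaps()
  }
  if (it->second.empty()) {
    // The real code asserts !out->snaps.empty(); a warning keeps the OSD up.
    std::fprintf(stderr, "get_snaps %s: empty snap set (tolerated)\n",
                 oid.c_str());
  }
  if (out) {
    *out = it->second;
  }
  return 0;
}

// "Warn but ignore" wrapper, as update_snaps() might use it: log the
// inconsistency instead of propagating it into FAILED assert(r == 0).
int update_snaps_tolerant(const Backend& backend, const std::string& oid) {
  std::set<int> snaps;
  int r = get_snaps_debug(backend, oid, &snaps);
  if (r < 0) {
    std::fprintf(stderr, "update_snaps %s: ignoring error %d\n",
                 oid.c_str(), r);
    return 0;  // proceed instead of crashing the OSD
  }
  return r;
}
```

The point of the pattern is that each failing path names itself in the log, so the actual return point inside the real get_snaps() can be pinned down before deciding whether the inconsistency is tolerable.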
Hi Sage, thanks for your reply. I'll do this. What i still don't understand is the following and i hope you might have an idea: 1.) All the snaps mentioned here are fresh - i created them today running luminous? How can they be missing in snapmapper again? 2.) How did this happen? The cluster was jewel and was updated to luminous when all this happened. Greets, Stefan Am 19.01.2018 um 21:19 schrieb Sage Weil: > I'm guessing it is coming from > > object_snaps out; > int r = get_snaps(oid, &out); > if (r < 0) > return r; > > in SnapMapper::update_snaps(). I would add a debug statement to to > confirm that, and possibly also add debug lines to the various return > points in get_snaps() so you can see why get_snaps is returning an error. > > I'm guessing the way to proceed is to warn but ignore the error, but it's > hard to say without seeing what the inconsistency there is and whether we > can tolerate it... > > sage > > > On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > >> HI Sage, >> >> any ideas for this one sawing while doing a deep scrub: >> >> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: >> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] >> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f >> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 >> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for >> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) >> >> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 >> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const >> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread >> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 >> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) >> >> ceph version 12.2.2-94-g92923ef >> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >> const*)+0x102) 
[0x55c0100d0372] >> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, >> std::allocator<pg_log_entry_t> > const&, >> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] >> 3: (PG::append_log(std::vector<pg_log_entry_t, >> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] >> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >> std::allocator<pg_log_entry_t> > const&, >> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] >> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >> [0x55c00fdf6642] >> 6: >> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >> [0x55c00fdfc684] >> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >> [0x55c00fd22030] >> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] >> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >> [0x55c00fb1abc7] >> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >> const&)+0x57) [0x55c00fd92ad7] >> 11: (OSD::ShardedOpWQ::_process(unsigned int, >> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] >> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >> [0x55c0100d5ffd] >> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] >> 14: (()+0x8064) [0x7fbd284fb064] >> 15: (clone()+0x6d) [0x7fbd275ef62d] >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >> needed to interpret this. 
>> >> >> Stefan >> >> Am 18.01.2018 um 15:24 schrieb Sage Weil: >>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>> Hi Sage, >>>> >>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>> Hi, >>>>>> >>>>>> it also crashes in (marked with HERE): >>>>>> >>>>>> int SnapMapper::get_snaps( >>>>>> const hobject_t &oid, >>>>>> object_snaps *out) >>>>>> { >>>>>> assert(check(oid)); >>>>>> set<string> keys; >>>>>> map<string, bufferlist> got; >>>>>> keys.insert(to_object_key(oid)); >>>>>> int r = backend.get_keys(keys, &got); >>>>>> if (r < 0) >>>>>> return r; >>>>>> if (got.empty()) >>>>>> return -ENOENT; >>>>>> if (out) { >>>>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>>>> ::decode(*out, bp); >>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>>>> assert(!out->snaps.empty()); ########### HERE ########### >>>>>> } else { >>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>>>> } >>>>>> return 0; >>>>>> } >>>>>> >>>>>> is it save to comment that assert? >>>>> >>>>> I think so. All of these asserts are just about ensuring the snapmapper >>>>> state is consistent, and it's only purpose is to find snaps to trim. >>>>> Since your snapmapper metadata is clearly not consistent, there isn't a >>>>> whole lot of risk here. You might want to set nosnaptrim for the time >>>>> being so you don't get a surprise if trimming kicks in. The next step is >>>>> probably going to be to do a scrub and see what it finds, repairs, or >>>>> can't repair. >>>> >>>> snap trimming works fine i already trimmed and removed some snaps. >>>> >>>> I was able to get the cluster into a state where all is backfilled. 
But >>>> enabling / doing deep-scrubs results into this one: >>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>> SnapMapper::get_snaps(const h >>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >>>> 2018-01-18 13:00:54.840396 >>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>> >>> I think if you switch that assert to a warning it will repair... >>> >>> sage >>> >>>> >>>> ceph version 12.2.2-94-g92923ef >>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>> const*)+0x102) [0x5561ce28d1f2] >>>> 2: (SnapMapper::get_snaps(hobject_t const&, >>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >>>> 697] >>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] >>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>> [0x5561cdcd7bc7] >>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>> const&)+0x57) [0x5561cdf4f957] >>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>> [0x5561ce292e7d] >>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >>>> 13: (()+0x8064) [0x7fbd70d86064] >>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >>>> 
NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>> needed to interpret this. >>>> >>>> >>>> Greets, >>>> Stefan >>>> >>>>> sage >>>>> >>>>> >>>>>> Stefan >>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>> >>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>> Hi Sage, >>>>>>>>>> >>>>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>>>> >>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>>>> PrimaryLogPG::on_l >>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>>>> recovery_info.ss.clone_snaps.end()) >>>>>>>>> >>>>>>>>> Is this a cache tiering pool? >>>>>>>> >>>>>>>> no normal 3 replica but the pg is degraded: >>>>>>>> ceph pg dump | grep 3.80e >>>>>>>> >>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>>>> >>>>>>> Hrm, no real clues on teh root cause then. 
Something like this will work >>>>>>> around the current assert: >>>>>>> >>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>>>>>> index d42f3a401b..0f76134f74 100644 >>>>>>> --- a/src/osd/PrimaryLogPG.cc >>>>>>> +++ b/src/osd/PrimaryLogPG.cc >>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>>>>>> set<snapid_t> snaps; >>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; >>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? >>>>>>> - snaps.insert(p->second.begin(), p->second.end()); >>>>>>> + if (p != recovery_info.ss.clone_snaps.end()) { >>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>>>>>> + } else { >>>>>>> + snaps.insert(p->second.begin(), p->second.end()); >>>>>>> + } >>>>>>> dout(20) << " snaps " << snaps << dendl; >>>>>>> snap_mapper.add_oid( >>>>>>> recovery_info.soid, >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Stefan >>>>>>>> >>>>>>>>> >>>>>>>>> s >>>>>>>>>> >>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>>>>>> [0x55addb30748f] >>>>>>>>>> 5: >>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>>>>>>> [0x55addb317531] >>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>> [0x55addb23cf10] >>>>>>>>>> 7: 
(PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>> [0x55addb035bc7] >>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>> [0x55addb5f0e7d] >>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>> needed to interpret this. >>>>>>>>>> >>>>>>>>>> Greets, >>>>>>>>>> Stefan >>>>>>>>>> >>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones? >>>>>>>>>>> >>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>>>> list mapping. >>>>>>>>>>> >>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so >>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>>>>>> happens. >>>>>>>>>>> >>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! 
>>>>>>>>>>> >>>>>>>>>>> sage >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> > >>>>>>>>>>> >>>>>>>>>>>> Stefan >>>>>>>>>>>> >>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>> >>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>>>> being down. >>>>>>>>>>>>>> >>>>>>>>>>>>>> All of them fail with: >>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>>>> >>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>>>> [0x561f9fb76f3b] >>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, 
>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>>>> 7: >>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>> >>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>>>>>> >>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? 
What is >>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>>>>>>>> >>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>>>>>>>> any experience with this but you'd probably have to export the PG >>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>>>>>>>> out of it. I see options to remove both an object and >>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>>>>>>>> them myself. >>>>>>>>>>>>> -Greg >>>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
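For reference, the ceph-objectstore-tool route Greg describes above looks roughly like the following. The OSD id, store path, PG id, clone id, and object JSON here are placeholders, the OSD must be stopped first, and none of this has been tested against this cluster — take a backup export before removing anything:

```shell
# ceph-objectstore-tool operates on an offline OSD store; stop the OSD first.
systemctl stop ceph-osd@50

# Export the whole PG to a file as a safety copy.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-50 \
    --pgid 3.80e --op export --file /root/pg3.80e.export

# List the objects in the PG (one JSON description per object).
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-50 \
    --pgid 3.80e --op list

# Remove the clone metadata for one clone id of one object,
# using a JSON line from the list output above.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-50 \
    '<object-json>' remove-clone-metadata <cloneid>
```

Whether remove-clone-metadata alone is enough to clear the inconsistency is exactly the open question in the thread; the commands only show the mechanics.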
On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > Hi Sage, > > thanks for your reply. I'll do this. What I still don't understand is > the following and I hope you might have an idea: > 1.) All the snaps mentioned here are fresh - I created them today > running luminous? How can they be missing in snapmapper again? I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so that when you hit the crash we have some context as to what is going on. Did you track down which part of get_snaps() is returning the error? > 2.) How did this happen? The cluster was jewel and was updated to > luminous when all this happened. Did this start right after the upgrade? I started a PR that relaxes some of these assertions so that they clean up instead (unless the debug option is enabled, for our regression testing), but I'm still unclear about what the inconsistency is... any logging you can provide would help! Thanks- sage > > Greets, > Stefan > > > On 19.01.2018 at 21:19, Sage Weil wrote: > > I'm guessing it is coming from > > > > object_snaps out; > > int r = get_snaps(oid, &out); > > if (r < 0) > > return r; > > > > in SnapMapper::update_snaps(). I would add a debug statement to > > confirm that, and possibly also add debug lines to the various return > > points in get_snaps() so you can see why get_snaps is returning an error. > > > > I'm guessing the way to proceed is to warn but ignore the error, but it's > > hard to say without seeing what the inconsistency there is and whether we > > can tolerate it... 
> > > > sage > > > > > > On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > > > >> Hi Sage, > >> > >> any ideas for this one, seen while doing a deep scrub: > >> > >> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: > >> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] > >> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f > >> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 > >> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for > >> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) > >> > >> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 > >> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const > >> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread > >> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 > >> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) > >> > >> ceph version 12.2.2-94-g92923ef > >> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >> const*)+0x102) [0x55c0100d0372] > >> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, > >> std::allocator<pg_log_entry_t> > const&, > >> ObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] > >> 3: (PG::append_log(std::vector<pg_log_entry_t, > >> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > >> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] > >> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >> std::allocator<pg_log_entry_t> > const&, > >> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] > >> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >> [0x55c00fdf6642] > >> 6: > >> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >> [0x55c00fdfc684] > >> 7: 
(PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >> [0x55c00fd22030] > >> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] > >> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >> [0x55c00fb1abc7] > >> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >> const&)+0x57) [0x55c00fd92ad7] > >> 11: (OSD::ShardedOpWQ::_process(unsigned int, > >> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] > >> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >> [0x55c0100d5ffd] > >> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] > >> 14: (()+0x8064) [0x7fbd284fb064] > >> 15: (clone()+0x6d) [0x7fbd275ef62d] > >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >> needed to interpret this. > >> > >> > >> Stefan > >> > >> Am 18.01.2018 um 15:24 schrieb Sage Weil: > >>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>> Hi Sage, > >>>> > >>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: > >>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>> Hi, > >>>>>> > >>>>>> it also crashes in (marked with HERE): > >>>>>> > >>>>>> int SnapMapper::get_snaps( > >>>>>> const hobject_t &oid, > >>>>>> object_snaps *out) > >>>>>> { > >>>>>> assert(check(oid)); > >>>>>> set<string> keys; > >>>>>> map<string, bufferlist> got; > >>>>>> keys.insert(to_object_key(oid)); > >>>>>> int r = backend.get_keys(keys, &got); > >>>>>> if (r < 0) > >>>>>> return r; > >>>>>> if (got.empty()) > >>>>>> return -ENOENT; > >>>>>> if (out) { > >>>>>> bufferlist::iterator bp = got.begin()->second.begin(); > >>>>>> ::decode(*out, bp); > >>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; > >>>>>> assert(!out->snaps.empty()); ########### HERE ########### > >>>>>> } else { > >>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << 
dendl; > >>>>>> } > >>>>>> return 0; > >>>>>> } > >>>>>> > >>>>>> is it safe to comment out that assert? > >>>>> > >>>>> I think so. All of these asserts are just about ensuring the snapmapper > >>>>> state is consistent, and its only purpose is to find snaps to trim. > >>>>> Since your snapmapper metadata is clearly not consistent, there isn't a > >>>>> whole lot of risk here. You might want to set nosnaptrim for the time > >>>>> being so you don't get a surprise if trimming kicks in. The next step is > >>>>> probably going to be to do a scrub and see what it finds, repairs, or > >>>>> can't repair. > >>>> > >>>> snap trimming works fine; I already trimmed and removed some snaps. > >>>> > >>>> I was able to get the cluster into a state where all is backfilled. But > >>>> enabling / doing deep-scrubs results in this one: > >>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 > >>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int > >>>> SnapMapper::get_snaps(const h > >>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time > >>>> 2018-01-18 13:00:54.840396 > >>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) > >>> > >>> I think if you switch that assert to a warning it will repair... 
> >>> > >>> sage > >>> > >>>> > >>>> ceph version 12.2.2-94-g92923ef > >>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>> const*)+0x102) [0x5561ce28d1f2] > >>>> 2: (SnapMapper::get_snaps(hobject_t const&, > >>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] > >>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, > >>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 > >>>> 697] > >>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] > >>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > >>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] > >>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, > >>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] > >>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] > >>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>> [0x5561cdcd7bc7] > >>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>> const&)+0x57) [0x5561cdf4f957] > >>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] > >>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>> [0x5561ce292e7d] > >>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] > >>>> 13: (()+0x8064) [0x7fbd70d86064] > >>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] > >>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>> needed to interpret this. 
> >>>> > >>>> > >>>> Greets, > >>>> Stefan > >>>> > >>>>> sage > >>>>> > >>>>> > >>>>>> Stefan > >>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: > >>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>> > >>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: > >>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>> Hi Sage, > >>>>>>>>>> > >>>>>>>>>> this gives me another crash while that pg is recovering: > >>>>>>>>>> > >>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 > >>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void > >>>>>>>>>> PrimaryLogPG::on_l > >>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, > >>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti > >>>>>>>>>> me 2018-01-17 19:25:09.322287 > >>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != > >>>>>>>>>> recovery_info.ss.clone_snaps.end()) > >>>>>>>>> > >>>>>>>>> Is this a cache tiering pool? > >>>>>>>> > >>>>>>>> no normal 3 replica but the pg is degraded: > >>>>>>>> ceph pg dump | grep 3.80e > >>>>>>>> > >>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 > >>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait > >>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] > >>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 > >>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 > >>>>>>> > >>>>>>> Hrm, no real clues on teh root cause then. 
Something like this will work > >>>>>>> around the current assert: > >>>>>>> > >>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc > >>>>>>> index d42f3a401b..0f76134f74 100644 > >>>>>>> --- a/src/osd/PrimaryLogPG.cc > >>>>>>> +++ b/src/osd/PrimaryLogPG.cc > >>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( > >>>>>>> set<snapid_t> snaps; > >>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; > >>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); > >>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? > >>>>>>> - snaps.insert(p->second.begin(), p->second.end()); > >>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { > >>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; > >>>>>>> + } else { > >>>>>>> + snaps.insert(p->second.begin(), p->second.end()); > >>>>>>> + } > >>>>>>> dout(20) << " snaps " << snaps << dendl; > >>>>>>> snap_mapper.add_oid( > >>>>>>> recovery_info.soid, > >>>>>>> > >>>>>>> > >>>>>>>> > >>>>>>>> Stefan > >>>>>>>> > >>>>>>>>> > >>>>>>>>> s > >>>>>>>>>> > >>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] > >>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo > >>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, > >>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] > >>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, > >>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] > >>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) > >>>>>>>>>> [0x55addb30748f] > >>>>>>>>>> 5: > >>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) > >>>>>>>>>> [0x55addb317531] > >>>>>>>>>> 6: 
(PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>> [0x55addb23cf10] > >>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] > >>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>> [0x55addb035bc7] > >>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>> const&)+0x57) [0x55addb2ad957] > >>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] > >>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>> [0x55addb5f0e7d] > >>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] > >>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] > >>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] > >>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>> needed to interpret this. > >>>>>>>>>> > >>>>>>>>>> Greets, > >>>>>>>>>> Stefan > >>>>>>>>>> > >>>>>>>>>> On 17.01.2018 at 15:28, Sage Weil wrote: > >>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>> Hi, > >>>>>>>>>>>> > >>>>>>>>>>>> is there any chance to fix this instead of manually removing all the clones? > >>>>>>>>>>> > >>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by > >>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap > >>>>>>>>>>> list mapping. > >>>>>>>>>>> > >>>>>>>>>>> The problem is you'll probably still have a stray snapid -> object mapping, so > >>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error > >>>>>>>>>>> state that won't trim (although from a quick look at the code it won't > >>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it > >>>>>>>>>>> happens. 
> >>>>>>>>>>> > >>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them > >>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! > >>>>>>>>>>> > >>>>>>>>>>> sage > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > > >>>>>>>>>>> > >>>>>>>>>>>> Stefan > >>>>>>>>>>>> > >>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: > >>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG > >>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: > >>>>>>>>>>>>>> Hello, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds > >>>>>>>>>>>>>> being down. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> All of them fail with: > >>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 > >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void > >>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, > >>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' > >>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 > >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 > >>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) > >>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] > >>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, > >>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, > >>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) > >>>>>>>>>>>>>> [0x561f9fb76f3b] > >>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] > >>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>> 
std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > >>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] > >>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] > >>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >>>>>>>>>>>>>> [0x561f9fc314b2] > >>>>>>>>>>>>>> 7: > >>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >>>>>>>>>>>>>> [0x561f9fc374f4] > >>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>>>> [0x561f9fb5cf10] > >>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] > >>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>> [0x561f9f955bc7] > >>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] > >>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] > >>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>>> [0x561f9ff10e6d] > >>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] > >>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] > >>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] > >>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>>> > >>>>>>>>>>>>> By the time it gets there, something else has gone wrong. 
The OSD is > >>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that > >>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). > >>>>>>>>>>>>> > >>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's > >>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is > >>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? > >>>>>>>>>>>>> > >>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have > >>>>>>>>>>>>> any experience with this but you'd probably have to export the PG > >>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object > >>>>>>>>>>>>> out of it. I see options to remove both an object and > >>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of > >>>>>>>>>>>>> them myself. > >>>>>>>>>>>>> -Greg
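The precautions Sage mentions in the quoted thread (nosnaptrim while the snapmapper is inconsistent, pausing scrubs while OSDs still crash in the scrub path) correspond to cluster-wide flags; a sketch, assuming a working admin keyring:

```shell
# Pause snap trimming so an inconsistent snapmapper cannot trigger trim errors.
ceph osd set nosnaptrim

# Pause (deep-)scrubbing while OSDs still crash in _scan_snaps.
ceph osd set noscrub
ceph osd set nodeep-scrub

# ...repair work...

# Re-enable once the PGs scrub cleanly again.
ceph osd unset nosnaptrim
ceph osd unset noscrub
ceph osd unset nodeep-scrub
```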
Hi Sage, get_snaps returned the error as there are no snaps in the snapmapper. So it's OK to skip the assert. My biggest problem now is that I'm not able to run a scrub on all PGs to fix all those errors, because in around 1-3% of cases the OSD seems to deadlock and needs to be killed to proceed. The log output is always: -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) log [WRN] : slow request 141.951820 seconds old, received at 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.0000000000000157:head v 927921'526559757) currently commit_sent -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) log [WRN] : slow request 141.643853 seconds old, received at 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.000000000000147d:head v 927921'29105383) currently commit_sent -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) log [WRN] : slow request 141.313220 seconds old, received at 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000000000aff:head v 927921'164552063) currently commit_sent 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal (Aborted) ** in thread 7f0fc23fe700 thread_name:tp_osd_tp ceph version 12.2.2-94-g92923ef (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) 1: (()+0xa4423c) [0x561d876ab23c] 2: (()+0xf890) [0x7f100b974890] 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, Context*)+0x254) [0x561d8746ac54] 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, 
hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x624) [0x561d871ed574] 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) [0x561d87138bc7] 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x561d873b0db7] 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) [0x561d876f42bd] 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] 15: (()+0x8064) [0x7f100b96d064] 16: (clone()+0x6d) [0x7f100aa6162d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. --- logging levels --- 0/ 5 none 0/ 0 lockdep 0/ 0 context 0/ 0 crush 0/ 0 mds 0/ 0 mds_balancer 0/ 0 mds_locker 0/ 0 mds_log 0/ 0 mds_log_expire 0/ 0 mds_migrator 0/ 0 buffer 0/ 0 timer 0/ 0 filer 0/ 1 striper 0/ 0 objecter 0/ 0 rados 0/ 0 rbd 0/ 5 rbd_mirror 0/ 5 rbd_replay 0/ 0 journaler 0/ 0 objectcacher 0/ 0 client 0/ 0 osd 0/ 0 optracker 0/ 0 objclass 0/ 0 filestore 0/ 0 journal 0/ 0 ms 0/ 0 mon 0/ 0 monc 0/ 0 paxos 0/ 0 tp 0/ 0 auth 1/ 5 crypto 0/ 0 finisher 1/ 1 reserver 0/ 0 heartbeatmap 0/ 0 perfcounter 0/ 0 rgw 1/10 civetweb 1/ 5 javaclient 0/ 0 asok 0/ 0 throttle 0/ 0 refs 1/ 5 xio 1/ 5 compressor 1/ 5 bluestore 1/ 5 bluefs 1/ 3 bdev 1/ 5 kstore 4/ 5 rocksdb 4/ 5 leveldb 4/ 5 memdb 1/ 5 kinetic 1/ 5 fuse 1/ 5 mgr 1/ 5 mgrc 1/ 5 dpdk 1/ 5 eventtrace -2/-2 (syslog threshold) -1/-1 (stderr threshold) max_recent 10000 max_new 1000 log_file /var/log/ceph/ceph-osd.59.log --- end dump of recent events --- Greets, Stefan Am 21.01.2018 um 21:27 schrieb Sage Weil: > On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >> Hi Sage, >> >> 
thanks for your reply. I'll do this. What i still don't understand is >> the following and i hope you might have an idea: >> 1.) All the snaps mentioned here are fresh - i created them today >> running luminous? How can they be missing in snapmapper again? > > I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so > that when you hit the crash we have some context as to what is going on. > > Did you track down which part of get_snaps() is returning the error? > >> 2.) How did this happen? The cluster was jewel and was updated to >> luminous when all this happened. > > Did this start right after the upgrade? > > I started a PR that relaxes some of these assertions so that they clean up > instead (unless the debug option is enabled, for our regression testing), > but I'm still unclear about what the inconsistency is... any loggin you > can provide would help! > > Thanks- > sage > > > >> >> Greets, >> Stefan >> >> >> Am 19.01.2018 um 21:19 schrieb Sage Weil: >>> I'm guessing it is coming from >>> >>> object_snaps out; >>> int r = get_snaps(oid, &out); >>> if (r < 0) >>> return r; >>> >>> in SnapMapper::update_snaps(). I would add a debug statement to to >>> confirm that, and possibly also add debug lines to the various return >>> points in get_snaps() so you can see why get_snaps is returning an error. >>> >>> I'm guessing the way to proceed is to warn but ignore the error, but it's >>> hard to say without seeing what the inconsistency there is and whether we >>> can tolerate it... 
>>> >>> sage >>> >>> >>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>> >>>> HI Sage, >>>> >>>> any ideas for this one sawing while doing a deep scrub: >>>> >>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: >>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] >>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f >>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 >>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for >>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) >>>> >>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 >>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const >>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread >>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 >>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) >>>> >>>> ceph version 12.2.2-94-g92923ef >>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>> const*)+0x102) [0x55c0100d0372] >>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>> std::allocator<pg_log_entry_t> > const&, >>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] >>>> 3: (PG::append_log(std::vector<pg_log_entry_t, >>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] >>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>> std::allocator<pg_log_entry_t> > const&, >>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] >>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>> [0x55c00fdf6642] >>>> 6: >>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>> [0x55c00fdfc684] >>>> 7: 
(PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>> [0x55c00fd22030] >>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] >>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>> [0x55c00fb1abc7] >>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>> const&)+0x57) [0x55c00fd92ad7] >>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, >>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] >>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>> [0x55c0100d5ffd] >>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] >>>> 14: (()+0x8064) [0x7fbd284fb064] >>>> 15: (clone()+0x6d) [0x7fbd275ef62d] >>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>> needed to interpret this. >>>> >>>> >>>> Stefan >>>> >>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: >>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>> Hi Sage, >>>>>> >>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> it also crashes in (marked with HERE): >>>>>>>> >>>>>>>> int SnapMapper::get_snaps( >>>>>>>> const hobject_t &oid, >>>>>>>> object_snaps *out) >>>>>>>> { >>>>>>>> assert(check(oid)); >>>>>>>> set<string> keys; >>>>>>>> map<string, bufferlist> got; >>>>>>>> keys.insert(to_object_key(oid)); >>>>>>>> int r = backend.get_keys(keys, &got); >>>>>>>> if (r < 0) >>>>>>>> return r; >>>>>>>> if (got.empty()) >>>>>>>> return -ENOENT; >>>>>>>> if (out) { >>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>>>>>> ::decode(*out, bp); >>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### >>>>>>>> } else { >>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << 
dendl; >>>>>>>> } >>>>>>>> return 0; >>>>>>>> } >>>>>>>> >>>>>>>> is it safe to comment out that assert? >>>>>>> >>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper >>>>>>> state is consistent, and its only purpose is to find snaps to trim. >>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a >>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time >>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is >>>>>>> probably going to be to do a scrub and see what it finds, repairs, or >>>>>>> can't repair. >>>>>> >>>>>> snap trimming works fine; I already trimmed and removed some snaps. >>>>>> >>>>>> I was able to get the cluster into a state where all is backfilled. But >>>>>> enabling / doing deep-scrubs results in this one: >>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>> SnapMapper::get_snaps(const h >>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >>>>>> 2018-01-18 13:00:54.840396 >>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>> >>>>> I think if you switch that assert to a warning it will repair... 
>>>>> >>>>> sage >>>>> >>>>>> >>>>>> ceph version 12.2.2-94-g92923ef >>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>> const*)+0x102) [0x5561ce28d1f2] >>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, >>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >>>>>> 697] >>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] >>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>> [0x5561cdcd7bc7] >>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>> const&)+0x57) [0x5561cdf4f957] >>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>> [0x5561ce292e7d] >>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >>>>>> 13: (()+0x8064) [0x7fbd70d86064] >>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>> needed to interpret this. 
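The assert-to-warning change Sage suggests can be sketched in standalone form. This is a hedged stand-in rather than the real Ceph source: `object_snaps` is reduced to a bare struct and the omap decode step is omitted, but it shows the intended behavior of returning an error the caller can handle instead of aborting the OSD.

```cpp
// Hedged sketch (NOT the actual Ceph source): the shape of the change Sage
// suggests for SnapMapper::get_snaps() -- demote
// assert(!out->snaps.empty()) to a warning plus an error return, so scrub
// can treat an empty snap set as a repairable inconsistency instead of
// crashing.  Types and names below are simplified stand-ins.
#include <cerrno>
#include <cstdio>
#include <set>

struct object_snaps {
  std::set<unsigned> snaps;  // stand-in for the decoded snap set
};

int get_snaps(const object_snaps &decoded, object_snaps *out) {
  if (out) {
    *out = decoded;
    if (out->snaps.empty()) {
      // previously: assert(!out->snaps.empty());
      std::fprintf(stderr,
                   "get_snaps: empty snap set (snapmapper inconsistency)\n");
      return -ENOENT;  // let the caller decide whether to repair or ignore
    }
  }
  return 0;
}
```

The caller (e.g. scrub via `_scan_snaps`) then sees an ordinary error code and can record the object as inconsistent rather than taking the whole OSD down.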
>>>>>> >>>>>> >>>>>> Greets, >>>>>> Stefan >>>>>> >>>>>>> sage >>>>>>> >>>>>>> >>>>>>>> Stefan >>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>> >>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>> Hi Sage, >>>>>>>>>>>> >>>>>>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>>>>>> >>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>>>>>> PrimaryLogPG::on_l >>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>>>>>> recovery_info.ss.clone_snaps.end()) >>>>>>>>>>> >>>>>>>>>>> Is this a cache tiering pool? >>>>>>>>>> >>>>>>>>>> no normal 3 replica but the pg is degraded: >>>>>>>>>> ceph pg dump | grep 3.80e >>>>>>>>>> >>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>>>>>> >>>>>>>>> Hrm, no real clues on teh root cause then. 
Something like this will work >>>>>>>>> around the current assert: >>>>>>>>> >>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>>>>>>>> index d42f3a401b..0f76134f74 100644 >>>>>>>>> --- a/src/osd/PrimaryLogPG.cc >>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc >>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>>>>>>>> set<snapid_t> snaps; >>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; >>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? >>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { >>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>>>>>>>> + } else { >>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>> + } >>>>>>>>> dout(20) << " snaps " << snaps << dendl; >>>>>>>>> snap_mapper.add_oid( >>>>>>>>> recovery_info.soid, >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Stefan >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> s >>>>>>>>>>>> >>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>>>>>>>> [0x55addb30748f] >>>>>>>>>>>> 5: >>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>>>>>>>>> [0x55addb317531] >>>>>>>>>>>> 6: 
(PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>> [0x55addb23cf10] >>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>> [0x55addb035bc7] >>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>> [0x55addb5f0e7d] >>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>> >>>>>>>>>>>> Greets, >>>>>>>>>>>> Stefan >>>>>>>>>>>> >>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> is there any chance to fix this instead of manually removing all the clones? >>>>>>>>>>>>> >>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap >>>>>>>>>>>>> list mapping. >>>>>>>>>>>>> >>>>>>>>>>>>> The problem is you'll probably still have a stray snapid -> object mapping, so >>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it >>>>>>>>>>>>> happens. 
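The intent of Sage's workaround diff above can be shown as a standalone function. This is a hedged sketch with simplified stand-in types (the real code uses `hobject_t` and the `SnapSet` from `recovery_info`): warn on a missing `clone_snaps` entry and continue with an empty snap set, instead of asserting.

```cpp
// Hedged sketch of the guard pattern from the workaround: look up the
// clone's snaps; if the entry is missing, log instead of assert and
// return an empty set so recovery can proceed.
#include <cstdio>
#include <map>
#include <set>
#include <vector>

using snapid_t = unsigned;  // stand-in for Ceph's snapid_t

std::set<snapid_t> snaps_for_clone(
    const std::map<snapid_t, std::vector<snapid_t>> &clone_snaps,
    snapid_t clone) {
  std::set<snapid_t> snaps;
  auto p = clone_snaps.find(clone);
  if (p == clone_snaps.end()) {
    // missing entry: previously assert(p != clone_snaps.end())
    std::fprintf(stderr, "no clone_snaps for clone %u\n", clone);
  } else {
    snaps.insert(p->second.begin(), p->second.end());
  }
  return snaps;  // possibly empty; the snap_mapper update still proceeds
}
```

The key design point is that dereferencing `p` only happens on the found branch; the assert variant crashed the OSD on exactly the inconsistency recovery was trying to repair.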
>>>>>>>>>>>>> >>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! >>>>>>>>>>>>> >>>>>>>>>>>>> sage >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> > >>>>>>>>>>>>> >>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>> >>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>>>>>> being down. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> All of them fail with: >>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>>>>>> [0x561f9fb76f3b] >>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>> 
std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>>>>>> 7: >>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. 
The OSD is >>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG >>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>>>>>>>>>> out of it. I see options to remove both an object and >>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>>>>>>>>>> them myself. >>>>>>>>>>>>>>> -Greg >>>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>>>> >>>>>>>>>> >>>>>>>> -- >>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>> >>>>>>>> 
-- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: > Hi Sage, > > get_snaps returned the error as there are no snaps in the snapmapper. So > it's OK to skip the assert. > > My biggest problem is now that I'm not able to run a scrub on all PGs to > fix all those errors because in around 1-3% the OSD seems to deadlock and > needs to be killed to proceed. I believe this will fix it: https://github.com/liewegas/ceph/commit/wip-snapmapper Cherry-pick that and let me know? Thanks! sage > > The log output is always: > -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) > log [WRN] : slow request 141.951820 seconds old, received at > 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 > e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 > 00000157:head v 927921'526559757) currently commit_sent > -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) > log [WRN] : slow request 141.643853 seconds old, received at > 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 > e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 > 00000147d:head v 927921'29105383) currently commit_sent > -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) > log [WRN] : slow request 141.313220 seconds old, received at > 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 > e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 > 000000aff:head v 927921'164552063) currently commit_sent > 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal > (Aborted) ** > in thread 7f0fc23fe700 thread_name:tp_osd_tp > > ceph version 12.2.2-94-g92923ef > (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > 1: (()+0xa4423c) [0x561d876ab23c] > 2: (()+0xf890) [0x7f100b974890] > 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] > 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, > std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran > saction> >&, 
Context*)+0x254) [0x561d8746ac54] > 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, > ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] > 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] > 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] > 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, > ThreadPool::TPHandle&)+0x624) [0x561d871ed574] > 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] > 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > [0x561d87138bc7] > 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > const&)+0x57) [0x561d873b0db7] > 12: (OSD::ShardedOpWQ::_process(unsigned int, > ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] > 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > [0x561d876f42bd] > 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] > 15: (()+0x8064) [0x7f100b96d064] > 16: (clone()+0x6d) [0x7f100aa6162d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is > needed to interpret this. 
> > --- logging levels --- > 0/ 5 none > 0/ 0 lockdep > 0/ 0 context > 0/ 0 crush > 0/ 0 mds > 0/ 0 mds_balancer > 0/ 0 mds_locker > 0/ 0 mds_log > 0/ 0 mds_log_expire > 0/ 0 mds_migrator > 0/ 0 buffer > 0/ 0 timer > 0/ 0 filer > 0/ 1 striper > 0/ 0 objecter > 0/ 0 rados > 0/ 0 rbd > 0/ 5 rbd_mirror > 0/ 5 rbd_replay > 0/ 0 journaler > 0/ 0 objectcacher > 0/ 0 client > 0/ 0 osd > 0/ 0 optracker > 0/ 0 objclass > 0/ 0 filestore > 0/ 0 journal > 0/ 0 ms > 0/ 0 mon > 0/ 0 monc > 0/ 0 paxos > 0/ 0 tp > 0/ 0 auth > 1/ 5 crypto > 0/ 0 finisher > 1/ 1 reserver > 0/ 0 heartbeatmap > 0/ 0 perfcounter > 0/ 0 rgw > 1/10 civetweb > 1/ 5 javaclient > 0/ 0 asok > 0/ 0 throttle > 0/ 0 refs > 1/ 5 xio > 1/ 5 compressor > 1/ 5 bluestore > 1/ 5 bluefs > 1/ 3 bdev > 1/ 5 kstore > 4/ 5 rocksdb > 4/ 5 leveldb > 4/ 5 memdb > 1/ 5 kinetic > 1/ 5 fuse > 1/ 5 mgr > 1/ 5 mgrc > 1/ 5 dpdk > 1/ 5 eventtrace > -2/-2 (syslog threshold) > -1/-1 (stderr threshold) > max_recent 10000 > max_new 1000 > log_file /var/log/ceph/ceph-osd.59.log > --- end dump of recent events --- > > > Greets, > Stefan > > Am 21.01.2018 um 21:27 schrieb Sage Weil: > > On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > >> Hi Sage, > >> > >> thanks for your reply. I'll do this. What i still don't understand is > >> the following and i hope you might have an idea: > >> 1.) All the snaps mentioned here are fresh - i created them today > >> running luminous? How can they be missing in snapmapper again? > > > > I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so > > that when you hit the crash we have some context as to what is going on. > > > > Did you track down which part of get_snaps() is returning the error? > > > >> 2.) How did this happen? The cluster was jewel and was updated to > >> luminous when all this happened. > > > > Did this start right after the upgrade? 
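For reference, the debug_osd=1/10 (or 1/20) setting Sage mentions can be made persistent; a hedged sketch of the ceph.conf fragment (the `N/M` syntax sets the file log level to N and the in-memory level to M, so the crash dump's "recent events" section gets detail without bloating the on-disk log):

```
[osd]
    # disk level 1 keeps normal logs small; memory level 20 gives
    # detailed context in the crash dump's recent-events ring buffer
    debug osd = 1/20
```

At runtime the same can typically be injected without a restart, e.g. `ceph tell osd.5 injectargs '--debug_osd 1/20'` (the osd id here is just an example).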
> > > > I started a PR that relaxes some of these assertions so that they clean up > > instead (unless the debug option is enabled, for our regression testing), > > but I'm still unclear about what the inconsistency is... any loggin you > > can provide would help! > > > > Thanks- > > sage > > > > > > > >> > >> Greets, > >> Stefan > >> > >> > >> Am 19.01.2018 um 21:19 schrieb Sage Weil: > >>> I'm guessing it is coming from > >>> > >>> object_snaps out; > >>> int r = get_snaps(oid, &out); > >>> if (r < 0) > >>> return r; > >>> > >>> in SnapMapper::update_snaps(). I would add a debug statement to to > >>> confirm that, and possibly also add debug lines to the various return > >>> points in get_snaps() so you can see why get_snaps is returning an error. > >>> > >>> I'm guessing the way to proceed is to warn but ignore the error, but it's > >>> hard to say without seeing what the inconsistency there is and whether we > >>> can tolerate it... > >>> > >>> sage > >>> > >>> > >>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>> > >>>> HI Sage, > >>>> > >>>> any ideas for this one sawing while doing a deep scrub: > >>>> > >>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: > >>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] > >>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f > >>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 > >>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for > >>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) > >>>> > >>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 > >>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const > >>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread > >>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 > >>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) > >>>> > >>>> ceph version 12.2.2-94-g92923ef > >>>> 
(92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>> const*)+0x102) [0x55c0100d0372] > >>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>> std::allocator<pg_log_entry_t> > const&, > >>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] > >>>> 3: (PG::append_log(std::vector<pg_log_entry_t, > >>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > >>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] > >>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>> std::allocator<pg_log_entry_t> > const&, > >>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] > >>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >>>> [0x55c00fdf6642] > >>>> 6: > >>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >>>> [0x55c00fdfc684] > >>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>> [0x55c00fd22030] > >>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] > >>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>> [0x55c00fb1abc7] > >>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>> const&)+0x57) [0x55c00fd92ad7] > >>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, > >>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] > >>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>> [0x55c0100d5ffd] > >>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] > >>>> 14: (()+0x8064) [0x7fbd284fb064] > >>>> 15: (clone()+0x6d) [0x7fbd275ef62d] > >>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>> needed to interpret this. 
> >>>> > >>>> > >>>> Stefan > >>>> > >>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: > >>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>> Hi Sage, > >>>>>> > >>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: > >>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> it also crashes in (marked with HERE): > >>>>>>>> > >>>>>>>> int SnapMapper::get_snaps( > >>>>>>>> const hobject_t &oid, > >>>>>>>> object_snaps *out) > >>>>>>>> { > >>>>>>>> assert(check(oid)); > >>>>>>>> set<string> keys; > >>>>>>>> map<string, bufferlist> got; > >>>>>>>> keys.insert(to_object_key(oid)); > >>>>>>>> int r = backend.get_keys(keys, &got); > >>>>>>>> if (r < 0) > >>>>>>>> return r; > >>>>>>>> if (got.empty()) > >>>>>>>> return -ENOENT; > >>>>>>>> if (out) { > >>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); > >>>>>>>> ::decode(*out, bp); > >>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; > >>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### > >>>>>>>> } else { > >>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; > >>>>>>>> } > >>>>>>>> return 0; > >>>>>>>> } > >>>>>>>> > >>>>>>>> is it save to comment that assert? > >>>>>>> > >>>>>>> I think so. All of these asserts are just about ensuring the snapmapper > >>>>>>> state is consistent, and it's only purpose is to find snaps to trim. > >>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a > >>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time > >>>>>>> being so you don't get a surprise if trimming kicks in. The next step is > >>>>>>> probably going to be to do a scrub and see what it finds, repairs, or > >>>>>>> can't repair. > >>>>>> > >>>>>> snap trimming works fine i already trimmed and removed some snaps. > >>>>>> > >>>>>> I was able to get the cluster into a state where all is backfilled. 
But > >>>>>> enabling / doing deep-scrubs results into this one: > >>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 > >>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int > >>>>>> SnapMapper::get_snaps(const h > >>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time > >>>>>> 2018-01-18 13:00:54.840396 > >>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) > >>>>> > >>>>> I think if you switch that assert to a warning it will repair... > >>>>> > >>>>> sage > >>>>> > >>>>>> > >>>>>> ceph version 12.2.2-94-g92923ef > >>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>> const*)+0x102) [0x5561ce28d1f2] > >>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, > >>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] > >>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, > >>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 > >>>>>> 697] > >>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] > >>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > >>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] > >>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, > >>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] > >>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] > >>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>> [0x5561cdcd7bc7] > >>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>> const&)+0x57) [0x5561cdf4f957] > >>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] > >>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>> [0x5561ce292e7d] > >>>>>> 12: 
(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] > >>>>>> 13: (()+0x8064) [0x7fbd70d86064] > >>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] > >>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>> needed to interpret this. > >>>>>> > >>>>>> > >>>>>> Greets, > >>>>>> Stefan > >>>>>> > >>>>>>> sage > >>>>>>> > >>>>>>> > >>>>>>>> Stefan > >>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: > >>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>> > >>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: > >>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>> Hi Sage, > >>>>>>>>>>>> > >>>>>>>>>>>> this gives me another crash while that pg is recovering: > >>>>>>>>>>>> > >>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 > >>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void > >>>>>>>>>>>> PrimaryLogPG::on_l > >>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, > >>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti > >>>>>>>>>>>> me 2018-01-17 19:25:09.322287 > >>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != > >>>>>>>>>>>> recovery_info.ss.clone_snaps.end()) > >>>>>>>>>>> > >>>>>>>>>>> Is this a cache tiering pool? > >>>>>>>>>> > >>>>>>>>>> no normal 3 replica but the pg is degraded: > >>>>>>>>>> ceph pg dump | grep 3.80e > >>>>>>>>>> > >>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 > >>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait > >>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] > >>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 > >>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 > >>>>>>>>> > >>>>>>>>> Hrm, no real clues on teh root cause then. 
Something like this will work > >>>>>>>>> around the current assert: > >>>>>>>>> > >>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc > >>>>>>>>> index d42f3a401b..0f76134f74 100644 > >>>>>>>>> --- a/src/osd/PrimaryLogPG.cc > >>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc > >>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( > >>>>>>>>> set<snapid_t> snaps; > >>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; > >>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); > >>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? > >>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); > >>>>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { > >>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; > >>>>>>>>> + } else { > >>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); > >>>>>>>>> + } > >>>>>>>>> dout(20) << " snaps " << snaps << dendl; > >>>>>>>>> snap_mapper.add_oid( > >>>>>>>>> recovery_info.soid, > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Stefan > >>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> s > >>>>>>>>>>>> > >>>>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] > >>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo > >>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, > >>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] > >>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, > >>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] > >>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) > >>>>>>>>>>>> [0x55addb30748f] > >>>>>>>>>>>> 5: > >>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) > 
>>>>>>>>>>>> [0x55addb317531] > >>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>> [0x55addb23cf10] > >>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] > >>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>> [0x55addb035bc7] > >>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] > >>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] > >>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>> [0x55addb5f0e7d] > >>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] > >>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] > >>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] > >>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>> > >>>>>>>>>>>> Greets, > >>>>>>>>>>>> Stefan > >>>>>>>>>>>> > >>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: > >>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> is there any chance to fix this instead of removing manually all the clones? > >>>>>>>>>>>>> > >>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by > >>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap > >>>>>>>>>>>>> list mapping. > >>>>>>>>>>>>> > >>>>>>>>>>>>> The problem is you'll probably still have a stray snapid -> object mapping, so > >>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error > >>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't > >>>>>>>>>>>>> crash). 
I'd probably remove the assert and deal with that if/when it > >>>>>>>>>>>>> happens. > >>>>>>>>>>>>> > >>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them > >>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! > >>>>>>>>>>>>> > >>>>>>>>>>>>> sage > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: > >>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG > >>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: > >>>>>>>>>>>>>>>> Hello, > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds > >>>>>>>>>>>>>>>> being down. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> All of them fail with: > >>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 > >>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void > >>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, > >>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' > >>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 > >>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 > >>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) > >>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] > >>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, > >>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, > >>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) > >>>>>>>>>>>>>>>> [0x561f9fb76f3b] > >>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > 
>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] > >>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > >>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] > >>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] > >>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >>>>>>>>>>>>>>>> [0x561f9fc314b2] > >>>>>>>>>>>>>>>> 7: > >>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >>>>>>>>>>>>>>>> [0x561f9fc374f4] > >>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>>>>>> [0x561f9fb5cf10] > >>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] > >>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>>>> [0x561f9f955bc7] > >>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] > >>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] > >>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>>>>> [0x561f9ff10e6d] > >>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] > >>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] > >>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] > >>>>>>>>>>>>>>>> NOTE: a copy of the executable, or 
`objdump -rdS <executable>` is > >>>>>>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is > >>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that > >>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's > >>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is > >>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have > >>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG > >>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object > >>>>>>>>>>>>>>> out of it. I see options to remove both an object and > >>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of > >>>>>>>>>>>>>>> them myself. 
> >>>>>>>>>>>>>>> -Greg > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> -- > >>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > >>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org > >>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
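The workaround diff quoted above is subtle: the found/not-found branches of the `clone_snaps` lookup are easy to invert, and the not-found branch must not dereference the end iterator. A minimal standalone sketch of the intended logic, using plain `std::map`/`std::set` stand-ins for Ceph's `SnapSet::clone_snaps` and `snapid_t` (all names here are illustrative mocks, not Ceph's actual API):

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <set>
#include <vector>

using snapid = unsigned long;

// Mock of SnapSet::clone_snaps: clone id -> snap ids that reference it.
using clone_snaps_map = std::map<snapid, std::vector<snapid>>;

// Returns the snap set for `clone`. When the mapping is missing it warns
// and returns an empty set, instead of asserting and taking down the OSD.
std::set<snapid> snaps_for_clone(const clone_snaps_map& clone_snaps,
                                 snapid clone) {
    std::set<snapid> snaps;
    auto p = clone_snaps.find(clone);
    if (p == clone_snaps.end()) {
        // Missing metadata: log and fall through with an empty set,
        // so set_snaps() can later overwrite the object->snap mapping.
        std::fprintf(stderr, "no clone_snaps for clone %lu\n", clone);
    } else {
        snaps.insert(p->second.begin(), p->second.end());
    }
    return snaps;
}
```

The key point is that the `derr` path must be the `p == end()` branch; the version with the comparison inverted would both misreport healthy clones and dereference an end iterator.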
Hi Sage, thanks! That one fixed it! Now i'm just waiting if the log ever stops telling me: 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error on pg 3.33e oid 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in mapper: , oi: b0d7c...repaired Currently it seems it does an endless repair and never ends. Greets, Stefan Am 22.01.2018 um 15:30 schrieb Sage Weil: > On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >> Hi Sage, >> >> get_snaps returned the error as there are no snaps in the snapmapper. So >> it's OK to skip the assert. >> >> My biggest problem is now that i'm not able to run a scrub on all PGs to >> fix all those errors because in around 1-3% the osd seem to deadlock and >> needs to get killed to proceed. > > I believe this will fix it: > > https://github.com/liewegas/ceph/commit/wip-snapmapper > > Cherry-pick that and let me know? > > Thanks! > sage > > >> >> The log output is always: >> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) >> log [WRN] : slow request 141.951820 seconds old, received at >> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 >> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 >> 00000157:head v 927921'526559757) currently commit_sent >> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) >> log [WRN] : slow request 141.643853 seconds old, received at >> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 >> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 >> 00000147d:head v 927921'29105383) currently commit_sent >> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) >> log [WRN] : slow request 141.313220 seconds old, received at >> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 >> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 >> 000000aff:head v 927921'164552063) currently commit_sent >> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** 
Caught signal >> (Aborted) ** >> in thread 7f0fc23fe700 thread_name:tp_osd_tp >> >> ceph version 12.2.2-94-g92923ef >> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >> 1: (()+0xa4423c) [0x561d876ab23c] >> 2: (()+0xf890) [0x7f100b974890] >> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] >> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, >> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran >> saction> >&, Context*)+0x254) [0x561d8746ac54] >> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, >> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] >> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] >> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] >> 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] >> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] >> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >> [0x561d87138bc7] >> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >> const&)+0x57) [0x561d873b0db7] >> 12: (OSD::ShardedOpWQ::_process(unsigned int, >> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] >> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >> [0x561d876f42bd] >> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] >> 15: (()+0x8064) [0x7f100b96d064] >> 16: (clone()+0x6d) [0x7f100aa6162d] >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >> needed to interpret this. 
>> >> --- logging levels --- >> 0/ 5 none >> 0/ 0 lockdep >> 0/ 0 context >> 0/ 0 crush >> 0/ 0 mds >> 0/ 0 mds_balancer >> 0/ 0 mds_locker >> 0/ 0 mds_log >> 0/ 0 mds_log_expire >> 0/ 0 mds_migrator >> 0/ 0 buffer >> 0/ 0 timer >> 0/ 0 filer >> 0/ 1 striper >> 0/ 0 objecter >> 0/ 0 rados >> 0/ 0 rbd >> 0/ 5 rbd_mirror >> 0/ 5 rbd_replay >> 0/ 0 journaler >> 0/ 0 objectcacher >> 0/ 0 client >> 0/ 0 osd >> 0/ 0 optracker >> 0/ 0 objclass >> 0/ 0 filestore >> 0/ 0 journal >> 0/ 0 ms >> 0/ 0 mon >> 0/ 0 monc >> 0/ 0 paxos >> 0/ 0 tp >> 0/ 0 auth >> 1/ 5 crypto >> 0/ 0 finisher >> 1/ 1 reserver >> 0/ 0 heartbeatmap >> 0/ 0 perfcounter >> 0/ 0 rgw >> 1/10 civetweb >> 1/ 5 javaclient >> 0/ 0 asok >> 0/ 0 throttle >> 0/ 0 refs >> 1/ 5 xio >> 1/ 5 compressor >> 1/ 5 bluestore >> 1/ 5 bluefs >> 1/ 3 bdev >> 1/ 5 kstore >> 4/ 5 rocksdb >> 4/ 5 leveldb >> 4/ 5 memdb >> 1/ 5 kinetic >> 1/ 5 fuse >> 1/ 5 mgr >> 1/ 5 mgrc >> 1/ 5 dpdk >> 1/ 5 eventtrace >> -2/-2 (syslog threshold) >> -1/-1 (stderr threshold) >> max_recent 10000 >> max_new 1000 >> log_file /var/log/ceph/ceph-osd.59.log >> --- end dump of recent events --- >> >> >> Greets, >> Stefan >> >> Am 21.01.2018 um 21:27 schrieb Sage Weil: >>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>> Hi Sage, >>>> >>>> thanks for your reply. I'll do this. What i still don't understand is >>>> the following and i hope you might have an idea: >>>> 1.) All the snaps mentioned here are fresh - i created them today >>>> running luminous? How can they be missing in snapmapper again? >>> >>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so >>> that when you hit the crash we have some context as to what is going on. >>> >>> Did you track down which part of get_snaps() is returning the error? >>> >>>> 2.) How did this happen? The cluster was jewel and was updated to >>>> luminous when all this happened. >>> >>> Did this start right after the upgrade? 
>>> >>> I started a PR that relaxes some of these assertions so that they clean up >>> instead (unless the debug option is enabled, for our regression testing), >>> but I'm still unclear about what the inconsistency is... any logging you >>> can provide would help! >>> >>> Thanks- >>> sage >>> >>> >>> >>>> >>>> Greets, >>>> Stefan >>>> >>>> >>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: >>>>> I'm guessing it is coming from >>>>> >>>>> object_snaps out; >>>>> int r = get_snaps(oid, &out); >>>>> if (r < 0) >>>>> return r; >>>>> >>>>> in SnapMapper::update_snaps(). I would add a debug statement to >>>>> confirm that, and possibly also add debug lines to the various return >>>>> points in get_snaps() so you can see why get_snaps is returning an error. >>>>> >>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's >>>>> hard to say without seeing what the inconsistency there is and whether we >>>>> can tolerate it... >>>>> >>>>> sage >>>>> >>>>> >>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>> >>>>>> Hi Sage, >>>>>> >>>>>> any ideas for this one, seen while doing a deep scrub: >>>>>> >>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: >>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] >>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f >>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 >>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for >>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) >>>>>> >>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 >>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const >>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread >>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 >>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) >>>>>> >>>>>> ceph version 12.2.2-94-g92923ef >>>>>> 
(92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>> const*)+0x102) [0x55c0100d0372] >>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>> ObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] >>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, >>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] >>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] >>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>> [0x55c00fdf6642] >>>>>> 6: >>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>> [0x55c00fdfc684] >>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>> [0x55c00fd22030] >>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] >>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>> [0x55c00fb1abc7] >>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>> const&)+0x57) [0x55c00fd92ad7] >>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] >>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>> [0x55c0100d5ffd] >>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] >>>>>> 14: (()+0x8064) [0x7fbd284fb064] >>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] >>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>> needed to interpret this. 
>>>>>> >>>>>> Stefan >>>>>> >>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: >>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>> Hi Sage, >>>>>>>> >>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> it also crashes in (marked with HERE): >>>>>>>>>> >>>>>>>>>> int SnapMapper::get_snaps( >>>>>>>>>> const hobject_t &oid, >>>>>>>>>> object_snaps *out) >>>>>>>>>> { >>>>>>>>>> assert(check(oid)); >>>>>>>>>> set<string> keys; >>>>>>>>>> map<string, bufferlist> got; >>>>>>>>>> keys.insert(to_object_key(oid)); >>>>>>>>>> int r = backend.get_keys(keys, &got); >>>>>>>>>> if (r < 0) >>>>>>>>>> return r; >>>>>>>>>> if (got.empty()) >>>>>>>>>> return -ENOENT; >>>>>>>>>> if (out) { >>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>>>>>>>> ::decode(*out, bp); >>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### >>>>>>>>>> } else { >>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>>>>>>>> } >>>>>>>>>> return 0; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> is it safe to comment that assert? >>>>>>>>> >>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper >>>>>>>>> state is consistent, and its only purpose is to find snaps to trim. >>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a >>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time >>>>>>>>> being so you don't get a surprise if trimming kicks in. The next step is >>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or >>>>>>>>> can't repair. >>>>>>>> >>>>>>>> snap trimming works fine, i already trimmed and removed some snaps. >>>>>>>> >>>>>>>> I was able to get the cluster into a state where all is backfilled. 
But >>>>>>>> enabling / doing deep-scrubs results into this one: >>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>>> SnapMapper::get_snaps(const h >>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >>>>>>>> 2018-01-18 13:00:54.840396 >>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>> >>>>>>> I think if you switch that assert to a warning it will repair... >>>>>>> >>>>>>> sage >>>>>>> >>>>>>>> >>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>> const*)+0x102) [0x5561ce28d1f2] >>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, >>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >>>>>>>> 697] >>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] >>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>> [0x5561cdcd7bc7] >>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>> const&)+0x57) [0x5561cdf4f957] >>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>> [0x5561ce292e7d] >>>>>>>> 12: 
(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] >>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>> needed to interpret this. >>>>>>>> >>>>>>>> >>>>>>>> Greets, >>>>>>>> Stefan >>>>>>>> >>>>>>>>> sage >>>>>>>>> >>>>>>>>> >>>>>>>>>> Stefan >>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>> >>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>> >>>>>>>>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>>>>>>>> PrimaryLogPG::on_l >>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end()) >>>>>>>>>>>>> >>>>>>>>>>>>> Is this a cache tiering pool? >>>>>>>>>>>> >>>>>>>>>>>> no, normal 3 replica, but the pg is degraded: >>>>>>>>>>>> ceph pg dump | grep 3.80e >>>>>>>>>>>> >>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>>>>>>>> >>>>>>>>>>> Hrm, no real clues on the root cause then. 
Something like this will work >>>>>>>>>>> around the current assert: >>>>>>>>>>> >>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>>>>>>>>>> index d42f3a401b..0f76134f74 100644 >>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc >>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc >>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>>>>>>>>>> set<snapid_t> snaps; >>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; >>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? >>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { >>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>>>>>>>>>> + } else { >>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>> + } >>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; >>>>>>>>>>> snap_mapper.add_oid( >>>>>>>>>>> recovery_info.soid, >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Stefan >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> s >>>>>>>>>>>>>> >>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>>>>>>>>>> [0x55addb30748f] >>>>>>>>>>>>>> 5: >>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) 
>>>>>>>>>>>>>> [0x55addb317531] >>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>> [0x55addb23cf10] >>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>> [0x55addb035bc7] >>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>> [0x55addb5f0e7d] >>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>> >>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> is there any chance to fix this instead of removing manually all the clones? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>>>>>>>> list mapping. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The problem is you'll probably still have a stray snapid -> object mapping, so >>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>>>>>>>> crash). 
I'd probably remove the assert and deal with that if/when it >>>>>>>>>>>>>>> happens. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>>>>>>>> being down. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> All of them fail with: >>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] >>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, 
>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>>>>>>>> 7: >>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, 
or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG >>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and >>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>>>>>>>>>>>> them myself. 
>>>>>>>>>>>>>>>>> -Greg >>>>>>>>>>>>>>>>> >> >> -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: > Hi Sage, > > thanks! That one fixed it! Now i'm just waiting if the log ever stops > telling me: > 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error > on pg 3.33e oid > 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in > mapper: , oi: b0d7c...repaired > > Currently it seems it does an endless repair and never ends. On many objects or the same object? sage > > Greets, > Stefan > Am 22.01.2018 um 15:30 schrieb Sage Weil: > > On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: > >> Hi Sage, > >> > >> get_snaps returned the error as there are no snaps in the snapmapper. So > >> it's OK to skip the assert. > >> > >> My biggest problem is now that i'm not able to run a scrub on all PGs to > >> fix all those errors because in around 1-3% the osd seem to deadlock and > >> needs to get killed to proceed. > > > > I believe this will fix it: > > > > https://github.com/liewegas/ceph/commit/wip-snapmapper > > > > Cherry-pick that and let me know? > > > > Thanks! 
> > sage > > > > > >> > >> The log output is always: > >> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) > >> log [WRN] : slow request 141.951820 seconds old, received at > >> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 > >> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 > >> 00000157:head v 927921'526559757) currently commit_sent > >> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) > >> log [WRN] : slow request 141.643853 seconds old, received at > >> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 > >> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 > >> 00000147d:head v 927921'29105383) currently commit_sent > >> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) > >> log [WRN] : slow request 141.313220 seconds old, received at > >> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 > >> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 > >> 000000aff:head v 927921'164552063) currently commit_sent > >> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal > >> (Aborted) ** > >> in thread 7f0fc23fe700 thread_name:tp_osd_tp > >> > >> ceph version 12.2.2-94-g92923ef > >> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >> 1: (()+0xa4423c) [0x561d876ab23c] > >> 2: (()+0xf890) [0x7f100b974890] > >> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] > >> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, > >> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran > >> saction> >&, Context*)+0x254) [0x561d8746ac54] > >> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, > >> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] > >> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] > >> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > >> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] > >> 8: 
(PG::replica_scrub(boost::intrusive_ptr<OpRequest>, > >> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] > >> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] > >> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >> [0x561d87138bc7] > >> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >> const&)+0x57) [0x561d873b0db7] > >> 12: (OSD::ShardedOpWQ::_process(unsigned int, > >> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] > >> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >> [0x561d876f42bd] > >> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] > >> 15: (()+0x8064) [0x7f100b96d064] > >> 16: (clone()+0x6d) [0x7f100aa6162d] > >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >> needed to interpret this. > >> > >> --- logging levels --- > >> 0/ 5 none > >> 0/ 0 lockdep > >> 0/ 0 context > >> 0/ 0 crush > >> 0/ 0 mds > >> 0/ 0 mds_balancer > >> 0/ 0 mds_locker > >> 0/ 0 mds_log > >> 0/ 0 mds_log_expire > >> 0/ 0 mds_migrator > >> 0/ 0 buffer > >> 0/ 0 timer > >> 0/ 0 filer > >> 0/ 1 striper > >> 0/ 0 objecter > >> 0/ 0 rados > >> 0/ 0 rbd > >> 0/ 5 rbd_mirror > >> 0/ 5 rbd_replay > >> 0/ 0 journaler > >> 0/ 0 objectcacher > >> 0/ 0 client > >> 0/ 0 osd > >> 0/ 0 optracker > >> 0/ 0 objclass > >> 0/ 0 filestore > >> 0/ 0 journal > >> 0/ 0 ms > >> 0/ 0 mon > >> 0/ 0 monc > >> 0/ 0 paxos > >> 0/ 0 tp > >> 0/ 0 auth > >> 1/ 5 crypto > >> 0/ 0 finisher > >> 1/ 1 reserver > >> 0/ 0 heartbeatmap > >> 0/ 0 perfcounter > >> 0/ 0 rgw > >> 1/10 civetweb > >> 1/ 5 javaclient > >> 0/ 0 asok > >> 0/ 0 throttle > >> 0/ 0 refs > >> 1/ 5 xio > >> 1/ 5 compressor > >> 1/ 5 bluestore > >> 1/ 5 bluefs > >> 1/ 3 bdev > >> 1/ 5 kstore > >> 4/ 5 rocksdb > >> 4/ 5 leveldb > >> 4/ 5 memdb > >> 1/ 5 kinetic > >> 1/ 5 fuse > >> 1/ 5 mgr > >> 1/ 5 mgrc > >> 1/ 5 dpdk > 
>> 1/ 5 eventtrace > >> -2/-2 (syslog threshold) > >> -1/-1 (stderr threshold) > >> max_recent 10000 > >> max_new 1000 > >> log_file /var/log/ceph/ceph-osd.59.log > >> --- end dump of recent events --- > >> > >> > >> Greets, > >> Stefan > >> > >> Am 21.01.2018 um 21:27 schrieb Sage Weil: > >>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>> Hi Sage, > >>>> > >>>> thanks for your reply. I'll do this. What i still don't understand is > >>>> the following and i hope you might have an idea: > >>>> 1.) All the snaps mentioned here are fresh - i created them today > >>>> running luminous? How can they be missing in snapmapper again? > >>> > >>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so > >>> that when you hit the crash we have some context as to what is going on. > >>> > >>> Did you track down which part of get_snaps() is returning the error? > >>> > >>>> 2.) How did this happen? The cluster was jewel and was updated to > >>>> luminous when all this happened. > >>> > >>> Did this start right after the upgrade? > >>> > >>> I started a PR that relaxes some of these assertions so that they clean up > >>> instead (unless the debug option is enabled, for our regression testing), > >>> but I'm still unclear about what the inconsistency is... any loggin you > >>> can provide would help! > >>> > >>> Thanks- > >>> sage > >>> > >>> > >>> > >>>> > >>>> Greets, > >>>> Stefan > >>>> > >>>> > >>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: > >>>>> I'm guessing it is coming from > >>>>> > >>>>> object_snaps out; > >>>>> int r = get_snaps(oid, &out); > >>>>> if (r < 0) > >>>>> return r; > >>>>> > >>>>> in SnapMapper::update_snaps(). I would add a debug statement to to > >>>>> confirm that, and possibly also add debug lines to the various return > >>>>> points in get_snaps() so you can see why get_snaps is returning an error. 
> >>>>> > >>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's > >>>>> hard to say without seeing what the inconsistency there is and whether we > >>>>> can tolerate it... > >>>>> > >>>>> sage > >>>>> > >>>>> > >>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>> > >>>>>> HI Sage, > >>>>>> > >>>>>> any ideas for this one sawing while doing a deep scrub: > >>>>>> > >>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: > >>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] > >>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f > >>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 > >>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for > >>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) > >>>>>> > >>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 > >>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const > >>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread > >>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 > >>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) > >>>>>> > >>>>>> ceph version 12.2.2-94-g92923ef > >>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>> const*)+0x102) [0x55c0100d0372] > >>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] > >>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, > >>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > >>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] > >>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>> boost::optional<pg_hit_set_history_t> const&, 
eversion_t const&, > >>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] > >>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >>>>>> [0x55c00fdf6642] > >>>>>> 6: > >>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >>>>>> [0x55c00fdfc684] > >>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>> [0x55c00fd22030] > >>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] > >>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>> [0x55c00fb1abc7] > >>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>> const&)+0x57) [0x55c00fd92ad7] > >>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] > >>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>> [0x55c0100d5ffd] > >>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] > >>>>>> 14: (()+0x8064) [0x7fbd284fb064] > >>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] > >>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>> needed to interpret this. 
> >>>>>> > >>>>>> > >>>>>> Stefan > >>>>>> > >>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: > >>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>> Hi Sage, > >>>>>>>> > >>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: > >>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>> Hi, > >>>>>>>>>> > >>>>>>>>>> it also crashes in (marked with HERE): > >>>>>>>>>> > >>>>>>>>>> int SnapMapper::get_snaps( > >>>>>>>>>> const hobject_t &oid, > >>>>>>>>>> object_snaps *out) > >>>>>>>>>> { > >>>>>>>>>> assert(check(oid)); > >>>>>>>>>> set<string> keys; > >>>>>>>>>> map<string, bufferlist> got; > >>>>>>>>>> keys.insert(to_object_key(oid)); > >>>>>>>>>> int r = backend.get_keys(keys, &got); > >>>>>>>>>> if (r < 0) > >>>>>>>>>> return r; > >>>>>>>>>> if (got.empty()) > >>>>>>>>>> return -ENOENT; > >>>>>>>>>> if (out) { > >>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); > >>>>>>>>>> ::decode(*out, bp); > >>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; > >>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### > >>>>>>>>>> } else { > >>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; > >>>>>>>>>> } > >>>>>>>>>> return 0; > >>>>>>>>>> } > >>>>>>>>>> > >>>>>>>>>> is it save to comment that assert? > >>>>>>>>> > >>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper > >>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim. > >>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a > >>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time > >>>>>>>>> being so you don't get a surprise if trimming kicks in. The next step is > >>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or > >>>>>>>>> can't repair. > >>>>>>>> > >>>>>>>> snap trimming works fine i already trimmed and removed some snaps. 
> >>>>>>>> > >>>>>>>> I was able to get the cluster into a state where all is backfilled. But > >>>>>>>> enabling / doing deep-scrubs results into this one: > >>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 > >>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int > >>>>>>>> SnapMapper::get_snaps(const h > >>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time > >>>>>>>> 2018-01-18 13:00:54.840396 > >>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) > >>>>>>> > >>>>>>> I think if you switch that assert to a warning it will repair... > >>>>>>> > >>>>>>> sage > >>>>>>> > >>>>>>>> > >>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>> const*)+0x102) [0x5561ce28d1f2] > >>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, > >>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] > >>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, > >>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 > >>>>>>>> 697] > >>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] > >>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > >>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] > >>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, > >>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] > >>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] > >>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>> [0x5561cdcd7bc7] > >>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>> const&)+0x57) [0x5561cdf4f957] > >>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>> 
ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] > >>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>> [0x5561ce292e7d] > >>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] > >>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] > >>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] > >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>> needed to interpret this. > >>>>>>>> > >>>>>>>> > >>>>>>>> Greets, > >>>>>>>> Stefan > >>>>>>>> > >>>>>>>>> sage > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> Stefan > >>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: > >>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: > >>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>> Hi Sage, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> this gives me another crash while that pg is recovering: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 > >>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void > >>>>>>>>>>>>>> PrimaryLogPG::on_l > >>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, > >>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti > >>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 > >>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != > >>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end()) > >>>>>>>>>>>>> > >>>>>>>>>>>>> Is this a cache tiering pool? 
> >>>>>>>>>>>> > >>>>>>>>>>>> no normal 3 replica but the pg is degraded: > >>>>>>>>>>>> ceph pg dump | grep 3.80e > >>>>>>>>>>>> > >>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 > >>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait > >>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] > >>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 > >>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 > >>>>>>>>>>> > >>>>>>>>>>> Hrm, no real clues on teh root cause then. Something like this will work > >>>>>>>>>>> around the current assert: > >>>>>>>>>>> > >>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc > >>>>>>>>>>> index d42f3a401b..0f76134f74 100644 > >>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc > >>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc > >>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( > >>>>>>>>>>> set<snapid_t> snaps; > >>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; > >>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); > >>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
> >>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); > >>>>>>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { > >>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; > >>>>>>>>>>> + } else { > >>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); > >>>>>>>>>>> + } > >>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; > >>>>>>>>>>> snap_mapper.add_oid( > >>>>>>>>>>> recovery_info.soid, > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Stefan > >>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> s > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] > >>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo > >>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, > >>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] > >>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, > >>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] > >>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) > >>>>>>>>>>>>>> [0x55addb30748f] > >>>>>>>>>>>>>> 5: > >>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) > >>>>>>>>>>>>>> [0x55addb317531] > >>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>>>> [0x55addb23cf10] > >>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] > >>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>> [0x55addb035bc7] > >>>>>>>>>>>>>> 9:
(PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] > >>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] > >>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>>> [0x55addb5f0e7d] > >>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] > >>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] > >>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] > >>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Greets, > >>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: > >>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by > >>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap > >>>>>>>>>>>>>>> list mapping. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so > >>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error > >>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't > >>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it > >>>>>>>>>>>>>>> happens. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them > >>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! 
> >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> sage > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: > >>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG > >>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: > >>>>>>>>>>>>>>>>>> Hello, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds > >>>>>>>>>>>>>>>>>> being down. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> All of them fail with: > >>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 > >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void > >>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, > >>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' > >>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 > >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 > >>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) > >>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] > >>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, > >>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, > >>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) > >>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] > >>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] > >>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, 
eversion_t, > >>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] > >>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] > >>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >>>>>>>>>>>>>>>>>> [0x561f9fc314b2] > >>>>>>>>>>>>>>>>>> 7: > >>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >>>>>>>>>>>>>>>>>> [0x561f9fc374f4] > >>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] > >>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] > >>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>>>>>> [0x561f9f955bc7] > >>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] > >>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] > >>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] > >>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] > >>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] > >>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] > >>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>>>>>>>>>> needed to interpret this. 
> >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is > >>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that > >>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's > >>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is > >>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have > >>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG > >>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object > >>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and > >>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of > >>>>>>>>>>>>>>>>> them myself. 
> >>>>>>>>>>>>>>>>> -Greg > >>>>>>>>>>>>>>>>> > >> > >> > > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Am 22.01.2018 um 20:01 schrieb Sage Weil: > On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >> Hi Sage, >> >> thanks! That one fixed it! Now i'm just waiting if the log ever stops >> telling me: >> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error >> on pg 3.33e oid >> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in >> mapper: , oi: b0d7c...repaired >> >> Currently it seems it does an endless repair and never ends. > > On many objects or the same object? I did a manual scrub on all pgs but now it has started the automatic scrub and find those errors again. I haven't checked the object names yet. I'll wait 24 hours and see what's happening. Thanks again! Stefan > > sage > >> >> Greets, >> Stefan >> Am 22.01.2018 um 15:30 schrieb Sage Weil: >>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>> Hi Sage, >>>> >>>> get_snaps returned the error as there are no snaps in the snapmapper. So >>>> it's OK to skip the assert. >>>> >>>> My biggest problem is now that i'm not able to run a scrub on all PGs to >>>> fix all those errors because in around 1-3% the osd seem to deadlock and >>>> needs to get killed to proceed. >>> >>> I believe this will fix it: >>> >>> https://github.com/liewegas/ceph/commit/wip-snapmapper >>> >>> Cherry-pick that and let me know? >>> >>> Thanks! 
>>> sage >>> >>> >>>> >>>> The log output is always: >>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) >>>> log [WRN] : slow request 141.951820 seconds old, received at >>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 >>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 >>>> 00000157:head v 927921'526559757) currently commit_sent >>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) >>>> log [WRN] : slow request 141.643853 seconds old, received at >>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 >>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 >>>> 00000147d:head v 927921'29105383) currently commit_sent >>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) >>>> log [WRN] : slow request 141.313220 seconds old, received at >>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 >>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 >>>> 000000aff:head v 927921'164552063) currently commit_sent >>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal >>>> (Aborted) ** >>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp >>>> >>>> ceph version 12.2.2-94-g92923ef >>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>> 1: (()+0xa4423c) [0x561d876ab23c] >>>> 2: (()+0xf890) [0x7f100b974890] >>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] >>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, >>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran >>>> saction> >&, Context*)+0x254) [0x561d8746ac54] >>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, >>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] >>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] >>>> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] >>>> 8: 
(PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] >>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] >>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>> [0x561d87138bc7] >>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>> const&)+0x57) [0x561d873b0db7] >>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] >>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>> [0x561d876f42bd] >>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] >>>> 15: (()+0x8064) [0x7f100b96d064] >>>> 16: (clone()+0x6d) [0x7f100aa6162d] >>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>> needed to interpret this. >>>> >>>> --- logging levels --- >>>> 0/ 5 none >>>> 0/ 0 lockdep >>>> 0/ 0 context >>>> 0/ 0 crush >>>> 0/ 0 mds >>>> 0/ 0 mds_balancer >>>> 0/ 0 mds_locker >>>> 0/ 0 mds_log >>>> 0/ 0 mds_log_expire >>>> 0/ 0 mds_migrator >>>> 0/ 0 buffer >>>> 0/ 0 timer >>>> 0/ 0 filer >>>> 0/ 1 striper >>>> 0/ 0 objecter >>>> 0/ 0 rados >>>> 0/ 0 rbd >>>> 0/ 5 rbd_mirror >>>> 0/ 5 rbd_replay >>>> 0/ 0 journaler >>>> 0/ 0 objectcacher >>>> 0/ 0 client >>>> 0/ 0 osd >>>> 0/ 0 optracker >>>> 0/ 0 objclass >>>> 0/ 0 filestore >>>> 0/ 0 journal >>>> 0/ 0 ms >>>> 0/ 0 mon >>>> 0/ 0 monc >>>> 0/ 0 paxos >>>> 0/ 0 tp >>>> 0/ 0 auth >>>> 1/ 5 crypto >>>> 0/ 0 finisher >>>> 1/ 1 reserver >>>> 0/ 0 heartbeatmap >>>> 0/ 0 perfcounter >>>> 0/ 0 rgw >>>> 1/10 civetweb >>>> 1/ 5 javaclient >>>> 0/ 0 asok >>>> 0/ 0 throttle >>>> 0/ 0 refs >>>> 1/ 5 xio >>>> 1/ 5 compressor >>>> 1/ 5 bluestore >>>> 1/ 5 bluefs >>>> 1/ 3 bdev >>>> 1/ 5 kstore >>>> 4/ 5 rocksdb >>>> 4/ 5 leveldb >>>> 4/ 5 memdb >>>> 1/ 5 kinetic >>>> 1/ 5 fuse >>>> 1/ 5 mgr >>>> 1/ 5 mgrc >>>> 1/ 5 dpdk 
>>>> 1/ 5 eventtrace >>>> -2/-2 (syslog threshold) >>>> -1/-1 (stderr threshold) >>>> max_recent 10000 >>>> max_new 1000 >>>> log_file /var/log/ceph/ceph-osd.59.log >>>> --- end dump of recent events --- >>>> >>>> >>>> Greets, >>>> Stefan >>>> >>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: >>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>> Hi Sage, >>>>>> >>>>>> thanks for your reply. I'll do this. What I still don't understand is >>>>>> the following, and I hope you might have an idea: >>>>>> 1.) All the snaps mentioned here are fresh - I created them today >>>>>> running luminous. How can they be missing in snapmapper again? >>>>> >>>>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so >>>>> that when you hit the crash we have some context as to what is going on. >>>>> >>>>> Did you track down which part of get_snaps() is returning the error? >>>>> >>>>>> 2.) How did this happen? The cluster was jewel and was updated to >>>>>> luminous when all this happened. >>>>> >>>>> Did this start right after the upgrade? >>>>> >>>>> I started a PR that relaxes some of these assertions so that they clean up >>>>> instead (unless the debug option is enabled, for our regression testing), >>>>> but I'm still unclear about what the inconsistency is... any logging you >>>>> can provide would help! >>>>> >>>>> Thanks- >>>>> sage >>>>> >>>>> >>>>> >>>>>> >>>>>> Greets, >>>>>> Stefan >>>>>> >>>>>> >>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: >>>>>>> I'm guessing it is coming from >>>>>>> >>>>>>> object_snaps out; >>>>>>> int r = get_snaps(oid, &out); >>>>>>> if (r < 0) >>>>>>> return r; >>>>>>> >>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to >>>>>>> confirm that, and possibly also add debug lines to the various return >>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. 
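Sage's suggestion above - instrument every return point of get_snaps() so the failing path is visible in the log - can be sketched with a small standalone model. This is not Ceph's actual code: the key/value backend is stood in by a `std::map`, and the function and type names here are invented for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

// Standalone sketch: models the distinct return paths of a
// get_snaps()-style lookup so each failure mode logs its own message.
struct ObjectSnaps { std::set<uint64_t> snaps; };

using KvStore = std::map<std::string, std::vector<uint64_t>>;

int get_snaps_logged(const KvStore& backend, const std::string& key,
                     ObjectSnaps* out) {
  auto it = backend.find(key);
  if (it == backend.end()) {
    // path 1: the object has no snapmapper entry at all
    std::cerr << "get_snaps " << key << ": no key in backend -> -ENOENT\n";
    return -ENOENT;
  }
  if (out) {
    out->snaps.insert(it->second.begin(), it->second.end());
    if (out->snaps.empty()) {
      // path 2: an entry exists but decodes to an empty snap set -- this is
      // the inconsistency the FAILED assert(!out->snaps.empty()) trips on;
      // returning an error instead lets the caller warn and repair.
      std::cerr << "get_snaps " << key << ": entry present but empty\n";
      return -EINVAL;
    }
  }
  return 0;  // path 3: healthy entry
}
```

With one log line per return point, the OSD log shows which of the three cases update_snaps() actually hit before the assert fired.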
>>>>>>> >>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's >>>>>>> hard to say without seeing what the inconsistency there is and whether we >>>>>>> can tolerate it... >>>>>>> >>>>>>> sage >>>>>>> >>>>>>> >>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>> >>>>>>>> Hi Sage, >>>>>>>> >>>>>>>> any ideas for this one, seen while doing a deep scrub: >>>>>>>> >>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: >>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] >>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f >>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 >>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for >>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) >>>>>>>> >>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 >>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const >>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread >>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 >>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) >>>>>>>> >>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>> const*)+0x102) [0x55c0100d0372] >>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>> ObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] >>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] >>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>> boost::optional<pg_hit_set_history_t> const&, 
eversion_t const&, >>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] >>>>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>> [0x55c00fdf6642] >>>>>>>> 6: >>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>> [0x55c00fdfc684] >>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>> [0x55c00fd22030] >>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] >>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>> [0x55c00fb1abc7] >>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>> const&)+0x57) [0x55c00fd92ad7] >>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] >>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>> [0x55c0100d5ffd] >>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] >>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] >>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>> needed to interpret this. 
>>>>>>>> >>>>>>>> >>>>>>>> Stefan >>>>>>>> >>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: >>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>> Hi Sage, >>>>>>>>>> >>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> it also crashes in (marked with HERE): >>>>>>>>>>>> >>>>>>>>>>>> int SnapMapper::get_snaps( >>>>>>>>>>>> const hobject_t &oid, >>>>>>>>>>>> object_snaps *out) >>>>>>>>>>>> { >>>>>>>>>>>> assert(check(oid)); >>>>>>>>>>>> set<string> keys; >>>>>>>>>>>> map<string, bufferlist> got; >>>>>>>>>>>> keys.insert(to_object_key(oid)); >>>>>>>>>>>> int r = backend.get_keys(keys, &got); >>>>>>>>>>>> if (r < 0) >>>>>>>>>>>> return r; >>>>>>>>>>>> if (got.empty()) >>>>>>>>>>>> return -ENOENT; >>>>>>>>>>>> if (out) { >>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>>>>>>>>>> ::decode(*out, bp); >>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### >>>>>>>>>>>> } else { >>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>>>>>>>>>> } >>>>>>>>>>>> return 0; >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> is it safe to comment out that assert? >>>>>>>>>>> >>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper >>>>>>>>>>> state is consistent, and its only purpose is to find snaps to trim. >>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a >>>>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time >>>>>>>>>>> being so you don't get a surprise if trimming kicks in. The next step is >>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or >>>>>>>>>>> can't repair. >>>>>>>>>> >>>>>>>>>> snap trimming works fine; I already trimmed and removed some snaps. 
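The "relax the asserts for production but keep them for qa" idea Sage mentions can be sketched in a few lines. A minimal standalone example, not the actual Ceph patch; the option name `debug_verify_stray_snaps` is invented here:

```cpp
#include <cassert>
#include <iostream>

// Sketch of a debug-gated invariant check: in qa (option enabled) a
// violation aborts as before; in production it only logs a warning and
// reports failure so the caller can fall back to repair.
struct Options {
  bool debug_verify_stray_snaps = false;  // hypothetical option name
};

// Returns true if the invariant held; on violation either asserts (qa)
// or warns and lets the caller clean up (production).
bool check_invariant(const Options& opt, bool invariant, const char* what) {
  if (invariant) return true;
  if (opt.debug_verify_stray_snaps) {
    assert(invariant && "qa build: fail hard on inconsistency");
  }
  std::cerr << "WRN: " << what << " inconsistent, continuing for repair\n";
  return false;
}
```

Call sites then replace a bare `assert(!out->snaps.empty())` with something like `if (!check_invariant(opt, !out->snaps.empty(), "snap set")) return -EINVAL;`, which is roughly what turning the assert into a warning amounts to.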
>>>>>>>>>> >>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But >>>>>>>>>> enabling / doing deep-scrubs results into this one: >>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>>>>> SnapMapper::get_snaps(const h >>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >>>>>>>>>> 2018-01-18 13:00:54.840396 >>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>>>> >>>>>>>>> I think if you switch that assert to a warning it will repair... >>>>>>>>> >>>>>>>>> sage >>>>>>>>> >>>>>>>>>> >>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] >>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, >>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >>>>>>>>>> 697] >>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] >>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>> [0x5561cdcd7bc7] >>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>> const&)+0x57) [0x5561cdf4f957] >>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>> 
ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>> [0x5561ce292e7d] >>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] >>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>> needed to interpret this. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Greets, >>>>>>>>>> Stefan >>>>>>>>>> >>>>>>>>>>> sage >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Stefan >>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>>>>>>>>>> PrimaryLogPG::on_l >>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end()) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Is this a cache tiering pool? 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> no, a normal 3-replica pool, but the pg is degraded: >>>>>>>>>>>>>> ceph pg dump | grep 3.80e >>>>>>>>>>>>>> >>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>>>>>>>>>> >>>>>>>>>>>>> Hrm, no real clues on the root cause then. Something like this will work >>>>>>>>>>>>> around the current assert: >>>>>>>>>>>>> >>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 >>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>>>>>>>>>>>> set<snapid_t> snaps; >>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; >>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { >>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>>>>>>>>>>>> + } else { >>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>> + } >>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; >>>>>>>>>>>>> snap_mapper.add_oid( >>>>>>>>>>>>> recovery_info.soid, >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> s >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>>>>>>>>>>>> [0x55addb30748f] >>>>>>>>>>>>>>>> 5: >>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>>>>>>>>>>>>> [0x55addb317531] >>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>> [0x55addb23cf10] >>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>> [0x55addb035bc7] >>>>>>>>>>>>>>>> 9: 
(PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>> [0x55addb5f0e7d] >>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> is there any chance to fix this instead of manually removing all the clones? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>>>>>>>>>> list mapping. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The problem is you'll probably still have a stray snapid -> object mapping, so >>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>>>>>>>>>>>> happens. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! 
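The workaround diff above boils down to a map lookup that treats a missing clone_snaps entry as a warning instead of a fatal assert. That pattern can be modeled standalone (simplified types standing in for snapid_t and the SnapSet; not Ceph's actual code):

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <map>
#include <set>

// Minimal model of the on_local_recover() guard: look up the clone's
// snaps in a snapset-like map; if the entry is missing, warn and carry
// on with an empty snap set rather than aborting the OSD. The missing
// case must be the find() == end() branch -- dereferencing an end()
// iterator is undefined behavior.
using CloneSnaps = std::map<uint64_t, std::set<uint64_t>>;

std::set<uint64_t> snaps_for_clone(const CloneSnaps& clone_snaps,
                                   uint64_t clone_id) {
  std::set<uint64_t> snaps;
  auto p = clone_snaps.find(clone_id);
  if (p == clone_snaps.end()) {
    std::cerr << "no clone_snaps for clone " << clone_id << "\n";
  } else {
    snaps.insert(p->second.begin(), p->second.end());
  }
  return snaps;
}
```

The trade-off is the one Sage notes: recovery proceeds, but an object with no clone_snaps entry ends up registered with an empty snap set, which scrub/repair has to reconcile later.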
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>>>>>>>>>> being down. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> All of them fail with: >>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] >>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, 
eversion_t, >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>>>>>>>>>> 7: >>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>> needed to interpret this. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG >>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and >>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>>>>>>>>>>>>>> them myself. 
>>>>>>>>>>>>>>>>>>> -Greg
Am 22.01.2018 um 20:01 schrieb Sage Weil: > On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >> Hi Sage, >> >> thanks! That one fixed it! Now i'm just waiting if the log ever stops >> telling me: >> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error >> on pg 3.33e oid >> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in >> mapper: , oi: b0d7c...repaired >> >> Currently it seems it does an endless repair and never ends. > > On many objects or the same object? OK no it still does endless repair but the reason is it also repairs "newly" created snapshots. Example: 2018-01-23 21:46:47.609712 osd.32 [ERR] osd.32 found snap mapper error on pg 3.c70 oid 3:0e3ff478:::rbd_data.52ab946b8b4567.00000000000025b2:b0d7c snaps missing in mapper, should be: b0d7c...repaired rbd image 'vm-251-disk-1': size 170 GB in 43520 objects order 22 (4096 kB objects) block_name_prefix: rbd_data.52ab946b8b4567 format: 2 features: layering flags: SNAPID NAME SIZE TIMESTAMP 726900 PVESNAP_cloud2-1394_vm_diff_20180123 170 GB Tue Jan 23 02:14:35 2018 So this is a fresh snapshot which needs a repair? Greets, Stefan > > sage > >> >> Greets, >> Stefan >> Am 22.01.2018 um 15:30 schrieb Sage Weil: >>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>> Hi Sage, >>>> >>>> get_snaps returned the error as there are no snaps in the snapmapper. So >>>> it's OK to skip the assert. >>>> >>>> My biggest problem is now that i'm not able to run a scrub on all PGs to >>>> fix all those errors because in around 1-3% the osd seem to deadlock and >>>> needs to get killed to proceed. >>> >>> I believe this will fix it: >>> >>> https://github.com/liewegas/ceph/commit/wip-snapmapper >>> >>> Cherry-pick that and let me know? >>> >>> Thanks! 
>>> sage >>> >>> >>>> >>>> The log output is always: >>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) >>>> log [WRN] : slow request 141.951820 seconds old, received at >>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 >>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 >>>> 00000157:head v 927921'526559757) currently commit_sent >>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) >>>> log [WRN] : slow request 141.643853 seconds old, received at >>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 >>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 >>>> 00000147d:head v 927921'29105383) currently commit_sent >>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) >>>> log [WRN] : slow request 141.313220 seconds old, received at >>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 >>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 >>>> 000000aff:head v 927921'164552063) currently commit_sent >>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal >>>> (Aborted) ** >>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp >>>> >>>> ceph version 12.2.2-94-g92923ef >>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>> 1: (()+0xa4423c) [0x561d876ab23c] >>>> 2: (()+0xf890) [0x7f100b974890] >>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] >>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, >>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran >>>> saction> >&, Context*)+0x254) [0x561d8746ac54] >>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, >>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] >>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] >>>> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] >>>> 8: 
(PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] >>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] >>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>> [0x561d87138bc7] >>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>> const&)+0x57) [0x561d873b0db7] >>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] >>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>> [0x561d876f42bd] >>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] >>>> 15: (()+0x8064) [0x7f100b96d064] >>>> 16: (clone()+0x6d) [0x7f100aa6162d] >>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>> needed to interpret this. >>>> >>>> --- logging levels --- >>>> 0/ 5 none >>>> 0/ 0 lockdep >>>> 0/ 0 context >>>> 0/ 0 crush >>>> 0/ 0 mds >>>> 0/ 0 mds_balancer >>>> 0/ 0 mds_locker >>>> 0/ 0 mds_log >>>> 0/ 0 mds_log_expire >>>> 0/ 0 mds_migrator >>>> 0/ 0 buffer >>>> 0/ 0 timer >>>> 0/ 0 filer >>>> 0/ 1 striper >>>> 0/ 0 objecter >>>> 0/ 0 rados >>>> 0/ 0 rbd >>>> 0/ 5 rbd_mirror >>>> 0/ 5 rbd_replay >>>> 0/ 0 journaler >>>> 0/ 0 objectcacher >>>> 0/ 0 client >>>> 0/ 0 osd >>>> 0/ 0 optracker >>>> 0/ 0 objclass >>>> 0/ 0 filestore >>>> 0/ 0 journal >>>> 0/ 0 ms >>>> 0/ 0 mon >>>> 0/ 0 monc >>>> 0/ 0 paxos >>>> 0/ 0 tp >>>> 0/ 0 auth >>>> 1/ 5 crypto >>>> 0/ 0 finisher >>>> 1/ 1 reserver >>>> 0/ 0 heartbeatmap >>>> 0/ 0 perfcounter >>>> 0/ 0 rgw >>>> 1/10 civetweb >>>> 1/ 5 javaclient >>>> 0/ 0 asok >>>> 0/ 0 throttle >>>> 0/ 0 refs >>>> 1/ 5 xio >>>> 1/ 5 compressor >>>> 1/ 5 bluestore >>>> 1/ 5 bluefs >>>> 1/ 3 bdev >>>> 1/ 5 kstore >>>> 4/ 5 rocksdb >>>> 4/ 5 leveldb >>>> 4/ 5 memdb >>>> 1/ 5 kinetic >>>> 1/ 5 fuse >>>> 1/ 5 mgr >>>> 1/ 5 mgrc >>>> 1/ 5 dpdk 
>>>> 1/ 5 eventtrace >>>> -2/-2 (syslog threshold) >>>> -1/-1 (stderr threshold) >>>> max_recent 10000 >>>> max_new 1000 >>>> log_file /var/log/ceph/ceph-osd.59.log >>>> --- end dump of recent events --- >>>> >>>> >>>> Greets, >>>> Stefan >>>> >>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: >>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>> Hi Sage, >>>>>> >>>>>> thanks for your reply. I'll do this. What i still don't understand is >>>>>> the following and i hope you might have an idea: >>>>>> 1.) All the snaps mentioned here are fresh - i created them today >>>>>> running luminous? How can they be missing in snapmapper again? >>>>> >>>>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so >>>>> that when you hit the crash we have some context as to what is going on. >>>>> >>>>> Did you track down which part of get_snaps() is returning the error? >>>>> >>>>>> 2.) How did this happen? The cluster was jewel and was updated to >>>>>> luminous when all this happened. >>>>> >>>>> Did this start right after the upgrade? >>>>> >>>>> I started a PR that relaxes some of these assertions so that they clean up >>>>> instead (unless the debug option is enabled, for our regression testing), >>>>> but I'm still unclear about what the inconsistency is... any logging you >>>>> can provide would help! >>>>> >>>>> Thanks- >>>>> sage >>>>> >>>>> >>>>> >>>>>> >>>>>> Greets, >>>>>> Stefan >>>>>> >>>>>> >>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: >>>>>>> I'm guessing it is coming from >>>>>>> >>>>>>> object_snaps out; >>>>>>> int r = get_snaps(oid, &out); >>>>>>> if (r < 0) >>>>>>> return r; >>>>>>> >>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to >>>>>>> confirm that, and possibly also add debug lines to the various return >>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
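[Editor's note: the per-return-point instrumentation Sage suggests for get_snaps() can be sketched in isolation like this. It is a toy stand-in using an in-memory map, not the actual Ceph SnapMapper code; `get_snaps_debug` and the `backend` map are illustrative names. The point is the shape: every early return prints which branch fired, so a failing call tells you why it failed.]

```cpp
#include <cerrno>
#include <iostream>
#include <map>
#include <set>
#include <string>

// Toy stand-in for the SnapMapper backend: object key -> set of snap ids.
// The real code decodes an object_snaps struct from an ObjectStore omap.
using SnapSet = std::set<unsigned>;
static std::map<std::string, SnapSet> backend;

// Mirrors the control flow of SnapMapper::get_snaps() with a debug line at
// each return point, so the log shows which condition produced the error.
int get_snaps_debug(const std::string& oid, SnapSet* out) {
  auto it = backend.find(oid);
  if (it == backend.end()) {
    std::cerr << "get_snaps " << oid << ": no key in backend -> -ENOENT\n";
    return -ENOENT;
  }
  if (out) {
    *out = it->second;
    if (out->empty()) {
      // This is the condition that assert(!out->snaps.empty()) guards.
      std::cerr << "get_snaps " << oid << ": empty snap set -> -EINVAL\n";
      return -EINVAL;
    }
  }
  return 0;
}
```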
>>>>>>> >>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's >>>>>>> hard to say without seeing what the inconsistency there is and whether we >>>>>>> can tolerate it... >>>>>>> >>>>>>> sage >>>>>>> >>>>>>> >>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>> >>>>>>>> Hi Sage, >>>>>>>> >>>>>>>> any ideas for this one, seeing while doing a deep scrub: >>>>>>>> >>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: >>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] >>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f >>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 >>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for >>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) >>>>>>>> >>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 >>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const >>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread >>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 >>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) >>>>>>>> >>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>> const*)+0x102) [0x55c0100d0372] >>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>> ObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] >>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] >>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>> boost::optional<pg_hit_set_history_t> const&,
eversion_t const&, >>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] >>>>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>> [0x55c00fdf6642] >>>>>>>> 6: >>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>> [0x55c00fdfc684] >>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>> [0x55c00fd22030] >>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] >>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>> [0x55c00fb1abc7] >>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>> const&)+0x57) [0x55c00fd92ad7] >>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] >>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>> [0x55c0100d5ffd] >>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] >>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] >>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>> needed to interpret this. 
>>>>>>>> >>>>>>>> >>>>>>>> Stefan >>>>>>>> >>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: >>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>> Hi Sage, >>>>>>>>>> >>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> it also crashes in (marked with HERE): >>>>>>>>>>>> >>>>>>>>>>>> int SnapMapper::get_snaps( >>>>>>>>>>>> const hobject_t &oid, >>>>>>>>>>>> object_snaps *out) >>>>>>>>>>>> { >>>>>>>>>>>> assert(check(oid)); >>>>>>>>>>>> set<string> keys; >>>>>>>>>>>> map<string, bufferlist> got; >>>>>>>>>>>> keys.insert(to_object_key(oid)); >>>>>>>>>>>> int r = backend.get_keys(keys, &got); >>>>>>>>>>>> if (r < 0) >>>>>>>>>>>> return r; >>>>>>>>>>>> if (got.empty()) >>>>>>>>>>>> return -ENOENT; >>>>>>>>>>>> if (out) { >>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>>>>>>>>>> ::decode(*out, bp); >>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### >>>>>>>>>>>> } else { >>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>>>>>>>>>> } >>>>>>>>>>>> return 0; >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> is it safe to comment that assert? >>>>>>>>>>> >>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper >>>>>>>>>>> state is consistent, and its only purpose is to find snaps to trim. >>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a >>>>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time >>>>>>>>>>> being so you don't get a surprise if trimming kicks in. The next step is >>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or >>>>>>>>>>> can't repair. >>>>>>>>>> >>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
>>>>>>>>>> >>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But >>>>>>>>>> enabling / doing deep-scrubs results into this one: >>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>>>>> SnapMapper::get_snaps(const h >>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >>>>>>>>>> 2018-01-18 13:00:54.840396 >>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>>>> >>>>>>>>> I think if you switch that assert to a warning it will repair... >>>>>>>>> >>>>>>>>> sage >>>>>>>>> >>>>>>>>>> >>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] >>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, >>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >>>>>>>>>> 697] >>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] >>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>> [0x5561cdcd7bc7] >>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>> const&)+0x57) [0x5561cdf4f957] >>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>> 
ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>> [0x5561ce292e7d] >>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] >>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>> needed to interpret this. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Greets, >>>>>>>>>> Stefan >>>>>>>>>> >>>>>>>>>>> sage >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Stefan >>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>>>>>>>>>> PrimaryLogPG::on_l >>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end()) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Is this a cache tiering pool? 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> no normal 3 replica but the pg is degraded: >>>>>>>>>>>>>> ceph pg dump | grep 3.80e >>>>>>>>>>>>>> >>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>>>>>>>>>> >>>>>>>>>>>>> Hrm, no real clues on teh root cause then. Something like this will work >>>>>>>>>>>>> around the current assert: >>>>>>>>>>>>> >>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 >>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>>>>>>>>>>>> set<snapid_t> snaps; >>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; >>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { >>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>>>>>>>>>>>> + } else { >>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>> + } >>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; >>>>>>>>>>>>> snap_mapper.add_oid( >>>>>>>>>>>>> recovery_info.soid, >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> s >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>>>>>>>>>>>> [0x55addb30748f] >>>>>>>>>>>>>>>> 5: >>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>>>>>>>>>>>>> [0x55addb317531] >>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>> [0x55addb23cf10] >>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>> [0x55addb035bc7] >>>>>>>>>>>>>>>> 9:
(PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>> [0x55addb5f0e7d] >>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> is there any chance to fix this instead of removing manually all the clones? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>>>>>>>>>> list mapping. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The problem is you'll probably still have a stray snapid -> object mapping, so >>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>>>>>>>>>>>> happens. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD!
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>>>>>>>>>> being down. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> All of them fail with: >>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] >>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, 
eversion_t, >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>>>>>>>>>> 7: >>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>> needed to interpret this. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG >>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and >>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>>>>>>>>>>>>>> them myself. 
>>>>>>>>>>>>>>>>>>> -Greg >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
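[Editor's note: the "switch the assert to a warning" approach discussed throughout the message above can be sketched as a self-contained snippet. The names here (`check_snaps`, `strict_debug_mode`) are illustrative, not the actual Ceph symbols; the point is the shape Sage describes: fail loudly in QA, but in production warn and return an error the caller can treat as a missing mapping and repair.]

```cpp
#include <cerrno>
#include <cstdlib>
#include <iostream>
#include <set>

// Relaxed replacement for assert(!out->snaps.empty()): abort only in
// strict (QA/regression) mode; otherwise warn and hand the caller an
// error code so it can fall through to a repair path.
int check_snaps(const std::set<unsigned>& snaps, bool strict_debug_mode) {
  if (snaps.empty()) {
    if (strict_debug_mode)
      std::abort();  // regression testing still fails loudly
    std::cerr << "warning: empty snap set in mapper, treating as missing\n";
    return -ENOENT;  // caller repairs instead of taking down the OSD
  }
  return 0;
}
```

This matches the ticket Sage mentions: the invariant check stays, but an inconsistent snapmapper no longer crashes a production OSD.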
On Tue, 23 Jan 2018, Stefan Priebe - Profihost AG wrote: > > Am 22.01.2018 um 20:01 schrieb Sage Weil: > > On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: > >> Hi Sage, > >> > >> thanks! That one fixed it! Now i'm just waiting if the log ever stops > >> telling me: > >> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error > >> on pg 3.33e oid > >> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in > >> mapper: , oi: b0d7c...repaired > >> > >> Currently it seems it does an endless repair and never ends. > > > > On many objects or the same object? > > OK no it still does endless repair but the reason is it also repairs > "newly" created snapshots. > > Example: > 2018-01-23 21:46:47.609712 osd.32 [ERR] osd.32 found snap mapper error > on pg 3.c70 oid > 3:0e3ff478:::rbd_data.52ab946b8b4567.00000000000025b2:b0d7c snaps > missing in mapper, should be: b0d7c...repaired > > rbd image 'vm-251-disk-1': > size 170 GB in 43520 objects > order 22 (4096 kB objects) > block_name_prefix: rbd_data.52ab946b8b4567 > format: 2 > features: layering > flags: > SNAPID NAME SIZE TIMESTAMP > 726900 PVESNAP_cloud2-1394_vm_diff_20180123 170 GB Tue Jan 23 02:14:35 2018 > > So this is a fresh snapshot which needs a repair? Aha! That's a helpful clue. Is there any chance you can crank up the osd debug log on one OSD and catch one of these cases? (We want to capture the clone being created, and also the scrub on the pg that follows and sees the mismatch). Thanks! sage > Greets, > Stefan > > > > > sage > > > >> > >> Greets, > >> Stefan > >> Am 22.01.2018 um 15:30 schrieb Sage Weil: > >>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>> Hi Sage, > >>>> > >>>> get_snaps returned the error as there are no snaps in the snapmapper. So > >>>> it's OK to skip the assert.
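[Editor's note: the "...repaired" messages in Stefan's log can be sketched as a toy consistency check, presumably modeled on what the wip-snapmapper branch does at scrub time: treat the snap list recorded in the object_info as authoritative and rewrite the mapper entry when the two disagree. `repair_mapping` and its types are hypothetical names, not the real Ceph code.]

```cpp
#include <iostream>
#include <map>
#include <set>
#include <string>

using Snaps = std::set<unsigned long>;

// Compare the mapper's entry for an object against the authoritative
// object_info snap set; overwrite the mapper entry when it is missing
// or different. Returns true when a repair was performed.
bool repair_mapping(const std::string& oid, const Snaps& oi_snaps,
                    std::map<std::string, Snaps>& mapper) {
  auto it = mapper.find(oid);
  if (it != mapper.end() && it->second == oi_snaps)
    return false;  // mapper already consistent, nothing to repair
  std::cerr << "found snap mapper error on " << oid << " ...repaired\n";
  mapper[oid] = oi_snaps;  // overwrite with the authoritative set
  return true;
}
```

Under this model, a repair of a freshly created clone (as in Stefan's log) would mean the mapper entry for the new clone was never written, or was written with a different snap set, which is exactly why Sage asks for a debug log capturing the clone creation followed by the scrub.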
> >>>> > >>>> My biggest problem is now that i'm not able to run a scrub on all PGs to > >>>> fix all those errors because in around 1-3% the osd seem to deadlock and > >>>> needs to get killed to proceed. > >>> > >>> I believe this will fix it: > >>> > >>> https://github.com/liewegas/ceph/commit/wip-snapmapper > >>> > >>> Cherry-pick that and let me know? > >>> > >>> Thanks! > >>> sage > >>> > >>> > >>>> > >>>> Greets, > >>>> Stefan > >>>> > >>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: > >>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>> Hi Sage, > >>>>>> > >>>>>> thanks for your reply. I'll do this. What i still don't understand is > >>>>>> the following and i hope you might have an idea: > >>>>>> 1.) All the snaps mentioned here are fresh - i created them today > >>>>>> running luminous? How can they be missing in snapmapper again? > >>>>> > >>>>> I'm not sure.
It might help if you can set debug_osd=1/10 (or 1/20) so > >>>>> that when you hit the crash we have some context as to what is going on. > >>>>> > >>>>> Did you track down which part of get_snaps() is returning the error? > >>>>> > >>>>>> 2.) How did this happen? The cluster was jewel and was updated to > >>>>>> luminous when all this happened. > >>>>> > >>>>> Did this start right after the upgrade? > >>>>> > >>>>> I started a PR that relaxes some of these assertions so that they clean up > >>>>> instead (unless the debug option is enabled, for our regression testing), > >>>>> but I'm still unclear about what the inconsistency is... any loggin you > >>>>> can provide would help! > >>>>> > >>>>> Thanks- > >>>>> sage > >>>>> > >>>>> > >>>>> > >>>>>> > >>>>>> Greets, > >>>>>> Stefan > >>>>>> > >>>>>> > >>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: > >>>>>>> I'm guessing it is coming from > >>>>>>> > >>>>>>> object_snaps out; > >>>>>>> int r = get_snaps(oid, &out); > >>>>>>> if (r < 0) > >>>>>>> return r; > >>>>>>> > >>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to to > >>>>>>> confirm that, and possibly also add debug lines to the various return > >>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. > >>>>>>> > >>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's > >>>>>>> hard to say without seeing what the inconsistency there is and whether we > >>>>>>> can tolerate it... 
> >>>>>>> > >>>>>>> sage > >>>>>>> > >>>>>>> > >>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>> > >>>>>>>> HI Sage, > >>>>>>>> > >>>>>>>> any ideas for this one sawing while doing a deep scrub: > >>>>>>>> > >>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: > >>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] > >>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f > >>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 > >>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for > >>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) > >>>>>>>> > >>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 > >>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const > >>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread > >>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 > >>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) > >>>>>>>> > >>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>> const*)+0x102) [0x55c0100d0372] > >>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] > >>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, > >>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > >>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] > >>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] > >>>>>>>> 5: 
(ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >>>>>>>> [0x55c00fdf6642] > >>>>>>>> 6: > >>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >>>>>>>> [0x55c00fdfc684] > >>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>> [0x55c00fd22030] > >>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] > >>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>> [0x55c00fb1abc7] > >>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>> const&)+0x57) [0x55c00fd92ad7] > >>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] > >>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>> [0x55c0100d5ffd] > >>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] > >>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] > >>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] > >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>> needed to interpret this. 
> >>>>>>>> > >>>>>>>> > >>>>>>>> Stefan > >>>>>>>> > >>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: > >>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>> Hi Sage, > >>>>>>>>>> > >>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: > >>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>> Hi, > >>>>>>>>>>>> > >>>>>>>>>>>> it also crashes in (marked with HERE): > >>>>>>>>>>>> > >>>>>>>>>>>> int SnapMapper::get_snaps( > >>>>>>>>>>>> const hobject_t &oid, > >>>>>>>>>>>> object_snaps *out) > >>>>>>>>>>>> { > >>>>>>>>>>>> assert(check(oid)); > >>>>>>>>>>>> set<string> keys; > >>>>>>>>>>>> map<string, bufferlist> got; > >>>>>>>>>>>> keys.insert(to_object_key(oid)); > >>>>>>>>>>>> int r = backend.get_keys(keys, &got); > >>>>>>>>>>>> if (r < 0) > >>>>>>>>>>>> return r; > >>>>>>>>>>>> if (got.empty()) > >>>>>>>>>>>> return -ENOENT; > >>>>>>>>>>>> if (out) { > >>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); > >>>>>>>>>>>> ::decode(*out, bp); > >>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; > >>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### > >>>>>>>>>>>> } else { > >>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; > >>>>>>>>>>>> } > >>>>>>>>>>>> return 0; > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> is it safe to comment that assert? > >>>>>>>>>>> > >>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper > >>>>>>>>>>> state is consistent, and its only purpose is to find snaps to trim. > >>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a > >>>>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time > >>>>>>>>>>> being so you don't get a surprise if trimming kicks in. The next step is > >>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or > >>>>>>>>>>> can't repair. 
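The "switch that assert to a warning" suggestion above can be sketched outside the tree. The types below are simplified stand-ins (a plain map instead of the real MapCacher backend and bufferlist decode), not the actual Ceph classes — the point is only the shape of the change: return an error code instead of asserting on an empty snap set, so the caller can log and repair.

```cpp
#include <cerrno>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the SnapMapper key/value backend: maps an
// object key directly to its already-decoded snap set.
using SnapStore = std::map<std::string, std::set<int>>;

// Relaxed get_snaps(): mirrors the control flow quoted above, but the
// empty-snap-set case returns -EINVAL instead of asserting, so scrub
// can surface it as a repairable inconsistency rather than an abort.
int get_snaps_relaxed(const SnapStore& store, const std::string& oid,
                      std::set<int>* out) {
  auto it = store.find(oid);
  if (it == store.end())
    return -ENOENT;          // no mapping for this object at all
  if (out) {
    *out = it->second;
    if (out->empty())
      return -EINVAL;        // was: assert(!out->snaps.empty())
  }
  return 0;
}
```

The caller (e.g. the scrub path) can then treat a nonzero return as "mapping inconsistent, repair it" instead of crashing the OSD.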
> >>>>>>>>>> > >>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. > >>>>>>>>>> > >>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But > >>>>>>>>>> enabling / doing deep-scrubs results into this one: > >>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 > >>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int > >>>>>>>>>> SnapMapper::get_snaps(const h > >>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time > >>>>>>>>>> 2018-01-18 13:00:54.840396 > >>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) > >>>>>>>>> > >>>>>>>>> I think if you switch that assert to a warning it will repair... > >>>>>>>>> > >>>>>>>>> sage > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] > >>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, > >>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] > >>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, > >>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 > >>>>>>>>>> 697] > >>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] > >>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > >>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] > >>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, > >>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] > >>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] > >>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>> [0x5561cdcd7bc7] > >>>>>>>>>> 9: 
(PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>> const&)+0x57) [0x5561cdf4f957] > >>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] > >>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>> [0x5561ce292e7d] > >>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] > >>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] > >>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] > >>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>> needed to interpret this. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Greets, > >>>>>>>>>> Stefan > >>>>>>>>>> > >>>>>>>>>>> sage > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> Stefan > >>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: > >>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: > >>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>> Hi Sage, > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 > >>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void > >>>>>>>>>>>>>>>> PrimaryLogPG::on_l > >>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, > >>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti > >>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 > >>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != > >>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end()) > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Is this a cache tiering pool? 
> >>>>>>>>>>>>>> > >>>>>>>>>>>>>> no normal 3 replica but the pg is degraded: > >>>>>>>>>>>>>> ceph pg dump | grep 3.80e > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 > >>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait > >>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] > >>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 > >>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 > >>>>>>>>>>>>> > >>>>>>>>>>>>> Hrm, no real clues on the root cause then. Something like this will work > >>>>>>>>>>>>> around the current assert: > >>>>>>>>>>>>> > >>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 > >>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( > >>>>>>>>>>>>> set<snapid_t> snaps; > >>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; > >>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); > >>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
> >>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); > >>>>>>>>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { > >>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; > >>>>>>>>>>>>> + } else { > >>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); > >>>>>>>>>>>>> + } > >>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; > >>>>>>>>>>>>> snap_mapper.add_oid( > >>>>>>>>>>>>> recovery_info.soid, > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> s > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] > >>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo > >>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, > >>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] > >>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, > >>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] > >>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) > >>>>>>>>>>>>>>>> [0x55addb30748f] > >>>>>>>>>>>>>>>> 5: > >>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) > >>>>>>>>>>>>>>>> [0x55addb317531] > >>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>>>>>> [0x55addb23cf10] > >>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] > >>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>>>> [0x55addb035bc7] > 
>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] > >>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] > >>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>>>>> [0x55addb5f0e7d] > >>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] > >>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] > >>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] > >>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Greets, > >>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: > >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> is there any chance to fix this instead of removing manually all the clones? > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by > >>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap > >>>>>>>>>>>>>>>>> list mapping. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> The problem is you'll probably still have a stray snapid -> object mapping, so > >>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error > >>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't > >>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it > >>>>>>>>>>>>>>>>> happens. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them > >>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! 
> >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> sage > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: > >>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG > >>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: > >>>>>>>>>>>>>>>>>>>> Hello, > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds > >>>>>>>>>>>>>>>>>>>> being down. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> All of them fail with: > >>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 > >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void > >>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, > >>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' > >>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 > >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 > >>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) > >>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] > >>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, > >>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, > >>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) > >>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] > >>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] > >>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, > 
>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] > >>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] > >>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] > >>>>>>>>>>>>>>>>>>>> 7: > >>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] > >>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] > >>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] > >>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] > >>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] > >>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] > >>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] > >>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] > >>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] > >>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] > >>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS 
<executable>` is > >>>>>>>>>>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is > >>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that > >>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's > >>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is > >>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have > >>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG > >>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object > >>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and > >>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of > >>>>>>>>>>>>>>>>>>> them myself. 
> >>>>>>>>>>>>>>>>>>> -Greg > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > >>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org > >>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
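The clone_snaps guard from the patch earlier in the thread can be sketched standalone. Types are simplified (an int snapid and a plain map stand in for snapid_t and SnapSet::clone_snaps); the missing-entry case is logged and skipped instead of asserted on, which is the behavior the patch aims for.

```cpp
#include <iostream>
#include <map>
#include <set>
#include <vector>

// Simplified stand-in for SnapSet::clone_snaps: maps a clone's snapid
// to the list of snaps that reference that clone.
using CloneSnaps = std::map<int, std::vector<int>>;

// Returns the snap set for a clone; a missing entry is reported on
// stderr and yields an empty set rather than aborting the process.
std::set<int> snaps_for_clone(const CloneSnaps& clone_snaps, int snap) {
  std::set<int> snaps;
  auto p = clone_snaps.find(snap);
  if (p == clone_snaps.end()) {
    // was: assert(p != clone_snaps.end())
    std::cerr << "snaps_for_clone: no clone_snaps for snapid "
              << snap << "\n";
  } else {
    snaps.insert(p->second.begin(), p->second.end());
  }
  return snaps;
}
```

An empty result then flows into the snap mapper as "no snaps", leaving an inconsistency for scrub to find later instead of taking the OSD down during recovery.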
Hi Sage, Am 24.01.2018 um 01:07 schrieb Sage Weil: > On Tue, 23 Jan 2018, Stefan Priebe - Profihost AG wrote: >> >> Am 22.01.2018 um 20:01 schrieb Sage Weil: >>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>> Hi Sage, >>>> >>>> thanks! That one fixed it! Now i'm just waiting to see if the log ever stops >>>> telling me: >>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error >>>> on pg 3.33e oid >>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in >>>> mapper: , oi: b0d7c...repaired >>>> >>>> Currently it seems it does an endless repair and never ends. >>> >>> On many objects or the same object? >> >> OK no it still does endless repair but the reason is it also repairs >> "newly" created snapshots. >> >> Example: >> 2018-01-23 21:46:47.609712 osd.32 [ERR] osd.32 found snap mapper error >> on pg 3.c70 oid >> 3:0e3ff478:::rbd_data.52ab946b8b4567.00000000000025b2:b0d7c snaps >> missing in mapper, should be: b0d7c...repaired >> >> rbd image 'vm-251-disk-1': >> size 170 GB in 43520 objects >> order 22 (4096 kB objects) >> block_name_prefix: rbd_data.52ab946b8b4567 >> format: 2 >> features: layering >> flags: >> SNAPID NAME SIZE TIMESTAMP >> 726900 PVESNAP_cloud2-1394_vm_diff_20180123 170 GB Tue Jan 23 02:14:35 2018 >> >> So this is a fresh snapshot which needs a repair? > > Aha! That's a helpful clue. Is there any chance you can crank up the > osd debug log on one OSD and catch one of these cases? (We want to > capture the clone being created, and also the scrub on the pg that follows > and sees the mismatch). You mean debug_osd? How high must it be? And do we really need all log messages or might it be enough to change some of them to 0 in the source code - so they always log? Greets, Stefan > Thanks! 
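For reference, the debug_osd=1/10 (or 1/20) setting suggested earlier in the thread can be applied to a single OSD at runtime; a sketch, where the OSD id (osd.32) is a placeholder for whichever OSD logs the snap mapper error:

```shell
# Raise the OSD debug level on one OSD at runtime. The second value (20)
# is the in-memory level that gets dumped with a crash, so the dump
# carries context; the first value (1) keeps the on-disk log quiet
# during normal operation.
ceph tell osd.32 injectargs '--debug_osd 1/20'

# Equivalent persistent setting in ceph.conf, under [osd]:
#   debug osd = 1/20
```

With 1/20, the `--- end dump of recent events ---` block printed on a crash includes the dout(20)-level messages from just before the assert, which is the context being asked for here.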
> sage > > >> Greets, >> Stefan >> >>> >>> sage >>> >>>> >>>> Greets, >>>> Stefan >>>> Am 22.01.2018 um 15:30 schrieb Sage Weil: >>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>> Hi Sage, >>>>>> >>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So >>>>>> it's OK to skip the assert. >>>>>> >>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to >>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and >>>>>> needs to get killed to proceed. >>>>> >>>>> I believe this will fix it: >>>>> >>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper >>>>> >>>>> Cherry-pick that and let me know? >>>>> >>>>> Thanks! >>>>> sage >>>>> >>>>> >>>>>> >>>>>> The log output is always: >>>>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) >>>>>> log [WRN] : slow request 141.951820 seconds old, received at >>>>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 >>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 >>>>>> 00000157:head v 927921'526559757) currently commit_sent >>>>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) >>>>>> log [WRN] : slow request 141.643853 seconds old, received at >>>>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 >>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 >>>>>> 00000147d:head v 927921'29105383) currently commit_sent >>>>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) >>>>>> log [WRN] : slow request 141.313220 seconds old, received at >>>>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 >>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 >>>>>> 000000aff:head v 927921'164552063) currently commit_sent >>>>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal >>>>>> (Aborted) ** >>>>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp >>>>>> >>>>>> ceph version 
12.2.2-94-g92923ef >>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>> 1: (()+0xa4423c) [0x561d876ab23c] >>>>>> 2: (()+0xf890) [0x7f100b974890] >>>>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] >>>>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, >>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran >>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54] >>>>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, >>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] >>>>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] >>>>>> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] >>>>>> 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] >>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] >>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>> [0x561d87138bc7] >>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>> const&)+0x57) [0x561d873b0db7] >>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] >>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>> [0x561d876f42bd] >>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] >>>>>> 15: (()+0x8064) [0x7f100b96d064] >>>>>> 16: (clone()+0x6d) [0x7f100aa6162d] >>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>> needed to interpret this. 
>>>>>> >>>>>> --- logging levels --- >>>>>> 0/ 5 none >>>>>> 0/ 0 lockdep >>>>>> 0/ 0 context >>>>>> 0/ 0 crush >>>>>> 0/ 0 mds >>>>>> 0/ 0 mds_balancer >>>>>> 0/ 0 mds_locker >>>>>> 0/ 0 mds_log >>>>>> 0/ 0 mds_log_expire >>>>>> 0/ 0 mds_migrator >>>>>> 0/ 0 buffer >>>>>> 0/ 0 timer >>>>>> 0/ 0 filer >>>>>> 0/ 1 striper >>>>>> 0/ 0 objecter >>>>>> 0/ 0 rados >>>>>> 0/ 0 rbd >>>>>> 0/ 5 rbd_mirror >>>>>> 0/ 5 rbd_replay >>>>>> 0/ 0 journaler >>>>>> 0/ 0 objectcacher >>>>>> 0/ 0 client >>>>>> 0/ 0 osd >>>>>> 0/ 0 optracker >>>>>> 0/ 0 objclass >>>>>> 0/ 0 filestore >>>>>> 0/ 0 journal >>>>>> 0/ 0 ms >>>>>> 0/ 0 mon >>>>>> 0/ 0 monc >>>>>> 0/ 0 paxos >>>>>> 0/ 0 tp >>>>>> 0/ 0 auth >>>>>> 1/ 5 crypto >>>>>> 0/ 0 finisher >>>>>> 1/ 1 reserver >>>>>> 0/ 0 heartbeatmap >>>>>> 0/ 0 perfcounter >>>>>> 0/ 0 rgw >>>>>> 1/10 civetweb >>>>>> 1/ 5 javaclient >>>>>> 0/ 0 asok >>>>>> 0/ 0 throttle >>>>>> 0/ 0 refs >>>>>> 1/ 5 xio >>>>>> 1/ 5 compressor >>>>>> 1/ 5 bluestore >>>>>> 1/ 5 bluefs >>>>>> 1/ 3 bdev >>>>>> 1/ 5 kstore >>>>>> 4/ 5 rocksdb >>>>>> 4/ 5 leveldb >>>>>> 4/ 5 memdb >>>>>> 1/ 5 kinetic >>>>>> 1/ 5 fuse >>>>>> 1/ 5 mgr >>>>>> 1/ 5 mgrc >>>>>> 1/ 5 dpdk >>>>>> 1/ 5 eventtrace >>>>>> -2/-2 (syslog threshold) >>>>>> -1/-1 (stderr threshold) >>>>>> max_recent 10000 >>>>>> max_new 1000 >>>>>> log_file /var/log/ceph/ceph-osd.59.log >>>>>> --- end dump of recent events --- >>>>>> >>>>>> >>>>>> Greets, >>>>>> Stefan >>>>>> >>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: >>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>> Hi Sage, >>>>>>>> >>>>>>>> thanks for your reply. I'll do this. What i still don't understand is >>>>>>>> the following and i hope you might have an idea: >>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today >>>>>>>> running luminous? How can they be missing in snapmapper again? >>>>>>> >>>>>>> I'm not sure. 
It might help if you can set debug_osd=1/10 (or 1/20) so >>>>>>> that when you hit the crash we have some context as to what is going on. >>>>>>> >>>>>>> Did you track down which part of get_snaps() is returning the error? >>>>>>> >>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to >>>>>>>> luminous when all this happened. >>>>>>> >>>>>>> Did this start right after the upgrade? >>>>>>> >>>>>>> I started a PR that relaxes some of these assertions so that they clean up >>>>>>> instead (unless the debug option is enabled, for our regression testing), >>>>>>> but I'm still unclear about what the inconsistency is... any logging you >>>>>>> can provide would help! >>>>>>> >>>>>>> Thanks- >>>>>>> sage >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Greets, >>>>>>>> Stefan >>>>>>>> >>>>>>>> >>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: >>>>>>>>> I'm guessing it is coming from >>>>>>>>> >>>>>>>>> object_snaps out; >>>>>>>>> int r = get_snaps(oid, &out); >>>>>>>>> if (r < 0) >>>>>>>>> return r; >>>>>>>>> >>>>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to >>>>>>>>> confirm that, and possibly also add debug lines to the various return >>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. >>>>>>>>> >>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's >>>>>>>>> hard to say without seeing what the inconsistency there is and whether we >>>>>>>>> can tolerate it... 
>>>>>>>>> >>>>>>>>> sage >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>> >>>>>>>>>> HI Sage, >>>>>>>>>> >>>>>>>>>> any ideas for this one seeing while doing a deep scrub: >>>>>>>>>> >>>>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: >>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] >>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f >>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 >>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for >>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) >>>>>>>>>> >>>>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 >>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const >>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread >>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 >>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) >>>>>>>>>> >>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>> const*)+0x102) [0x55c0100d0372] >>>>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>> ObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] >>>>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] >>>>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] >>>>>>>>>> 5: 
(ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>> [0x55c00fdf6642] >>>>>>>>>> 6: >>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>> [0x55c00fdfc684] >>>>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>> [0x55c00fd22030] >>>>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] >>>>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>> [0x55c00fb1abc7] >>>>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>> const&)+0x57) [0x55c00fd92ad7] >>>>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] >>>>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>> [0x55c0100d5ffd] >>>>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] >>>>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] >>>>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] >>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>> needed to interpret this. 
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Stefan >>>>>>>>>> >>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: >>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>> Hi Sage, >>>>>>>>>>>> >>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> it also crashes in (marked with HERE): >>>>>>>>>>>>>> >>>>>>>>>>>>>> int SnapMapper::get_snaps( >>>>>>>>>>>>>> const hobject_t &oid, >>>>>>>>>>>>>> object_snaps *out) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> assert(check(oid)); >>>>>>>>>>>>>> set<string> keys; >>>>>>>>>>>>>> map<string, bufferlist> got; >>>>>>>>>>>>>> keys.insert(to_object_key(oid)); >>>>>>>>>>>>>> int r = backend.get_keys(keys, &got); >>>>>>>>>>>>>> if (r < 0) >>>>>>>>>>>>>> return r; >>>>>>>>>>>>>> if (got.empty()) >>>>>>>>>>>>>> return -ENOENT; >>>>>>>>>>>>>> if (out) { >>>>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>>>>>>>>>>>> ::decode(*out, bp); >>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### >>>>>>>>>>>>>> } else { >>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> return 0; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> is it safe to comment that assert? >>>>>>>>>>>>> >>>>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper >>>>>>>>>>>>> state is consistent, and its only purpose is to find snaps to trim. >>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a >>>>>>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time >>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in. The next step is >>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or >>>>>>>>>>>>> can't repair. 
>>>>>>>>>>>> >>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. >>>>>>>>>>>> >>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But >>>>>>>>>>>> enabling / doing deep-scrubs results into this one: >>>>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>>>>>>> SnapMapper::get_snaps(const h >>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >>>>>>>>>>>> 2018-01-18 13:00:54.840396 >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>>>>>> >>>>>>>>>>> I think if you switch that assert to a warning it will repair... >>>>>>>>>>> >>>>>>>>>>> sage >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] >>>>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, >>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >>>>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >>>>>>>>>>>> 697] >>>>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >>>>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >>>>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] >>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>> [0x5561cdcd7bc7] >>>>>>>>>>>> 9: 
(PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957] >>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>> [0x5561ce292e7d] >>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >>>>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] >>>>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Greets, >>>>>>>>>>>> Stefan >>>>>>>>>>>> >>>>>>>>>>>>> sage >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l >>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end()) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Is this a cache tiering pool? 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded: >>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hrm, no real clues on the root cause then. Something like this will work >>>>>>>>>>>>>>> around the current assert: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 >>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>>>>>>>>>>>>>> set<snapid_t> snaps; >>>>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; >>>>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>>>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
>>>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { >>>>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>>>>>>>>>>>>>> + } else { >>>>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; >>>>>>>>>>>>>>> snap_mapper.add_oid( >>>>>>>>>>>>>>> recovery_info.soid, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> s >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>>>>>>>>>>>>>> [0x55addb30748f] >>>>>>>>>>>>>>>>>> 5: >>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>>>>>>>>>>>>>>> [0x55addb317531] >>>>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>> [0x55addb23cf10] >>>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>> [0x55addb035bc7] 
>>>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>> [0x55addb5f0e7d] >>>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>>>>>>>>>>>> list mapping. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so >>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>>>>>>>>>>>>>> happens. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>>>>>>>>>>>> being down. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> All of them fail with: >>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] >>>>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, 
>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>>>>>>>>>>>> 7: >>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS 
<executable>` is >>>>>>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG >>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and >>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>>>>>>>>>>>>>>>> them myself. 
>>>>>>>>>>>>>>>>>>>>> -Greg
-- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 24 Jan 2018, Stefan Priebe - Profihost AG wrote: > Hi Sage, > > Am 24.01.2018 um 01:07 schrieb Sage Weil: > > On Tue, 23 Jan 2018, Stefan Priebe - Profihost AG wrote: > >> > >> Am 22.01.2018 um 20:01 schrieb Sage Weil: > >>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>> Hi Sage, > >>>> > >>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops > >>>> telling me: > >>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error > >>>> on pg 3.33e oid > >>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in > >>>> mapper: , oi: b0d7c...repaired > >>>> > >>>> Currently it seems it does an endless repair and never ends. > >>> > >>> On many objects or the same object? > >> > >> OK no it still does endless repair but the reason is it also repairs > >> "newly" created snapshots. > >> > >> Example: > >> 2018-01-23 21:46:47.609712 osd.32 [ERR] osd.32 found snap mapper error > >> on pg 3.c70 oid > >> 3:0e3ff478:::rbd_data.52ab946b8b4567.00000000000025b2:b0d7c snaps > >> missing in mapper, should be: b0d7c...repaired > >> > >> rbd image 'vm-251-disk-1': > >> size 170 GB in 43520 objects > >> order 22 (4096 kB objects) > >> block_name_prefix: rbd_data.52ab946b8b4567 > >> format: 2 > >> features: layering > >> flags: > >> SNAPID NAME SIZE TIMESTAMP > >> 726900 PVESNAP_cloud2-1394_vm_diff_20180123 170 GB Tue Jan 23 02:14:35 2018 > >> > >> So this is a fresh snapshot which needs a repair? > > > > Aha! That's a helpful clue. Is there any chance you can crank up teh > > osd debug log on one OSD can catch one of these cases? (We want to > > capture the clone being created, and also the scrub on the pg that follows > > and sees the mismatch). > > You mean debug_osd? How high must it be? And do we really need all log > messages or might it be enough to change some of them to 0 in the source > code - so they always log? It's just a lot of work to figure out which ones that would be. 
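[Editor's note] For reference, raising the verbosity on a single OSD, as discussed here, can be done in that OSD's ceph.conf section; osd.32 below is only an example id:

```ini
# ceph.conf on the node hosting the chosen OSD
[osd.32]
    debug osd = 20/20
```

The same level can also be injected at runtime without a restart: `ceph tell osd.32 injectargs '--debug-osd 20/20'`.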
If debug osd = 20 isn't possible (even on a single OSD), then just the logging statements (ideally all of them) in SnapMapper.cc would be a good start. sage > > Greets, > Stefan > > > Thanks! > > sage > > > > > >> Greets, > >> Stefan > >> > >>> > >>> sage > >>> > >>>> > >>>> Greets, > >>>> Stefan > >>>> Am 22.01.2018 um 15:30 schrieb Sage Weil: > >>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>> Hi Sage, > >>>>>> > >>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So > >>>>>> it's OK to skip the assert. > >>>>>> > >>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to > >>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and > >>>>>> needs to get killed to proceed. > >>>>> > >>>>> I believe this will fix it: > >>>>> > >>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper > >>>>> > >>>>> Cherry-pick that and let me know? > >>>>> > >>>>> Thanks! > >>>>> sage > >>>>> > >>>>> > >>>>>> > >>>>>> The log output is always: > >>>>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) > >>>>>> log [WRN] : slow request 141.951820 seconds old, received at > >>>>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 > >>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 > >>>>>> 00000157:head v 927921'526559757) currently commit_sent > >>>>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) > >>>>>> log [WRN] : slow request 141.643853 seconds old, received at > >>>>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 > >>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 > >>>>>> 00000147d:head v 927921'29105383) currently commit_sent > >>>>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) > >>>>>> log [WRN] : slow request 141.313220 seconds old, received at > >>>>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 > >>>>>> 
e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 > >>>>>> 000000aff:head v 927921'164552063) currently commit_sent > >>>>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal > >>>>>> (Aborted) ** > >>>>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp > >>>>>> > >>>>>> ceph version 12.2.2-94-g92923ef > >>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>> 1: (()+0xa4423c) [0x561d876ab23c] > >>>>>> 2: (()+0xf890) [0x7f100b974890] > >>>>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] > >>>>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, > >>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran > >>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54] > >>>>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, > >>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] > >>>>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] > >>>>>> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > >>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] > >>>>>> 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, > >>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] > >>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] > >>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>> [0x561d87138bc7] > >>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>> const&)+0x57) [0x561d873b0db7] > >>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] > >>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>> [0x561d876f42bd] > >>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] > >>>>>> 15: (()+0x8064) [0x7f100b96d064] > >>>>>> 16: (clone()+0x6d) 
[0x7f100aa6162d] > >>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>> needed to interpret this. > >>>>>> > >>>>>> --- logging levels --- > >>>>>> 0/ 5 none > >>>>>> 0/ 0 lockdep > >>>>>> 0/ 0 context > >>>>>> 0/ 0 crush > >>>>>> 0/ 0 mds > >>>>>> 0/ 0 mds_balancer > >>>>>> 0/ 0 mds_locker > >>>>>> 0/ 0 mds_log > >>>>>> 0/ 0 mds_log_expire > >>>>>> 0/ 0 mds_migrator > >>>>>> 0/ 0 buffer > >>>>>> 0/ 0 timer > >>>>>> 0/ 0 filer > >>>>>> 0/ 1 striper > >>>>>> 0/ 0 objecter > >>>>>> 0/ 0 rados > >>>>>> 0/ 0 rbd > >>>>>> 0/ 5 rbd_mirror > >>>>>> 0/ 5 rbd_replay > >>>>>> 0/ 0 journaler > >>>>>> 0/ 0 objectcacher > >>>>>> 0/ 0 client > >>>>>> 0/ 0 osd > >>>>>> 0/ 0 optracker > >>>>>> 0/ 0 objclass > >>>>>> 0/ 0 filestore > >>>>>> 0/ 0 journal > >>>>>> 0/ 0 ms > >>>>>> 0/ 0 mon > >>>>>> 0/ 0 monc > >>>>>> 0/ 0 paxos > >>>>>> 0/ 0 tp > >>>>>> 0/ 0 auth > >>>>>> 1/ 5 crypto > >>>>>> 0/ 0 finisher > >>>>>> 1/ 1 reserver > >>>>>> 0/ 0 heartbeatmap > >>>>>> 0/ 0 perfcounter > >>>>>> 0/ 0 rgw > >>>>>> 1/10 civetweb > >>>>>> 1/ 5 javaclient > >>>>>> 0/ 0 asok > >>>>>> 0/ 0 throttle > >>>>>> 0/ 0 refs > >>>>>> 1/ 5 xio > >>>>>> 1/ 5 compressor > >>>>>> 1/ 5 bluestore > >>>>>> 1/ 5 bluefs > >>>>>> 1/ 3 bdev > >>>>>> 1/ 5 kstore > >>>>>> 4/ 5 rocksdb > >>>>>> 4/ 5 leveldb > >>>>>> 4/ 5 memdb > >>>>>> 1/ 5 kinetic > >>>>>> 1/ 5 fuse > >>>>>> 1/ 5 mgr > >>>>>> 1/ 5 mgrc > >>>>>> 1/ 5 dpdk > >>>>>> 1/ 5 eventtrace > >>>>>> -2/-2 (syslog threshold) > >>>>>> -1/-1 (stderr threshold) > >>>>>> max_recent 10000 > >>>>>> max_new 1000 > >>>>>> log_file /var/log/ceph/ceph-osd.59.log > >>>>>> --- end dump of recent events --- > >>>>>> > >>>>>> > >>>>>> Greets, > >>>>>> Stefan > >>>>>> > >>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: > >>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>> Hi Sage, > >>>>>>>> > >>>>>>>> thanks for your reply. I'll do this. 
What i still don't understand is > >>>>>>>> the following and i hope you might have an idea: > >>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today > >>>>>>>> running luminous? How can they be missing in snapmapper again? > >>>>>>> > >>>>>>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so > >>>>>>> that when you hit the crash we have some context as to what is going on. > >>>>>>> > >>>>>>> Did you track down which part of get_snaps() is returning the error? > >>>>>>> > >>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to > >>>>>>>> luminous when all this happened. > >>>>>>> > >>>>>>> Did this start right after the upgrade? > >>>>>>> > >>>>>>> I started a PR that relaxes some of these assertions so that they clean up > >>>>>>> instead (unless the debug option is enabled, for our regression testing), > >>>>>>> but I'm still unclear about what the inconsistency is... any logging you > >>>>>>> can provide would help! > >>>>>>> > >>>>>>> Thanks- > >>>>>>> sage > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> > >>>>>>>> Greets, > >>>>>>>> Stefan > >>>>>>>> > >>>>>>>> > >>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: > >>>>>>>>> I'm guessing it is coming from > >>>>>>>>> > >>>>>>>>> object_snaps out; > >>>>>>>>> int r = get_snaps(oid, &out); > >>>>>>>>> if (r < 0) > >>>>>>>>> return r; > >>>>>>>>> > >>>>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to > >>>>>>>>> confirm that, and possibly also add debug lines to the various return > >>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. > >>>>>>>>> > >>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's > >>>>>>>>> hard to say without seeing what the inconsistency there is and whether we > >>>>>>>>> can tolerate it... 
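[Editor's sketch] Instrumenting each return point of get_snaps(), as suggested above, can be illustrated standalone like this. Again not the real Ceph code (the omap backend and types are replaced with stand-ins, and -2 stands in for -ENOENT); the point is that each error path gets its own log line, so the OSD log shows which branch produced the error.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <set>
#include <string>

// Stand-in for the omap backend consulted by SnapMapper::get_snaps():
// object key -> set of snap ids.
using kv_map = std::map<std::string, std::set<unsigned long>>;

// Sketch: every return point is tagged with a distinct message, so a failing
// lookup can be attributed to "no key at all" vs "key present but empty".
int get_snaps_traced(const kv_map &backend, const std::string &key,
                     std::set<unsigned long> *out) {
    auto it = backend.find(key);
    if (it == backend.end()) {
        std::fprintf(stderr, "get_snaps %s: no key in mapper\n", key.c_str());
        return -2;  // corresponds to the got.empty() return in the real code
    }
    if (it->second.empty()) {
        std::fprintf(stderr, "get_snaps %s: empty snap set\n", key.c_str());
        return -2;  // corresponds to the assert(!out->snaps.empty()) case
    }
    if (out)
        *out = it->second;
    return 0;
}
```

With this in place, SnapMapper::update_snaps() returning an error would leave an unambiguous line in the log naming the failing branch.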
> >>>>>>>>> > >>>>>>>>> sage > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>> > >>>>>>>>>> HI Sage, > >>>>>>>>>> > >>>>>>>>>> any ideas for this one sawing while doing a deep scrub: > >>>>>>>>>> > >>>>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: > >>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] > >>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f > >>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 > >>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for > >>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) > >>>>>>>>>> > >>>>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 > >>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const > >>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread > >>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 > >>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) > >>>>>>>>>> > >>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>> const*)+0x102) [0x55c0100d0372] > >>>>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] > >>>>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, > >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > >>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] > >>>>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >>>>>>>>>> eversion_t const&, bool, 
ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] > >>>>>>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >>>>>>>>>> [0x55c00fdf6642] > >>>>>>>>>> 6: > >>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >>>>>>>>>> [0x55c00fdfc684] > >>>>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>> [0x55c00fd22030] > >>>>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] > >>>>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>> [0x55c00fb1abc7] > >>>>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>> const&)+0x57) [0x55c00fd92ad7] > >>>>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] > >>>>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>> [0x55c0100d5ffd] > >>>>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] > >>>>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] > >>>>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] > >>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>> needed to interpret this. 
> >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Stefan > >>>>>>>>>> > >>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: > >>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>> Hi Sage, > >>>>>>>>>>>> > >>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: > >>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> it also crashes in (marked with HERE): > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> int SnapMapper::get_snaps( > >>>>>>>>>>>>>> const hobject_t &oid, > >>>>>>>>>>>>>> object_snaps *out) > >>>>>>>>>>>>>> { > >>>>>>>>>>>>>> assert(check(oid)); > >>>>>>>>>>>>>> set<string> keys; > >>>>>>>>>>>>>> map<string, bufferlist> got; > >>>>>>>>>>>>>> keys.insert(to_object_key(oid)); > >>>>>>>>>>>>>> int r = backend.get_keys(keys, &got); > >>>>>>>>>>>>>> if (r < 0) > >>>>>>>>>>>>>> return r; > >>>>>>>>>>>>>> if (got.empty()) > >>>>>>>>>>>>>> return -ENOENT; > >>>>>>>>>>>>>> if (out) { > >>>>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); > >>>>>>>>>>>>>> ::decode(*out, bp); > >>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; > >>>>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### > >>>>>>>>>>>>>> } else { > >>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; > >>>>>>>>>>>>>> } > >>>>>>>>>>>>>> return 0; > >>>>>>>>>>>>>> } > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> is it safe to comment that assert? > >>>>>>>>>>>>> > >>>>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper > >>>>>>>>>>>>> state is consistent, and its only purpose is to find snaps to trim. > >>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a > >>>>>>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time > >>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in. 
The next step is > >>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or > >>>>>>>>>>>>> can't repair. > >>>>>>>>>>>> > >>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. > >>>>>>>>>>>> > >>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But > >>>>>>>>>>>> enabling / doing deep-scrubs results into this one: > >>>>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 > >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int > >>>>>>>>>>>> SnapMapper::get_snaps(const h > >>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time > >>>>>>>>>>>> 2018-01-18 13:00:54.840396 > >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) > >>>>>>>>>>> > >>>>>>>>>>> I think if you switch that assert to a warning it will repair... > >>>>>>>>>>> > >>>>>>>>>>> sage > >>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] > >>>>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, > >>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] > >>>>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, > >>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 > >>>>>>>>>>>> 697] > >>>>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] > >>>>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > >>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] > >>>>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, > >>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] > >>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] > 
>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>> [0x5561cdcd7bc7] > >>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957] > >>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] > >>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>> [0x5561ce292e7d] > >>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] > >>>>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] > >>>>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] > >>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Greets, > >>>>>>>>>>>> Stefan > >>>>>>>>>>>> > >>>>>>>>>>>>> sage > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: > >>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: > >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>>>> Hi Sage, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 > >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void > >>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l > >>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, > >>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti > >>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 > >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != > >>>>>>>>>>>>>>>>>> 
recovery_info.ss.clone_snaps.end()) > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Is this a cache tiering pool? > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded: > >>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 > >>>>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait > >>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] > >>>>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 > >>>>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then. Something like this will work > >>>>>>>>>>>>>>> around the current assert: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 > >>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( > >>>>>>>>>>>>>>> set<snapid_t> snaps; > >>>>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; > >>>>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); > >>>>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
> >>>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); > >>>>>>>>>>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { > >>>>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; > >>>>>>>>>>>>>>> + } else { > >>>>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); > >>>>>>>>>>>>>>> + } > >>>>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; > >>>>>>>>>>>>>>> snap_mapper.add_oid( > >>>>>>>>>>>>>>> recovery_info.soid, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> s > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] > >>>>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo > >>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, > >>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] > >>>>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, > >>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] > >>>>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) > >>>>>>>>>>>>>>>>>> [0x55addb30748f] > >>>>>>>>>>>>>>>>>> 5: > >>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) > >>>>>>>>>>>>>>>>>> [0x55addb317531] > >>>>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>>>>>>>> [0x55addb23cf10] > >>>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] > >>>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>>>>>>>> 
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>>>>>> [0x55addb035bc7] > >>>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] > >>>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] > >>>>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>>>>>>> [0x55addb5f0e7d] > >>>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] > >>>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] > >>>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] > >>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Greets, > >>>>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: > >>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> is there any chance to fix this instead of removing manually all the clones? > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by > >>>>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap > >>>>>>>>>>>>>>>>>>> list mapping. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> The problem is you'll probably still have a stray snapid -> object mapping, so > >>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error > >>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't > >>>>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it > >>>>>>>>>>>>>>>>>>> happens. 
> >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them > >>>>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> sage > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: > >>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG > >>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: > >>>>>>>>>>>>>>>>>>>>>> Hello, > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds > >>>>>>>>>>>>>>>>>>>>>> being down. > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> All of them fail with: > >>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 > >>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void > >>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, > >>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' > >>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 > >>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 > >>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) > >>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] > >>>>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, > >>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, > >>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) > >>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] > 
>>>>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] > >>>>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > >>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] > >>>>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] > >>>>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] > >>>>>>>>>>>>>>>>>>>>>> 7: > >>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] > >>>>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] > >>>>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] > >>>>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] > >>>>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] > >>>>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] > >>>>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned 
int)+0x88d) > >>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] > >>>>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] > >>>>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] > >>>>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] > >>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>>>>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is > >>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that > >>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's > >>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is > >>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have > >>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG > >>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object > >>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and > >>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of > >>>>>>>>>>>>>>>>>>>>> them myself. 
> >>>>>>>>>>>>>>>>>>>>> -Greg > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > >>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org > >>>>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Sage, Am 24.01.2018 um 11:16 schrieb Sage Weil: > On Wed, 24 Jan 2018, Stefan Priebe - Profihost AG wrote: >> Hi Sage, >> >> Am 24.01.2018 um 01:07 schrieb Sage Weil: >>> On Tue, 23 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>> >>>> Am 22.01.2018 um 20:01 schrieb Sage Weil: >>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>> Hi Sage, >>>>>> >>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops >>>>>> telling me: >>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error >>>>>> on pg 3.33e oid >>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in >>>>>> mapper: , oi: b0d7c...repaired >>>>>> >>>>>> Currently it seems it does an endless repair and never ends. >>>>> >>>>> On many objects or the same object? >>>> >>>> OK no it still does endless repair but the reason is it also repairs >>>> "newly" created snapshots. >>>> >>>> Example: >>>> 2018-01-23 21:46:47.609712 osd.32 [ERR] osd.32 found snap mapper error >>>> on pg 3.c70 oid >>>> 3:0e3ff478:::rbd_data.52ab946b8b4567.00000000000025b2:b0d7c snaps >>>> missing in mapper, should be: b0d7c...repaired >>>> >>>> rbd image 'vm-251-disk-1': >>>> size 170 GB in 43520 objects >>>> order 22 (4096 kB objects) >>>> block_name_prefix: rbd_data.52ab946b8b4567 >>>> format: 2 >>>> features: layering >>>> flags: >>>> SNAPID NAME SIZE TIMESTAMP >>>> 726900 PVESNAP_cloud2-1394_vm_diff_20180123 170 GB Tue Jan 23 02:14:35 2018 >>>> >>>> So this is a fresh snapshot which needs a repair? >>> >>> Aha! That's a helpful clue. Is there any chance you can crank up teh >>> osd debug log on one OSD can catch one of these cases? (We want to >>> capture the clone being created, and also the scrub on the pg that follows >>> and sees the mismatch). >> >> You mean debug_osd? How high must it be? And do we really need all log >> messages or might it be enough to change some of them to 0 in the source >> code - so they always log? 
> > It's just a lot of work to figure out which ones that would be. If debug > osd = 20 isn't possible (even on a single OSD), then just the logging > statements (ideally all of them) in SnapMapper.cc would be a good start. sorry for the slow response. I will try to grab a log tonight. Greets, Stefan > sage > > >> >> Greets, >> Stefan >> >>> Thanks! >>> sage >>> >>> >>>> Greets, >>>> Stefan >>>> >>>>> >>>>> sage >>>>> >>>>>> >>>>>> Greets, >>>>>> Stefan >>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil: >>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>> Hi Sage, >>>>>>>> >>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So >>>>>>>> it's OK to skip the assert. >>>>>>>> >>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to >>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and >>>>>>>> needs to get killed to proceed. >>>>>>> >>>>>>> I believe this will fix it: >>>>>>> >>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper >>>>>>> >>>>>>> Cherry-pick that and let me know? >>>>>>> >>>>>>> Thanks! 
>>>>>>> sage >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> The log output is always: >>>>>>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) >>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at >>>>>>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 >>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 >>>>>>>> 00000157:head v 927921'526559757) currently commit_sent >>>>>>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) >>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at >>>>>>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 >>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 >>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent >>>>>>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) >>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at >>>>>>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 >>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 >>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent >>>>>>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal >>>>>>>> (Aborted) ** >>>>>>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp >>>>>>>> >>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>> 1: (()+0xa4423c) [0x561d876ab23c] >>>>>>>> 2: (()+0xf890) [0x7f100b974890] >>>>>>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] >>>>>>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, >>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran >>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54] >>>>>>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, >>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] >>>>>>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] >>>>>>>> 7: 
(PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] >>>>>>>> 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] >>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] >>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>> [0x561d87138bc7] >>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>> const&)+0x57) [0x561d873b0db7] >>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] >>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>> [0x561d876f42bd] >>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] >>>>>>>> 15: (()+0x8064) [0x7f100b96d064] >>>>>>>> 16: (clone()+0x6d) [0x7f100aa6162d] >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>> needed to interpret this. 
>>>>>>>> >>>>>>>> --- logging levels --- >>>>>>>> 0/ 5 none >>>>>>>> 0/ 0 lockdep >>>>>>>> 0/ 0 context >>>>>>>> 0/ 0 crush >>>>>>>> 0/ 0 mds >>>>>>>> 0/ 0 mds_balancer >>>>>>>> 0/ 0 mds_locker >>>>>>>> 0/ 0 mds_log >>>>>>>> 0/ 0 mds_log_expire >>>>>>>> 0/ 0 mds_migrator >>>>>>>> 0/ 0 buffer >>>>>>>> 0/ 0 timer >>>>>>>> 0/ 0 filer >>>>>>>> 0/ 1 striper >>>>>>>> 0/ 0 objecter >>>>>>>> 0/ 0 rados >>>>>>>> 0/ 0 rbd >>>>>>>> 0/ 5 rbd_mirror >>>>>>>> 0/ 5 rbd_replay >>>>>>>> 0/ 0 journaler >>>>>>>> 0/ 0 objectcacher >>>>>>>> 0/ 0 client >>>>>>>> 0/ 0 osd >>>>>>>> 0/ 0 optracker >>>>>>>> 0/ 0 objclass >>>>>>>> 0/ 0 filestore >>>>>>>> 0/ 0 journal >>>>>>>> 0/ 0 ms >>>>>>>> 0/ 0 mon >>>>>>>> 0/ 0 monc >>>>>>>> 0/ 0 paxos >>>>>>>> 0/ 0 tp >>>>>>>> 0/ 0 auth >>>>>>>> 1/ 5 crypto >>>>>>>> 0/ 0 finisher >>>>>>>> 1/ 1 reserver >>>>>>>> 0/ 0 heartbeatmap >>>>>>>> 0/ 0 perfcounter >>>>>>>> 0/ 0 rgw >>>>>>>> 1/10 civetweb >>>>>>>> 1/ 5 javaclient >>>>>>>> 0/ 0 asok >>>>>>>> 0/ 0 throttle >>>>>>>> 0/ 0 refs >>>>>>>> 1/ 5 xio >>>>>>>> 1/ 5 compressor >>>>>>>> 1/ 5 bluestore >>>>>>>> 1/ 5 bluefs >>>>>>>> 1/ 3 bdev >>>>>>>> 1/ 5 kstore >>>>>>>> 4/ 5 rocksdb >>>>>>>> 4/ 5 leveldb >>>>>>>> 4/ 5 memdb >>>>>>>> 1/ 5 kinetic >>>>>>>> 1/ 5 fuse >>>>>>>> 1/ 5 mgr >>>>>>>> 1/ 5 mgrc >>>>>>>> 1/ 5 dpdk >>>>>>>> 1/ 5 eventtrace >>>>>>>> -2/-2 (syslog threshold) >>>>>>>> -1/-1 (stderr threshold) >>>>>>>> max_recent 10000 >>>>>>>> max_new 1000 >>>>>>>> log_file /var/log/ceph/ceph-osd.59.log >>>>>>>> --- end dump of recent events --- >>>>>>>> >>>>>>>> >>>>>>>> Greets, >>>>>>>> Stefan >>>>>>>> >>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: >>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>> Hi Sage, >>>>>>>>>> >>>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is >>>>>>>>>> the following and i hope you might have an idea: >>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today >>>>>>>>>> running luminous? 
How can they be missing in snapmapper again? >>>>>>>>> >>>>>>>>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so >>>>>>>>> that when you hit the crash we have some context as to what is going on. >>>>>>>>> >>>>>>>>> Did you track down which part of get_snaps() is returning the error? >>>>>>>>> >>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to >>>>>>>>>> luminous when all this happened. >>>>>>>>> >>>>>>>>> Did this start right after the upgrade? >>>>>>>>> >>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up >>>>>>>>> instead (unless the debug option is enabled, for our regression testing), >>>>>>>>> but I'm still unclear about what the inconsistency is... any loggin you >>>>>>>>> can provide would help! >>>>>>>>> >>>>>>>>> Thanks- >>>>>>>>> sage >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Greets, >>>>>>>>>> Stefan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: >>>>>>>>>>> I'm guessing it is coming from >>>>>>>>>>> >>>>>>>>>>> object_snaps out; >>>>>>>>>>> int r = get_snaps(oid, &out); >>>>>>>>>>> if (r < 0) >>>>>>>>>>> return r; >>>>>>>>>>> >>>>>>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to to >>>>>>>>>>> confirm that, and possibly also add debug lines to the various return >>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. >>>>>>>>>>> >>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's >>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we >>>>>>>>>>> can tolerate it... 
>>>>>>>>>>> >>>>>>>>>>> sage >>>>>>>>>>> >>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Sage, >>>>>>>>>>>> >>>>>>>>>>>> any ideas for this one, seen while doing a deep scrub: >>>>>>>>>>>> >>>>>>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: >>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] >>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f >>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 >>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for >>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) >>>>>>>>>>>> >>>>>>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 >>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const >>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread >>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 >>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) >>>>>>>>>>>> >>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>> const*)+0x102) [0x55c0100d0372] >>>>>>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>> ObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] >>>>>>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] >>>>>>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>> eversion_t const&, bool, 
ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] >>>>>>>>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>> [0x55c00fdf6642] >>>>>>>>>>>> 6: >>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>> [0x55c00fdfc684] >>>>>>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>> [0x55c00fd22030] >>>>>>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] >>>>>>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>> [0x55c00fb1abc7] >>>>>>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7] >>>>>>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] >>>>>>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>> [0x55c0100d5ffd] >>>>>>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] >>>>>>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] >>>>>>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] >>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>> needed to interpret this. 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Stefan >>>>>>>>>>>> >>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: >>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> it also crashes in (marked with HERE): >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> int SnapMapper::get_snaps( >>>>>>>>>>>>>>>> const hobject_t &oid, >>>>>>>>>>>>>>>> object_snaps *out) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> assert(check(oid)); >>>>>>>>>>>>>>>> set<string> keys; >>>>>>>>>>>>>>>> map<string, bufferlist> got; >>>>>>>>>>>>>>>> keys.insert(to_object_key(oid)); >>>>>>>>>>>>>>>> int r = backend.get_keys(keys, &got); >>>>>>>>>>>>>>>> if (r < 0) >>>>>>>>>>>>>>>> return r; >>>>>>>>>>>>>>>> if (got.empty()) >>>>>>>>>>>>>>>> return -ENOENT; >>>>>>>>>>>>>>>> if (out) { >>>>>>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>>>>>>>>>>>>>> ::decode(*out, bp); >>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>>>>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### >>>>>>>>>>>>>>>> } else { >>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> return 0; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> is it save to comment that assert? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper >>>>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim. >>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a >>>>>>>>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time >>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in. 
The next step is >>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or >>>>>>>>>>>>>>> can't repair. >>>>>>>>>>>>>> >>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But >>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one: >>>>>>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>>>>>>>>> SnapMapper::get_snaps(const h >>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >>>>>>>>>>>>>> 2018-01-18 13:00:54.840396 >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>>>>>>>> >>>>>>>>>>>>> I think if you switch that assert to a warning it will repair... >>>>>>>>>>>>> >>>>>>>>>>>>> sage >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] >>>>>>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, >>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >>>>>>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >>>>>>>>>>>>>> 697] >>>>>>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >>>>>>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >>>>>>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] 
>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>> [0x5561cdcd7bc7] >>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957] >>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>> [0x5561ce292e7d] >>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >>>>>>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] >>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>> >>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l >>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>>>>>>>>>>>>>> 
recovery_info.ss.clone_snaps.end()) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Is this a cache tiering pool? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded: >>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then. Something like this will work >>>>>>>>>>>>>>>>> around the current assert: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 >>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>>>>>>>>>>>>>>>> set<snapid_t> snaps; >>>>>>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; >>>>>>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>>>>>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
>>>>>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { >>>>>>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>>>>>>>>>>>>>>>> + } else { >>>>>>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; >>>>>>>>>>>>>>>>> snap_mapper.add_oid( >>>>>>>>>>>>>>>>> recovery_info.soid, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> s >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>>>>>>>>>>>>>>>> [0x55addb30748f] >>>>>>>>>>>>>>>>>>>> 5: >>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>>>>>>>>>>>>>>>>> [0x55addb317531] >>>>>>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>> [0x55addb23cf10] >>>>>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>>
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>> [0x55addb035bc7] >>>>>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d] >>>>>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>>>>>>>>>>>>>> list mapping. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so >>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>>>>>>>>>>>>>>>> happens. 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>>>>>>>>>>>>>> being down. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> All of them fail with: >>>>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] 
>>>>>>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>>>>>>>>>>>>>> 7: >>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned 
int)+0x88d) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG >>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and >>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>>>>>>>>>>>>>>>>>> them myself. 
>>>>>>>>>>>>>>>>>>>>>>> -Greg

-- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
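The on_local_recover workaround quoted earlier in this thread boils down to a guarded map lookup: only dereference the iterator when `find()` actually hit, and log rather than abort when the clone-to-snaps entry is missing. A minimal self-contained sketch of that pattern, with toy types standing in for Ceph's `SnapSet::clone_snaps` (the names and types here are illustrative assumptions, not Ceph's real API):

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <set>

// Toy stand-in for recovery_info.ss.clone_snaps: clone id -> snap ids.
using CloneSnaps = std::map<int, std::set<int>>;

// Guarded lookup: never dereference end(); warn and return an empty set
// instead of asserting when the mapping is absent.
std::set<int> snaps_for_clone(const CloneSnaps& clone_snaps, int clone_id) {
  std::set<int> snaps;
  auto p = clone_snaps.find(clone_id);
  if (p == clone_snaps.end()) {  // note: ==, the missing-entry branch warns
    std::cerr << "snaps_for_clone: no clone_snaps for clone "
              << clone_id << "\n";
  } else {
    snaps.insert(p->second.begin(), p->second.end());
  }
  return snaps;  // empty if the mapping was absent
}
```

The key point is that the warning branch is the `== end()` case; inverting that condition would both suppress the snap insertion on healthy objects and dereference `end()` on corrupt ones.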
Hi Sage, currently i'm still seeing those repair messages but only one every few hours so i can't reproduce this right now. Running an osd with debug osd = 20 for a whole day is not an option so i'll check whether those message get less and are may be solved by itself after another week. Greets, Stefan Am 24.01.2018 um 11:16 schrieb Sage Weil: > On Wed, 24 Jan 2018, Stefan Priebe - Profihost AG wrote: >> Hi Sage, >> >> Am 24.01.2018 um 01:07 schrieb Sage Weil: >>> On Tue, 23 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>> >>>> Am 22.01.2018 um 20:01 schrieb Sage Weil: >>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>> Hi Sage, >>>>>> >>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops >>>>>> telling me: >>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error >>>>>> on pg 3.33e oid >>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in >>>>>> mapper: , oi: b0d7c...repaired >>>>>> >>>>>> Currently it seems it does an endless repair and never ends. >>>>> >>>>> On many objects or the same object? >>>> >>>> OK no it still does endless repair but the reason is it also repairs >>>> "newly" created snapshots. >>>> >>>> Example: >>>> 2018-01-23 21:46:47.609712 osd.32 [ERR] osd.32 found snap mapper error >>>> on pg 3.c70 oid >>>> 3:0e3ff478:::rbd_data.52ab946b8b4567.00000000000025b2:b0d7c snaps >>>> missing in mapper, should be: b0d7c...repaired >>>> >>>> rbd image 'vm-251-disk-1': >>>> size 170 GB in 43520 objects >>>> order 22 (4096 kB objects) >>>> block_name_prefix: rbd_data.52ab946b8b4567 >>>> format: 2 >>>> features: layering >>>> flags: >>>> SNAPID NAME SIZE TIMESTAMP >>>> 726900 PVESNAP_cloud2-1394_vm_diff_20180123 170 GB Tue Jan 23 02:14:35 2018 >>>> >>>> So this is a fresh snapshot which needs a repair? >>> >>> Aha! That's a helpful clue. Is there any chance you can crank up teh >>> osd debug log on one OSD can catch one of these cases? 
(We want to >>> capture the clone being created, and also the scrub on the pg that follows >>> and sees the mismatch). >> >> You mean debug_osd? How high must it be? And do we really need all log >> messages or might it be enough to change some of them to 0 in the source >> code - so they always log? > > It's just a lot of work to figure out which ones that would be. If debug > osd = 20 isn't possible (even on a single OSD), then just the logging > statements (ideally all of them) in SnapMapper.cc would be a good start. > > sage > > >> >> Greets, >> Stefan >> >>> Thanks! >>> sage >>> >>> >>>> Greets, >>>> Stefan >>>> >>>>> >>>>> sage >>>>> >>>>>> >>>>>> Greets, >>>>>> Stefan >>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil: >>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>> Hi Sage, >>>>>>>> >>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So >>>>>>>> it's OK to skip the assert. >>>>>>>> >>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to >>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and >>>>>>>> needs to get killed to proceed. >>>>>>> >>>>>>> I believe this will fix it: >>>>>>> >>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper >>>>>>> >>>>>>> Cherry-pick that and let me know? >>>>>>> >>>>>>> Thanks! 
>>>>>>> sage >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> The log output is always: >>>>>>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) >>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at >>>>>>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 >>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 >>>>>>>> 00000157:head v 927921'526559757) currently commit_sent >>>>>>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) >>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at >>>>>>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 >>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 >>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent >>>>>>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) >>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at >>>>>>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 >>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 >>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent >>>>>>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal >>>>>>>> (Aborted) ** >>>>>>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp >>>>>>>> >>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>> 1: (()+0xa4423c) [0x561d876ab23c] >>>>>>>> 2: (()+0xf890) [0x7f100b974890] >>>>>>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] >>>>>>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, >>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran >>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54] >>>>>>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, >>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] >>>>>>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] >>>>>>>> 7: 
(PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] >>>>>>>> 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] >>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] >>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>> [0x561d87138bc7] >>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>> const&)+0x57) [0x561d873b0db7] >>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] >>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>> [0x561d876f42bd] >>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] >>>>>>>> 15: (()+0x8064) [0x7f100b96d064] >>>>>>>> 16: (clone()+0x6d) [0x7f100aa6162d] >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>> needed to interpret this. 
>>>>>>>> >>>>>>>> --- logging levels --- >>>>>>>> 0/ 5 none >>>>>>>> 0/ 0 lockdep >>>>>>>> 0/ 0 context >>>>>>>> 0/ 0 crush >>>>>>>> 0/ 0 mds >>>>>>>> 0/ 0 mds_balancer >>>>>>>> 0/ 0 mds_locker >>>>>>>> 0/ 0 mds_log >>>>>>>> 0/ 0 mds_log_expire >>>>>>>> 0/ 0 mds_migrator >>>>>>>> 0/ 0 buffer >>>>>>>> 0/ 0 timer >>>>>>>> 0/ 0 filer >>>>>>>> 0/ 1 striper >>>>>>>> 0/ 0 objecter >>>>>>>> 0/ 0 rados >>>>>>>> 0/ 0 rbd >>>>>>>> 0/ 5 rbd_mirror >>>>>>>> 0/ 5 rbd_replay >>>>>>>> 0/ 0 journaler >>>>>>>> 0/ 0 objectcacher >>>>>>>> 0/ 0 client >>>>>>>> 0/ 0 osd >>>>>>>> 0/ 0 optracker >>>>>>>> 0/ 0 objclass >>>>>>>> 0/ 0 filestore >>>>>>>> 0/ 0 journal >>>>>>>> 0/ 0 ms >>>>>>>> 0/ 0 mon >>>>>>>> 0/ 0 monc >>>>>>>> 0/ 0 paxos >>>>>>>> 0/ 0 tp >>>>>>>> 0/ 0 auth >>>>>>>> 1/ 5 crypto >>>>>>>> 0/ 0 finisher >>>>>>>> 1/ 1 reserver >>>>>>>> 0/ 0 heartbeatmap >>>>>>>> 0/ 0 perfcounter >>>>>>>> 0/ 0 rgw >>>>>>>> 1/10 civetweb >>>>>>>> 1/ 5 javaclient >>>>>>>> 0/ 0 asok >>>>>>>> 0/ 0 throttle >>>>>>>> 0/ 0 refs >>>>>>>> 1/ 5 xio >>>>>>>> 1/ 5 compressor >>>>>>>> 1/ 5 bluestore >>>>>>>> 1/ 5 bluefs >>>>>>>> 1/ 3 bdev >>>>>>>> 1/ 5 kstore >>>>>>>> 4/ 5 rocksdb >>>>>>>> 4/ 5 leveldb >>>>>>>> 4/ 5 memdb >>>>>>>> 1/ 5 kinetic >>>>>>>> 1/ 5 fuse >>>>>>>> 1/ 5 mgr >>>>>>>> 1/ 5 mgrc >>>>>>>> 1/ 5 dpdk >>>>>>>> 1/ 5 eventtrace >>>>>>>> -2/-2 (syslog threshold) >>>>>>>> -1/-1 (stderr threshold) >>>>>>>> max_recent 10000 >>>>>>>> max_new 1000 >>>>>>>> log_file /var/log/ceph/ceph-osd.59.log >>>>>>>> --- end dump of recent events --- >>>>>>>> >>>>>>>> >>>>>>>> Greets, >>>>>>>> Stefan >>>>>>>> >>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: >>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>> Hi Sage, >>>>>>>>>> >>>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is >>>>>>>>>> the following and i hope you might have an idea: >>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today >>>>>>>>>> running luminous? 
How can they be missing in snapmapper again? >>>>>>>>> >>>>>>>>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so >>>>>>>>> that when you hit the crash we have some context as to what is going on. >>>>>>>>> >>>>>>>>> Did you track down which part of get_snaps() is returning the error? >>>>>>>>> >>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to >>>>>>>>>> luminous when all this happened. >>>>>>>>> >>>>>>>>> Did this start right after the upgrade? >>>>>>>>> >>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up >>>>>>>>> instead (unless the debug option is enabled, for our regression testing), >>>>>>>>> but I'm still unclear about what the inconsistency is... any loggin you >>>>>>>>> can provide would help! >>>>>>>>> >>>>>>>>> Thanks- >>>>>>>>> sage >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Greets, >>>>>>>>>> Stefan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: >>>>>>>>>>> I'm guessing it is coming from >>>>>>>>>>> >>>>>>>>>>> object_snaps out; >>>>>>>>>>> int r = get_snaps(oid, &out); >>>>>>>>>>> if (r < 0) >>>>>>>>>>> return r; >>>>>>>>>>> >>>>>>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to to >>>>>>>>>>> confirm that, and possibly also add debug lines to the various return >>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. >>>>>>>>>>> >>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's >>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we >>>>>>>>>>> can tolerate it... 
>>>>>>>>>>> >>>>>>>>>>> sage >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>> >>>>>>>>>>>> HI Sage, >>>>>>>>>>>> >>>>>>>>>>>> any ideas for this one sawing while doing a deep scrub: >>>>>>>>>>>> >>>>>>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: >>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] >>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f >>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 >>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for >>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) >>>>>>>>>>>> >>>>>>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 >>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const >>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread >>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 >>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) >>>>>>>>>>>> >>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>> const*)+0x102) [0x55c0100d0372] >>>>>>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] >>>>>>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] >>>>>>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>> eversion_t const&, bool, 
ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] >>>>>>>>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>> [0x55c00fdf6642] >>>>>>>>>>>> 6: >>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>> [0x55c00fdfc684] >>>>>>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>> [0x55c00fd22030] >>>>>>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] >>>>>>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>> [0x55c00fb1abc7] >>>>>>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7] >>>>>>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] >>>>>>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>> [0x55c0100d5ffd] >>>>>>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] >>>>>>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] >>>>>>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] >>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>> needed to interpret this. 
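Sage's advice to "switch that assert to a warning" amounts to turning the empty-snap-set case in `SnapMapper::get_snaps` into an error return that the scrub/repair path can act on, instead of an abort. A minimal sketch with a `std::map` standing in for the `MapCacher` backend (function and key names here are illustrative assumptions, not Ceph's actual interfaces):

```cpp
#include <cassert>
#include <cerrno>
#include <iostream>
#include <map>
#include <set>
#include <string>

// Toy backend: object key -> decoded snap set.
using SnapStore = std::map<std::string, std::set<unsigned>>;

// get_snaps with assert(!out->snaps.empty()) relaxed to a warning plus
// -ENOENT, so an inconsistent mapping is reported, not fatal.
int get_snaps_relaxed(const SnapStore& store, const std::string& oid,
                      std::set<unsigned>* out) {
  auto it = store.find(oid);
  if (it == store.end())
    return -ENOENT;                  // no key for this object at all
  if (it->second.empty()) {
    std::cerr << "get_snaps: empty snap set for " << oid << "\n";
    return -ENOENT;                  // was: assert(!out->snaps.empty())
  }
  if (out)
    *out = it->second;
  return 0;
}
```

With this shape, a caller like `update_snaps()` sees `-ENOENT` for both "no key" and "empty snap set", which is what lets the repair logic treat a corrupt entry the same as a missing one.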
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Stefan >>>>>>>>>>>> >>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: >>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> it also crashes in (marked with HERE): >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> int SnapMapper::get_snaps( >>>>>>>>>>>>>>>> const hobject_t &oid, >>>>>>>>>>>>>>>> object_snaps *out) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> assert(check(oid)); >>>>>>>>>>>>>>>> set<string> keys; >>>>>>>>>>>>>>>> map<string, bufferlist> got; >>>>>>>>>>>>>>>> keys.insert(to_object_key(oid)); >>>>>>>>>>>>>>>> int r = backend.get_keys(keys, &got); >>>>>>>>>>>>>>>> if (r < 0) >>>>>>>>>>>>>>>> return r; >>>>>>>>>>>>>>>> if (got.empty()) >>>>>>>>>>>>>>>> return -ENOENT; >>>>>>>>>>>>>>>> if (out) { >>>>>>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>>>>>>>>>>>>>> ::decode(*out, bp); >>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>>>>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### >>>>>>>>>>>>>>>> } else { >>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> return 0; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> is it save to comment that assert? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper >>>>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim. >>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a >>>>>>>>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time >>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in. 
The next step is >>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or >>>>>>>>>>>>>>> can't repair. >>>>>>>>>>>>>> >>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But >>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one: >>>>>>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>>>>>>>>> SnapMapper::get_snaps(const h >>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >>>>>>>>>>>>>> 2018-01-18 13:00:54.840396 >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>>>>>>>> >>>>>>>>>>>>> I think if you switch that assert to a warning it will repair... >>>>>>>>>>>>> >>>>>>>>>>>>> sage >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] >>>>>>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, >>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >>>>>>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >>>>>>>>>>>>>> 697] >>>>>>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >>>>>>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >>>>>>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] 
>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>> [0x5561cdcd7bc7] >>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957] >>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>> [0x5561ce292e7d] >>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >>>>>>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] >>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>> >>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l >>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>>>>>>>>>>>>>> 
recovery_info.ss.clone_snaps.end()) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Is this a cache tiering pool? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> no, normal 3 replica but the pg is degraded: >>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hrm, no real clues on the root cause then. Something like this will work >>>>>>>>>>>>>>>>> around the current assert: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 >>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>>>>>>>>>>>>>>>> set<snapid_t> snaps; >>>>>>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; >>>>>>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>>>>>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
>>>>>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { >>>>>>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>>>>>>>>>>>>>>>> + } else { >>>>>>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; >>>>>>>>>>>>>>>>> snap_mapper.add_oid( >>>>>>>>>>>>>>>>> recovery_info.soid, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> s >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>>>>>>>>>>>>>>>> [0x55addb30748f] >>>>>>>>>>>>>>>>>>>> 5: >>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>>>>>>>>>>>>>>>>> [0x55addb317531] >>>>>>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>> [0x55addb23cf10] >>>>>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>> 
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>> [0x55addb035bc7] >>>>>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d] >>>>>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> is there any chance to fix this instead of removing manually all the clones? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>>>>>>>>>>>>>> list mapping. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still have a stray snapid -> object mapping, so >>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>>>>>>>>>>>>>>>> happens. 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>>>>>>>>>>>>>> being down. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> All of them fail with: >>>>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] 
>>>>>>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>>>>>>>>>>>>>> 7: >>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned 
int)+0x88d) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG >>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and >>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>>>>>>>>>>>>>>>>>> them myself. 
>>>>>>>>>>>>>>>>>>>>>>> -Greg >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
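The "switch that assert to a warning" change discussed in the thread above can be illustrated with a standalone sketch. This is not Ceph code: `object_snaps`, the in-memory `backend` map, and the `-2` return (standing in for `-ENOENT`) are simplified substitutes for the real `SnapMapper::get_snaps()` in src/osd/SnapMapper.cc, which reads omap keys through a MapCacher backend and decodes a bufferlist.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <set>
#include <string>

using snapid_t = unsigned long;
struct object_snaps { std::set<snapid_t> snaps; };

// Toy stand-in for the omap backend: object key -> decoded snap set.
static std::map<std::string, object_snaps> backend;

int get_snaps(const std::string& oid_key, object_snaps* out) {
  auto it = backend.find(oid_key);
  if (it == backend.end())
    return -2;  // stand-in for -ENOENT: no mapping at all
  if (out) {
    *out = it->second;
    if (out->snaps.empty()) {
      // was: assert(!out->snaps.empty());
      std::fprintf(stderr, "get_snaps %s: empty snap set in mapper\n",
                   oid_key.c_str());
      return -2;  // report to the caller instead of aborting the OSD
    }
  }
  return 0;
}
```

The only behavioural change is the last return path: an empty decoded snap set is reported as a lookup failure the caller can handle (or repair), rather than killing the whole OSD with `FAILED assert(!out->snaps.empty())`.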
Hi Sage, some days ago i reverted to stock luminous with https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a + https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa on top. I scrubbed and deep-scrubbed all pgs before and no errors were found. this worked fine until the ceph balancer ran and started remapping some pgs. This again triggered this assert: /build/ceph/src/osd/SnapMapper.cc: In function 'int SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' thread 7f4289bff700 time 2018-02-02 17:46:33.249579 /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) Any idea how to fix it? Greets, Stefan Am 22.01.2018 um 20:01 schrieb Sage Weil: > On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >> Hi Sage, >> >> thanks! That one fixed it! Now i'm just waiting if the log ever stops >> telling me: >> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error >> on pg 3.33e oid >> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in >> mapper: , oi: b0d7c...repaired >> >> Currently it seems it does an endless repair and never ends. > > On many objects or the same object? > > sage > >> >> Greets, >> Stefan >> Am 22.01.2018 um 15:30 schrieb Sage Weil: >>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>> Hi Sage, >>>> >>>> get_snaps returned the error as there are no snaps in the snapmapper. So >>>> it's OK to skip the assert. >>>> >>>> My biggest problem is now that i'm not able to run a scrub on all PGs to >>>> fix all those errors because in around 1-3% the osd seems to deadlock and >>>> needs to get killed to proceed. >>> >>> I believe this will fix it: >>> >>> https://github.com/liewegas/ceph/commit/wip-snapmapper >>> >>> Cherry-pick that and let me know? >>> >>> Thanks! 
>>> sage >>> >>> >>>> >>>> The log output is always: >>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) >>>> log [WRN] : slow request 141.951820 seconds old, received at >>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 >>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 >>>> 00000157:head v 927921'526559757) currently commit_sent >>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) >>>> log [WRN] : slow request 141.643853 seconds old, received at >>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 >>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 >>>> 00000147d:head v 927921'29105383) currently commit_sent >>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) >>>> log [WRN] : slow request 141.313220 seconds old, received at >>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 >>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 >>>> 000000aff:head v 927921'164552063) currently commit_sent >>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal >>>> (Aborted) ** >>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp >>>> >>>> ceph version 12.2.2-94-g92923ef >>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>> 1: (()+0xa4423c) [0x561d876ab23c] >>>> 2: (()+0xf890) [0x7f100b974890] >>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] >>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, >>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran >>>> saction> >&, Context*)+0x254) [0x561d8746ac54] >>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, >>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] >>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] >>>> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] >>>> 8: 
(PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] >>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] >>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>> [0x561d87138bc7] >>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>> const&)+0x57) [0x561d873b0db7] >>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] >>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>> [0x561d876f42bd] >>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] >>>> 15: (()+0x8064) [0x7f100b96d064] >>>> 16: (clone()+0x6d) [0x7f100aa6162d] >>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>> needed to interpret this. >>>> >>>> --- logging levels --- >>>> 0/ 5 none >>>> 0/ 0 lockdep >>>> 0/ 0 context >>>> 0/ 0 crush >>>> 0/ 0 mds >>>> 0/ 0 mds_balancer >>>> 0/ 0 mds_locker >>>> 0/ 0 mds_log >>>> 0/ 0 mds_log_expire >>>> 0/ 0 mds_migrator >>>> 0/ 0 buffer >>>> 0/ 0 timer >>>> 0/ 0 filer >>>> 0/ 1 striper >>>> 0/ 0 objecter >>>> 0/ 0 rados >>>> 0/ 0 rbd >>>> 0/ 5 rbd_mirror >>>> 0/ 5 rbd_replay >>>> 0/ 0 journaler >>>> 0/ 0 objectcacher >>>> 0/ 0 client >>>> 0/ 0 osd >>>> 0/ 0 optracker >>>> 0/ 0 objclass >>>> 0/ 0 filestore >>>> 0/ 0 journal >>>> 0/ 0 ms >>>> 0/ 0 mon >>>> 0/ 0 monc >>>> 0/ 0 paxos >>>> 0/ 0 tp >>>> 0/ 0 auth >>>> 1/ 5 crypto >>>> 0/ 0 finisher >>>> 1/ 1 reserver >>>> 0/ 0 heartbeatmap >>>> 0/ 0 perfcounter >>>> 0/ 0 rgw >>>> 1/10 civetweb >>>> 1/ 5 javaclient >>>> 0/ 0 asok >>>> 0/ 0 throttle >>>> 0/ 0 refs >>>> 1/ 5 xio >>>> 1/ 5 compressor >>>> 1/ 5 bluestore >>>> 1/ 5 bluefs >>>> 1/ 3 bdev >>>> 1/ 5 kstore >>>> 4/ 5 rocksdb >>>> 4/ 5 leveldb >>>> 4/ 5 memdb >>>> 1/ 5 kinetic >>>> 1/ 5 fuse >>>> 1/ 5 mgr >>>> 1/ 5 mgrc >>>> 1/ 5 dpdk 
>>>> 1/ 5 eventtrace >>>> -2/-2 (syslog threshold) >>>> -1/-1 (stderr threshold) >>>> max_recent 10000 >>>> max_new 1000 >>>> log_file /var/log/ceph/ceph-osd.59.log >>>> --- end dump of recent events --- >>>> >>>> >>>> Greets, >>>> Stefan >>>> >>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: >>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>> Hi Sage, >>>>>> >>>>>> thanks for your reply. I'll do this. What i still don't understand is >>>>>> the following and i hope you might have an idea: >>>>>> 1.) All the snaps mentioned here are fresh - i created them today >>>>>> running luminous? How can they be missing in snapmapper again? >>>>> >>>>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so >>>>> that when you hit the crash we have some context as to what is going on. >>>>> >>>>> Did you track down which part of get_snaps() is returning the error? >>>>> >>>>>> 2.) How did this happen? The cluster was jewel and was updated to >>>>>> luminous when all this happened. >>>>> >>>>> Did this start right after the upgrade? >>>>> >>>>> I started a PR that relaxes some of these assertions so that they clean up >>>>> instead (unless the debug option is enabled, for our regression testing), >>>>> but I'm still unclear about what the inconsistency is... any logging you >>>>> can provide would help! >>>>> >>>>> Thanks- >>>>> sage >>>>> >>>>> >>>>> >>>>>> >>>>>> Greets, >>>>>> Stefan >>>>>> >>>>>> >>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: >>>>>>> I'm guessing it is coming from >>>>>>> >>>>>>> object_snaps out; >>>>>>> int r = get_snaps(oid, &out); >>>>>>> if (r < 0) >>>>>>> return r; >>>>>>> >>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to >>>>>>> confirm that, and possibly also add debug lines to the various return >>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. 
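What that return-point instrumentation could look like, as a standalone sketch: the types, the `why` out-parameter, and the reason strings are illustrative only (the real code would log through `dout`/`derr`), but the structure mirrors the three ways `get_snaps()` can fail.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using snapid_t = unsigned long;

// Simplified model of SnapMapper::get_snaps() with every return point
// tagged, so the log shows *which* path produced the error:
// no key for the object, or a key whose decoded snap set is empty.
int get_snaps_traced(const std::map<std::string, std::set<snapid_t>>& omap,
                     const std::string& key,
                     std::set<snapid_t>* out,
                     const char** why) {
  auto it = omap.find(key);
  if (it == omap.end()) {
    *why = "no omap key for object";    // first return point
    return -2;                           // stand-in for -ENOENT
  }
  if (it->second.empty()) {
    *why = "decoded snap set is empty";  // the case behind the assert
    return -2;
  }
  if (out) *out = it->second;
  *why = "ok";
  return 0;
}
```

Tagging each path like this makes it immediately visible from the log whether the mapper lost the key entirely or stored an empty snap set for it, which is the distinction the thread is trying to pin down.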
>>>>>>> >>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's >>>>>>> hard to say without seeing what the inconsistency there is and whether we >>>>>>> can tolerate it... >>>>>>> >>>>>>> sage >>>>>>> >>>>>>> >>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>> >>>>>>>> Hi Sage, >>>>>>>> >>>>>>>> any ideas for this one, seen while doing a deep scrub: >>>>>>>> >>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: >>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] >>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f >>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 >>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for >>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) >>>>>>>> >>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 >>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const >>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread >>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 >>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) >>>>>>>> >>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>> const*)+0x102) [0x55c0100d0372] >>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>> ObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] >>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] >>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>> boost::optional<pg_hit_set_history_t> const&, 
eversion_t const&, >>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] >>>>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>> [0x55c00fdf6642] >>>>>>>> 6: >>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>> [0x55c00fdfc684] >>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>> [0x55c00fd22030] >>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] >>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>> [0x55c00fb1abc7] >>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>> const&)+0x57) [0x55c00fd92ad7] >>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] >>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>> [0x55c0100d5ffd] >>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] >>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] >>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>> needed to interpret this. 
>>>>>>>> >>>>>>>> >>>>>>>> Stefan >>>>>>>> >>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: >>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>> Hi Sage, >>>>>>>>>> >>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> it also crashes in (marked with HERE): >>>>>>>>>>>> >>>>>>>>>>>> int SnapMapper::get_snaps( >>>>>>>>>>>> const hobject_t &oid, >>>>>>>>>>>> object_snaps *out) >>>>>>>>>>>> { >>>>>>>>>>>> assert(check(oid)); >>>>>>>>>>>> set<string> keys; >>>>>>>>>>>> map<string, bufferlist> got; >>>>>>>>>>>> keys.insert(to_object_key(oid)); >>>>>>>>>>>> int r = backend.get_keys(keys, &got); >>>>>>>>>>>> if (r < 0) >>>>>>>>>>>> return r; >>>>>>>>>>>> if (got.empty()) >>>>>>>>>>>> return -ENOENT; >>>>>>>>>>>> if (out) { >>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>>>>>>>>>> ::decode(*out, bp); >>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### >>>>>>>>>>>> } else { >>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>>>>>>>>>> } >>>>>>>>>>>> return 0; >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> is it save to comment that assert? >>>>>>>>>>> >>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper >>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim. >>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a >>>>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time >>>>>>>>>>> being so you don't get a surprise if trimming kicks in. The next step is >>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or >>>>>>>>>>> can't repair. >>>>>>>>>> >>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. 
>>>>>>>>>> >>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But >>>>>>>>>> enabling / doing deep-scrubs results into this one: >>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>>>>> SnapMapper::get_snaps(const h >>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >>>>>>>>>> 2018-01-18 13:00:54.840396 >>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>>>> >>>>>>>>> I think if you switch that assert to a warning it will repair... >>>>>>>>> >>>>>>>>> sage >>>>>>>>> >>>>>>>>>> >>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] >>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, >>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >>>>>>>>>> 697] >>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] >>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>> [0x5561cdcd7bc7] >>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>> const&)+0x57) [0x5561cdf4f957] >>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>> 
ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>> [0x5561ce292e7d] >>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] >>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>> needed to interpret this. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Greets, >>>>>>>>>> Stefan >>>>>>>>>> >>>>>>>>>>> sage >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Stefan >>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>>>>>>>>>> PrimaryLogPG::on_l >>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end()) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Is this a cache tiering pool? 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> no normal 3 replica but the pg is degraded: >>>>>>>>>>>>>> ceph pg dump | grep 3.80e >>>>>>>>>>>>>> >>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>>>>>>>>>> >>>>>>>>>>>>> Hrm, no real clues on teh root cause then. Something like this will work >>>>>>>>>>>>> around the current assert: >>>>>>>>>>>>> >>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 >>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>>>>>>>>>>>> set<snapid_t> snaps; >>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; >>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { >>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>>>>>>>>>>>> + } else { >>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>> + } >>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; >>>>>>>>>>>>> snap_mapper.add_oid( >>>>>>>>>>>>> recovery_info.soid, >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> s >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>>>>>>>>>>>> [0x55addb30748f] >>>>>>>>>>>>>>>> 5: >>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>>>>>>>>>>>>> [0x55addb317531] >>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>> [0x55addb23cf10] >>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>> [0x55addb035bc7] >>>>>>>>>>>>>>>> 9: 
(PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>> [0x55addb5f0e7d] >>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> is there any chance to fix this instead of manually removing all the clones? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>>>>>>>>>> list mapping. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The problem is you'll probably still have a stray snapid -> object mapping, so >>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>>>>>>>>>>>> happens. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>>>>>>>>>> being down. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> All of them fail with: >>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] >>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, 
eversion_t, >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>>>>>>>>>> 7: >>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>> needed to interpret this. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG >>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and >>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>>>>>>>>>>>>>> them myself. 
>>>>>>>>>>>>>>>>>>> -Greg >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: > Hi Sage, > > some days ago i reverted to stock luminous with > https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a > + > https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa > on top. I scrubbed and deep-scrubbed all pgs before and no errors were > found. > > this worked fine until the ceph balancer ran and started remapping > some pgs. > > This again triggered this assert: > > /build/ceph/src/osd/SnapMapper.cc: In function 'int > SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' > thread 7f4289bff700 time 2018-02-02 17:46:33.249579 > /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) > > Any idea how to fix it? Cheap workaround is to comment out that assert. I'm still trying to sort out how this happened, though. Not sure if you shared this already, but: - bluestore or filestore? - no cache tiering, right? nowhere on the cluster? It looks like the SnapMapper entry with an empty snaps vector came in via add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() doesn't have that check). There are a few places it could come from, including the on_local_recover() call and the update_snap_map() call. That last one is interesting because it pulls the snaps for a new clone out of the pg log entry.. maybe those aren't getting populated properly. Can you confirm that all of the OSDs on the cluster are running the latest? (ceph versions). Thanks! sage > > Greets, > Stefan > > Am 22.01.2018 um 20:01 schrieb Sage Weil: > > On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: > >> Hi Sage, > >> > >> thanks! That one fixed it! 
Now i'm just waiting to see whether the log ever stops > >> telling me: > >> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error > >> on pg 3.33e oid > >> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in > >> mapper: , oi: b0d7c...repaired > >> > >> Currently it seems to do an endless repair that never finishes. > > > > On many objects or the same object? > > > > sage > > > >> > >> Greets, > >> Stefan > >> Am 22.01.2018 um 15:30 schrieb Sage Weil: > >>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>> Hi Sage, > >>>> > >>>> get_snaps returned the error as there are no snaps in the snapmapper. So > >>>> it's OK to skip the assert. > >>>> > >>>> My biggest problem is now that i'm not able to run a scrub on all PGs to > >>>> fix all those errors because in around 1-3% of cases the osd seems to deadlock and > >>>> needs to be killed to proceed. > >>> > >>> I believe this will fix it: > >>> > >>> https://github.com/liewegas/ceph/commit/wip-snapmapper > >>> > >>> Cherry-pick that and let me know? > >>> > >>> Thanks! 
> >>> sage > >>> > >>> > >>>> > >>>> The log output is always: > >>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) > >>>> log [WRN] : slow request 141.951820 seconds old, received at > >>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 > >>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 > >>>> 00000157:head v 927921'526559757) currently commit_sent > >>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) > >>>> log [WRN] : slow request 141.643853 seconds old, received at > >>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 > >>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 > >>>> 00000147d:head v 927921'29105383) currently commit_sent > >>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) > >>>> log [WRN] : slow request 141.313220 seconds old, received at > >>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 > >>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 > >>>> 000000aff:head v 927921'164552063) currently commit_sent > >>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal > >>>> (Aborted) ** > >>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp > >>>> > >>>> ceph version 12.2.2-94-g92923ef > >>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>> 1: (()+0xa4423c) [0x561d876ab23c] > >>>> 2: (()+0xf890) [0x7f100b974890] > >>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] > >>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, > >>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran > >>>> saction> >&, Context*)+0x254) [0x561d8746ac54] > >>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, > >>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] > >>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] > >>>> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > >>>> unsigned 
int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] > >>>> 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, > >>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] > >>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] > >>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>> [0x561d87138bc7] > >>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>> const&)+0x57) [0x561d873b0db7] > >>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, > >>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] > >>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>> [0x561d876f42bd] > >>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] > >>>> 15: (()+0x8064) [0x7f100b96d064] > >>>> 16: (clone()+0x6d) [0x7f100aa6162d] > >>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>> needed to interpret this. 
> >>>> > >>>> --- logging levels --- > >>>> 0/ 5 none > >>>> 0/ 0 lockdep > >>>> 0/ 0 context > >>>> 0/ 0 crush > >>>> 0/ 0 mds > >>>> 0/ 0 mds_balancer > >>>> 0/ 0 mds_locker > >>>> 0/ 0 mds_log > >>>> 0/ 0 mds_log_expire > >>>> 0/ 0 mds_migrator > >>>> 0/ 0 buffer > >>>> 0/ 0 timer > >>>> 0/ 0 filer > >>>> 0/ 1 striper > >>>> 0/ 0 objecter > >>>> 0/ 0 rados > >>>> 0/ 0 rbd > >>>> 0/ 5 rbd_mirror > >>>> 0/ 5 rbd_replay > >>>> 0/ 0 journaler > >>>> 0/ 0 objectcacher > >>>> 0/ 0 client > >>>> 0/ 0 osd > >>>> 0/ 0 optracker > >>>> 0/ 0 objclass > >>>> 0/ 0 filestore > >>>> 0/ 0 journal > >>>> 0/ 0 ms > >>>> 0/ 0 mon > >>>> 0/ 0 monc > >>>> 0/ 0 paxos > >>>> 0/ 0 tp > >>>> 0/ 0 auth > >>>> 1/ 5 crypto > >>>> 0/ 0 finisher > >>>> 1/ 1 reserver > >>>> 0/ 0 heartbeatmap > >>>> 0/ 0 perfcounter > >>>> 0/ 0 rgw > >>>> 1/10 civetweb > >>>> 1/ 5 javaclient > >>>> 0/ 0 asok > >>>> 0/ 0 throttle > >>>> 0/ 0 refs > >>>> 1/ 5 xio > >>>> 1/ 5 compressor > >>>> 1/ 5 bluestore > >>>> 1/ 5 bluefs > >>>> 1/ 3 bdev > >>>> 1/ 5 kstore > >>>> 4/ 5 rocksdb > >>>> 4/ 5 leveldb > >>>> 4/ 5 memdb > >>>> 1/ 5 kinetic > >>>> 1/ 5 fuse > >>>> 1/ 5 mgr > >>>> 1/ 5 mgrc > >>>> 1/ 5 dpdk > >>>> 1/ 5 eventtrace > >>>> -2/-2 (syslog threshold) > >>>> -1/-1 (stderr threshold) > >>>> max_recent 10000 > >>>> max_new 1000 > >>>> log_file /var/log/ceph/ceph-osd.59.log > >>>> --- end dump of recent events --- > >>>> > >>>> > >>>> Greets, > >>>> Stefan > >>>> > >>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: > >>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>> Hi Sage, > >>>>>> > >>>>>> thanks for your reply. I'll do this. What i still don't understand is > >>>>>> the following and i hope you might have an idea: > >>>>>> 1.) All the snaps mentioned here are fresh - i created them today > >>>>>> running luminous? How can they be missing in snapmapper again? > >>>>> > >>>>> I'm not sure. 
It might help if you can set debug_osd=1/10 (or 1/20) so > >>>>> that when you hit the crash we have some context as to what is going on. > >>>>> > >>>>> Did you track down which part of get_snaps() is returning the error? > >>>>> > >>>>>> 2.) How did this happen? The cluster was jewel and was updated to > >>>>>> luminous when all this happened. > >>>>> > >>>>> Did this start right after the upgrade? > >>>>> > >>>>> I started a PR that relaxes some of these assertions so that they clean up > >>>>> instead (unless the debug option is enabled, for our regression testing), > >>>>> but I'm still unclear about what the inconsistency is... any logging you > >>>>> can provide would help! > >>>>> > >>>>> Thanks- > >>>>> sage > >>>>> > >>>>> > >>>>> > >>>>>> > >>>>>> Greets, > >>>>>> Stefan > >>>>>> > >>>>>> > >>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: > >>>>>>> I'm guessing it is coming from > >>>>>>> > >>>>>>> object_snaps out; > >>>>>>> int r = get_snaps(oid, &out); > >>>>>>> if (r < 0) > >>>>>>> return r; > >>>>>>> > >>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to > >>>>>>> confirm that, and possibly also add debug lines to the various return > >>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. > >>>>>>> > >>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's > >>>>>>> hard to say without seeing what the inconsistency there is and whether we > >>>>>>> can tolerate it... 
> >>>>>>> > >>>>>>> sage > >>>>>>> > >>>>>>> > >>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>> > >>>>>>>> Hi Sage, > >>>>>>>> > >>>>>>>> any ideas for this one, seen while doing a deep scrub: > >>>>>>>> > >>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: > >>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] > >>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f > >>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 > >>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for > >>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) > >>>>>>>> > >>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 > >>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const > >>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread > >>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 > >>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) > >>>>>>>> > >>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>> const*)+0x102) [0x55c0100d0372] > >>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>> ObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] > >>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, > >>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > >>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] > >>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] > >>>>>>>> 5: 
(ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >>>>>>>> [0x55c00fdf6642] > >>>>>>>> 6: > >>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >>>>>>>> [0x55c00fdfc684] > >>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>> [0x55c00fd22030] > >>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] > >>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>> [0x55c00fb1abc7] > >>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>> const&)+0x57) [0x55c00fd92ad7] > >>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] > >>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>> [0x55c0100d5ffd] > >>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] > >>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] > >>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] > >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>> needed to interpret this. 
> >>>>>>>> > >>>>>>>> > >>>>>>>> Stefan > >>>>>>>> > >>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: > >>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>> Hi Sage, > >>>>>>>>>> > >>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: > >>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>> Hi, > >>>>>>>>>>>> > >>>>>>>>>>>> it also crashes in (marked with HERE): > >>>>>>>>>>>> > >>>>>>>>>>>> int SnapMapper::get_snaps( > >>>>>>>>>>>> const hobject_t &oid, > >>>>>>>>>>>> object_snaps *out) > >>>>>>>>>>>> { > >>>>>>>>>>>> assert(check(oid)); > >>>>>>>>>>>> set<string> keys; > >>>>>>>>>>>> map<string, bufferlist> got; > >>>>>>>>>>>> keys.insert(to_object_key(oid)); > >>>>>>>>>>>> int r = backend.get_keys(keys, &got); > >>>>>>>>>>>> if (r < 0) > >>>>>>>>>>>> return r; > >>>>>>>>>>>> if (got.empty()) > >>>>>>>>>>>> return -ENOENT; > >>>>>>>>>>>> if (out) { > >>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); > >>>>>>>>>>>> ::decode(*out, bp); > >>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; > >>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### > >>>>>>>>>>>> } else { > >>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; > >>>>>>>>>>>> } > >>>>>>>>>>>> return 0; > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> is it safe to comment that assert? > >>>>>>>>>>> > >>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper > >>>>>>>>>>> state is consistent, and its only purpose is to find snaps to trim. > >>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a > >>>>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time > >>>>>>>>>>> being so you don't get a surprise if trimming kicks in. The next step is > >>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or > >>>>>>>>>>> can't repair. 
> >>>>>>>>>> > >>>>>>>>>> snap trimming works fine, i already trimmed and removed some snaps. > >>>>>>>>>> > >>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But > >>>>>>>>>> enabling / doing deep-scrubs results in this one: > >>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 > >>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int > >>>>>>>>>> SnapMapper::get_snaps(const h > >>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time > >>>>>>>>>> 2018-01-18 13:00:54.840396 > >>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) > >>>>>>>>> > >>>>>>>>> I think if you switch that assert to a warning it will repair... > >>>>>>>>> > >>>>>>>>> sage > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] > >>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, > >>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] > >>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, > >>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 > >>>>>>>>>> 697] > >>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] > >>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > >>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] > >>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, > >>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] > >>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] > >>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>> [0x5561cdcd7bc7] > >>>>>>>>>> 9: 
(PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>> const&)+0x57) [0x5561cdf4f957] > >>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] > >>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>> [0x5561ce292e7d] > >>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] > >>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] > >>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] > >>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>> needed to interpret this. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Greets, > >>>>>>>>>> Stefan > >>>>>>>>>> > >>>>>>>>>>> sage > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> Stefan > >>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: > >>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: > >>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>> Hi Sage, > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 > >>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void > >>>>>>>>>>>>>>>> PrimaryLogPG::on_l > >>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, > >>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti > >>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 > >>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != > >>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end()) > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Is this a cache tiering pool? 
> >>>>>>>>>>>>>> > >>>>>>>>>>>>>> no normal 3 replica but the pg is degraded: > >>>>>>>>>>>>>> ceph pg dump | grep 3.80e > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 > >>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait > >>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] > >>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 > >>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 > >>>>>>>>>>>>> > >>>>>>>>>>>>> Hrm, no real clues on teh root cause then. Something like this will work > >>>>>>>>>>>>> around the current assert: > >>>>>>>>>>>>> > >>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 > >>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( > >>>>>>>>>>>>> set<snapid_t> snaps; > >>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; > >>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); > >>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
> >>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); > >>>>>>>>>>>>> + if (p != recovery_info.ss.clone_snaps.end()) { > >>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; > >>>>>>>>>>>>> + } else { > >>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); > >>>>>>>>>>>>> + } > >>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; > >>>>>>>>>>>>> snap_mapper.add_oid( > >>>>>>>>>>>>> recovery_info.soid, > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> s > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] > >>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo > >>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, > >>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] > >>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, > >>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] > >>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) > >>>>>>>>>>>>>>>> [0x55addb30748f] > >>>>>>>>>>>>>>>> 5: > >>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) > >>>>>>>>>>>>>>>> [0x55addb317531] > >>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>>>>>> [0x55addb23cf10] > >>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] > >>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>>>> [0x55addb035bc7] > 
>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] > >>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] > >>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>>>>> [0x55addb5f0e7d] > >>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] > >>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] > >>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] > >>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Greets, > >>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On 17.01.2018 at 15:28, Sage Weil wrote: > >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> is there any chance to fix this instead of manually removing all the clones? > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by > >>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap > >>>>>>>>>>>>>>>>> list mapping. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> The problem is you'll probably still have a stray snapid -> object mapping, so > >>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error > >>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't > >>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it > >>>>>>>>>>>>>>>>> happens. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them > >>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! 
> >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> sage > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: > >>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG > >>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: > >>>>>>>>>>>>>>>>>>>> Hello, > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds > >>>>>>>>>>>>>>>>>>>> being down. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> All of them fail with: > >>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 > >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void > >>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, > >>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' > >>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 > >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 > >>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) > >>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] > >>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, > >>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, > >>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) > >>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] > >>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] > >>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, > 
>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] > >>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] > >>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] > >>>>>>>>>>>>>>>>>>>> 7: > >>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] > >>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] > >>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] > >>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] > >>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] > >>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] > >>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] > >>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] > >>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] > >>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] > >>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS 
<executable>` is > >>>>>>>>>>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is > >>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that > >>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's > >>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is > >>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have > >>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG > >>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object > >>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and > >>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of > >>>>>>>>>>>>>>>>>>> them myself. 
> >>>>>>>>>>>>>>>>>>> -Greg > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > >>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org > >>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
On 02.02.2018 at 20:28, Sage Weil wrote: > On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: >> Hi Sage, >> >> some days ago i reverted to stock luminous with >> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a >> + >> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa >> on top. I scrubbed and deep-scrubbed all pgs before and no errors were >> found. >> >> this worked fine until the ceph balancer ran and started remapping >> some pgs. >> >> This again triggered this assert: >> >> /build/ceph/src/osd/SnapMapper.cc: In function 'int >> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' >> thread 7f4289bff700 time 2018-02-02 17:46:33.249579 >> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >> >> Any idea how to fix it? > > Cheap workaround is to comment out that assert. I'm still trying to sort Yes i did but i feel uncomfortable with it as the snapmapper value is still wrong. > out how this happened, though. Not sure if you shared this already, but: > - bluestore or filestore? As i first reported, it was only filestore. Now also some bluestore OSDs. > - no cache tiering, right? nowhere on the cluster? Yes and Yes. Never used cache tiering at all. > It looks like the SnapMapper entry with an empty snaps vector came in via > add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() > doesn't have that check). There are a few places it could come from, > including the on_local_recover() call and the update_snap_map() call. > That last one is interesting because it pulls the snaps for a new clone > out of the pg log entry... maybe those aren't getting populated properly. Is it also possible that they got corrupted in the past and now ceph is unable to detect it and repair it with a scrub or deep scrub? > Can you confirm that all of the OSDs on the cluster are running the > latest? (ceph versions). 
Yes they do - i always check this after each update and i rechecked it again. Stefan >> >> Greets, >> Stefan >> >> Am 22.01.2018 um 20:01 schrieb Sage Weil: >>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>> Hi Sage, >>>> >>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops >>>> telling me: >>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error >>>> on pg 3.33e oid >>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in >>>> mapper: , oi: b0d7c...repaired >>>> >>>> Currently it seems it does an endless repair and never ends. >>> >>> On many objects or the same object? >>> >>> sage >>> >>>> >>>> Greets, >>>> Stefan >>>> Am 22.01.2018 um 15:30 schrieb Sage Weil: >>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>> Hi Sage, >>>>>> >>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So >>>>>> it's OK to skip the assert. >>>>>> >>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to >>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and >>>>>> needs to get killed to proceed. >>>>> >>>>> I believe this will fix it: >>>>> >>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper >>>>> >>>>> Cherry-pick that and let me know? >>>>> >>>>> Thanks! 
>>>>> sage >>>>> >>>>> >>>>>> >>>>>> The log output is always: >>>>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) >>>>>> log [WRN] : slow request 141.951820 seconds old, received at >>>>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 >>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 >>>>>> 00000157:head v 927921'526559757) currently commit_sent >>>>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) >>>>>> log [WRN] : slow request 141.643853 seconds old, received at >>>>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 >>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 >>>>>> 00000147d:head v 927921'29105383) currently commit_sent >>>>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) >>>>>> log [WRN] : slow request 141.313220 seconds old, received at >>>>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 >>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 >>>>>> 000000aff:head v 927921'164552063) currently commit_sent >>>>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal >>>>>> (Aborted) ** >>>>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp >>>>>> >>>>>> ceph version 12.2.2-94-g92923ef >>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>> 1: (()+0xa4423c) [0x561d876ab23c] >>>>>> 2: (()+0xf890) [0x7f100b974890] >>>>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] >>>>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, >>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran >>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54] >>>>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, >>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] >>>>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] >>>>>> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>> unsigned 
int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] >>>>>> 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] >>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] >>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>> [0x561d87138bc7] >>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>> const&)+0x57) [0x561d873b0db7] >>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] >>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>> [0x561d876f42bd] >>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] >>>>>> 15: (()+0x8064) [0x7f100b96d064] >>>>>> 16: (clone()+0x6d) [0x7f100aa6162d] >>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>> needed to interpret this. 
>>>>>> >>>>>> --- logging levels --- >>>>>> 0/ 5 none >>>>>> 0/ 0 lockdep >>>>>> 0/ 0 context >>>>>> 0/ 0 crush >>>>>> 0/ 0 mds >>>>>> 0/ 0 mds_balancer >>>>>> 0/ 0 mds_locker >>>>>> 0/ 0 mds_log >>>>>> 0/ 0 mds_log_expire >>>>>> 0/ 0 mds_migrator >>>>>> 0/ 0 buffer >>>>>> 0/ 0 timer >>>>>> 0/ 0 filer >>>>>> 0/ 1 striper >>>>>> 0/ 0 objecter >>>>>> 0/ 0 rados >>>>>> 0/ 0 rbd >>>>>> 0/ 5 rbd_mirror >>>>>> 0/ 5 rbd_replay >>>>>> 0/ 0 journaler >>>>>> 0/ 0 objectcacher >>>>>> 0/ 0 client >>>>>> 0/ 0 osd >>>>>> 0/ 0 optracker >>>>>> 0/ 0 objclass >>>>>> 0/ 0 filestore >>>>>> 0/ 0 journal >>>>>> 0/ 0 ms >>>>>> 0/ 0 mon >>>>>> 0/ 0 monc >>>>>> 0/ 0 paxos >>>>>> 0/ 0 tp >>>>>> 0/ 0 auth >>>>>> 1/ 5 crypto >>>>>> 0/ 0 finisher >>>>>> 1/ 1 reserver >>>>>> 0/ 0 heartbeatmap >>>>>> 0/ 0 perfcounter >>>>>> 0/ 0 rgw >>>>>> 1/10 civetweb >>>>>> 1/ 5 javaclient >>>>>> 0/ 0 asok >>>>>> 0/ 0 throttle >>>>>> 0/ 0 refs >>>>>> 1/ 5 xio >>>>>> 1/ 5 compressor >>>>>> 1/ 5 bluestore >>>>>> 1/ 5 bluefs >>>>>> 1/ 3 bdev >>>>>> 1/ 5 kstore >>>>>> 4/ 5 rocksdb >>>>>> 4/ 5 leveldb >>>>>> 4/ 5 memdb >>>>>> 1/ 5 kinetic >>>>>> 1/ 5 fuse >>>>>> 1/ 5 mgr >>>>>> 1/ 5 mgrc >>>>>> 1/ 5 dpdk >>>>>> 1/ 5 eventtrace >>>>>> -2/-2 (syslog threshold) >>>>>> -1/-1 (stderr threshold) >>>>>> max_recent 10000 >>>>>> max_new 1000 >>>>>> log_file /var/log/ceph/ceph-osd.59.log >>>>>> --- end dump of recent events --- >>>>>> >>>>>> >>>>>> Greets, >>>>>> Stefan >>>>>> >>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: >>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>> Hi Sage, >>>>>>>> >>>>>>>> thanks for your reply. I'll do this. What i still don't understand is >>>>>>>> the following and i hope you might have an idea: >>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today >>>>>>>> running luminous? How can they be missing in snapmapper again? >>>>>>> >>>>>>> I'm not sure. 
It might help if you can set debug_osd=1/10 (or 1/20) so >>>>>>> that when you hit the crash we have some context as to what is going on. >>>>>>> >>>>>>> Did you track down which part of get_snaps() is returning the error? >>>>>>> >>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to >>>>>>>> luminous when all this happened. >>>>>>> >>>>>>> Did this start right after the upgrade? >>>>>>> >>>>>>> I started a PR that relaxes some of these assertions so that they clean up >>>>>>> instead (unless the debug option is enabled, for our regression testing), >>>>>>> but I'm still unclear about what the inconsistency is... any loggin you >>>>>>> can provide would help! >>>>>>> >>>>>>> Thanks- >>>>>>> sage >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Greets, >>>>>>>> Stefan >>>>>>>> >>>>>>>> >>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: >>>>>>>>> I'm guessing it is coming from >>>>>>>>> >>>>>>>>> object_snaps out; >>>>>>>>> int r = get_snaps(oid, &out); >>>>>>>>> if (r < 0) >>>>>>>>> return r; >>>>>>>>> >>>>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to to >>>>>>>>> confirm that, and possibly also add debug lines to the various return >>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. >>>>>>>>> >>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's >>>>>>>>> hard to say without seeing what the inconsistency there is and whether we >>>>>>>>> can tolerate it... 
>>>>>>>>> >>>>>>>>> sage >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>> >>>>>>>>>> Hi Sage, >>>>>>>>>> >>>>>>>>>> any ideas for this one, seen while doing a deep scrub: >>>>>>>>>> >>>>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: >>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] >>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f >>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 >>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for >>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) >>>>>>>>>> >>>>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 >>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const >>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread >>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 >>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) >>>>>>>>>> >>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>> const*)+0x102) [0x55c0100d0372] >>>>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>> ObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] >>>>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] >>>>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] >>>>>>>>>> 5: 
(ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>> [0x55c00fdf6642] >>>>>>>>>> 6: >>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>> [0x55c00fdfc684] >>>>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>> [0x55c00fd22030] >>>>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] >>>>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>> [0x55c00fb1abc7] >>>>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>> const&)+0x57) [0x55c00fd92ad7] >>>>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] >>>>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>> [0x55c0100d5ffd] >>>>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] >>>>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] >>>>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] >>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>> needed to interpret this. 
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Stefan >>>>>>>>>> >>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: >>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>> Hi Sage, >>>>>>>>>>>> >>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> it also crashes in (marked with HERE): >>>>>>>>>>>>>> >>>>>>>>>>>>>> int SnapMapper::get_snaps( >>>>>>>>>>>>>> const hobject_t &oid, >>>>>>>>>>>>>> object_snaps *out) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> assert(check(oid)); >>>>>>>>>>>>>> set<string> keys; >>>>>>>>>>>>>> map<string, bufferlist> got; >>>>>>>>>>>>>> keys.insert(to_object_key(oid)); >>>>>>>>>>>>>> int r = backend.get_keys(keys, &got); >>>>>>>>>>>>>> if (r < 0) >>>>>>>>>>>>>> return r; >>>>>>>>>>>>>> if (got.empty()) >>>>>>>>>>>>>> return -ENOENT; >>>>>>>>>>>>>> if (out) { >>>>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>>>>>>>>>>>> ::decode(*out, bp); >>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### >>>>>>>>>>>>>> } else { >>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> return 0; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> is it save to comment that assert? >>>>>>>>>>>>> >>>>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper >>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim. >>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a >>>>>>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time >>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in. The next step is >>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or >>>>>>>>>>>>> can't repair. 
>>>>>>>>>>>> >>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. >>>>>>>>>>>> >>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But >>>>>>>>>>>> enabling / doing deep-scrubs results into this one: >>>>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>>>>>>> SnapMapper::get_snaps(const h >>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >>>>>>>>>>>> 2018-01-18 13:00:54.840396 >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>>>>>> >>>>>>>>>>> I think if you switch that assert to a warning it will repair... >>>>>>>>>>> >>>>>>>>>>> sage >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] >>>>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, >>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >>>>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >>>>>>>>>>>> 697] >>>>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >>>>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >>>>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] >>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>> [0x5561cdcd7bc7] >>>>>>>>>>>> 9: 
(PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957] >>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>> [0x5561ce292e7d] >>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >>>>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] >>>>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Greets, >>>>>>>>>>>> Stefan >>>>>>>>>>>> >>>>>>>>>>>>> sage >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l >>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end()) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Is this a cache tiering pool? 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded: >>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then. Something like this will work >>>>>>>>>>>>>>> around the current assert: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 >>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>>>>>>>>>>>>>> set<snapid_t> snaps; >>>>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; >>>>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>>>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
>>>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { >>>>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>>>>>>>>>>>>>> + } else { >>>>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; >>>>>>>>>>>>>>> snap_mapper.add_oid( >>>>>>>>>>>>>>> recovery_info.soid, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> s >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>>>>>>>>>>>>>> [0x55addb30748f] >>>>>>>>>>>>>>>>>> 5: >>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>>>>>>>>>>>>>>> [0x55addb317531] >>>>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>> [0x55addb23cf10] >>>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>> [0x55addb035bc7] 
>>>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>> [0x55addb5f0e7d] >>>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>>>>>>>>>>>> list mapping. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so >>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>>>>>>>>>>>>>> happens. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>>>>>>>>>>>> being down. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> All of them fail with: >>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] >>>>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, 
>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>>>>>>>>>>>> 7: >>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS 
<executable>` is >>>>>>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG >>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and >>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>>>>>>>>>>>>>>>> them myself. 
>>>>>>>>>>>>>>>>>>>>> -Greg >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: > Am 02.02.2018 um 20:28 schrieb Sage Weil: > > On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: > >> Hi Sage, > >> > >> some days ago i reverted to stock luminous with > >> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a > >> + > >> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa > >> on top. I scrubbed and deep-scrubbed all pgs before and no errors were > >> found. > >> > >> this worked fine until the ceph balancer ran and started remapping > >> some pgs. > >> > >> This again triggered this assert: > >> > >> /build/ceph/src/osd/SnapMapper.cc: In function 'int > >> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' > >> thread 7f4289bff700 time 2018-02-02 17:46:33.249579 > >> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) > >> > >> Any idea how to fix it? > > > > Cheap workaround is to comment out that assert. I'm still trying to sort > > Yes i did but i feel uncomfortable with it as the snapmapper value is > still wrong. > > > out how this happened, though. Not sure if you shared this already, but: > > - bluestore or filestore? > > As i first reported it only filestore. Now also some bluestore OSDs. > > > - no cache tiering, right? nowhere on the cluster? > Yes and Yes. Never used cache tiering at all. > > > It looks like the SnapMapper entry with an empty snaps vector came in via > > add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() > > doesn't have that check). There are a few places it could come from, > > including the on_local_recover() call and the update_snap_map() call. > > That last one is interesting because it pulls the snaps for a new clone > > out of the pg log entry.. maybe those aren't getting populated properly. > > Is it also possible that they got corrupted in the past and now ceph is > unable to detect it and repair it with a scrub or deep scrub?
> > > Can you confirm that all of the OSDs on the cluster are running the > > latest? (ceph versions). > > Yes they do - i always check this after each update and i rechecked it > again. Ok thanks! I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that has those commits above plus another one that adds debug messages in the various places that the empty snap vectors might be coming from. This won't help it get past the rebalance issue you hit above, but hopefully it will trigger when you create a new clone so we can see how this is happening. If we need to, I think we can ignore the assert (by commenting it out), but one downside there is that we'll see my warnings pop up when recovering those objects so we won't be able to tell where they really came from. Since you were seeing this on new snaps/clones, though, I'm hoping we won't end up in that situation! Thanks! sage > > Stefan > > > >> > >> Greets, > >> Stefan > >> > >> Am 22.01.2018 um 20:01 schrieb Sage Weil: > >>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>> Hi Sage, > >>>> > >>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops > >>>> telling me: > >>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error > >>>> on pg 3.33e oid > >>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in > >>>> mapper: , oi: b0d7c...repaired > >>>> > >>>> Currently it seems it does an endless repair and never ends. > >>> > >>> On many objects or the same object? > >>> > >>> sage > >>> > >>>> > >>>> Greets, > >>>> Stefan > >>>> Am 22.01.2018 um 15:30 schrieb Sage Weil: > >>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>> Hi Sage, > >>>>>> > >>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So > >>>>>> it's OK to skip the assert. 
> >>>>>> > >>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to > >>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and > >>>>>> needs to get killed to proceed. > >>>>> > >>>>> I believe this will fix it: > >>>>> > >>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper > >>>>> > >>>>> Cherry-pick that and let me know? > >>>>> > >>>>> Thanks! > >>>>> sage > >>>>> > >>>>> > >>>>>> > >>>>>> The log output is always: > >>>>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) > >>>>>> log [WRN] : slow request 141.951820 seconds old, received at > >>>>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 > >>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 > >>>>>> 00000157:head v 927921'526559757) currently commit_sent > >>>>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) > >>>>>> log [WRN] : slow request 141.643853 seconds old, received at > >>>>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 > >>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 > >>>>>> 00000147d:head v 927921'29105383) currently commit_sent > >>>>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) > >>>>>> log [WRN] : slow request 141.313220 seconds old, received at > >>>>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 > >>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 > >>>>>> 000000aff:head v 927921'164552063) currently commit_sent > >>>>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal > >>>>>> (Aborted) ** > >>>>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp > >>>>>> > >>>>>> ceph version 12.2.2-94-g92923ef > >>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>> 1: (()+0xa4423c) [0x561d876ab23c] > >>>>>> 2: (()+0xf890) [0x7f100b974890] > >>>>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] > >>>>>> 4: 
(ObjectStore::apply_transactions(ObjectStore::Sequencer*, > >>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran > >>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54] > >>>>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, > >>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] > >>>>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] > >>>>>> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > >>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] > >>>>>> 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, > >>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] > >>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] > >>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>> [0x561d87138bc7] > >>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>> const&)+0x57) [0x561d873b0db7] > >>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] > >>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>> [0x561d876f42bd] > >>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] > >>>>>> 15: (()+0x8064) [0x7f100b96d064] > >>>>>> 16: (clone()+0x6d) [0x7f100aa6162d] > >>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>> needed to interpret this. 
> >>>>>> > >>>>>> --- logging levels --- > >>>>>> 0/ 5 none > >>>>>> 0/ 0 lockdep > >>>>>> 0/ 0 context > >>>>>> 0/ 0 crush > >>>>>> 0/ 0 mds > >>>>>> 0/ 0 mds_balancer > >>>>>> 0/ 0 mds_locker > >>>>>> 0/ 0 mds_log > >>>>>> 0/ 0 mds_log_expire > >>>>>> 0/ 0 mds_migrator > >>>>>> 0/ 0 buffer > >>>>>> 0/ 0 timer > >>>>>> 0/ 0 filer > >>>>>> 0/ 1 striper > >>>>>> 0/ 0 objecter > >>>>>> 0/ 0 rados > >>>>>> 0/ 0 rbd > >>>>>> 0/ 5 rbd_mirror > >>>>>> 0/ 5 rbd_replay > >>>>>> 0/ 0 journaler > >>>>>> 0/ 0 objectcacher > >>>>>> 0/ 0 client > >>>>>> 0/ 0 osd > >>>>>> 0/ 0 optracker > >>>>>> 0/ 0 objclass > >>>>>> 0/ 0 filestore > >>>>>> 0/ 0 journal > >>>>>> 0/ 0 ms > >>>>>> 0/ 0 mon > >>>>>> 0/ 0 monc > >>>>>> 0/ 0 paxos > >>>>>> 0/ 0 tp > >>>>>> 0/ 0 auth > >>>>>> 1/ 5 crypto > >>>>>> 0/ 0 finisher > >>>>>> 1/ 1 reserver > >>>>>> 0/ 0 heartbeatmap > >>>>>> 0/ 0 perfcounter > >>>>>> 0/ 0 rgw > >>>>>> 1/10 civetweb > >>>>>> 1/ 5 javaclient > >>>>>> 0/ 0 asok > >>>>>> 0/ 0 throttle > >>>>>> 0/ 0 refs > >>>>>> 1/ 5 xio > >>>>>> 1/ 5 compressor > >>>>>> 1/ 5 bluestore > >>>>>> 1/ 5 bluefs > >>>>>> 1/ 3 bdev > >>>>>> 1/ 5 kstore > >>>>>> 4/ 5 rocksdb > >>>>>> 4/ 5 leveldb > >>>>>> 4/ 5 memdb > >>>>>> 1/ 5 kinetic > >>>>>> 1/ 5 fuse > >>>>>> 1/ 5 mgr > >>>>>> 1/ 5 mgrc > >>>>>> 1/ 5 dpdk > >>>>>> 1/ 5 eventtrace > >>>>>> -2/-2 (syslog threshold) > >>>>>> -1/-1 (stderr threshold) > >>>>>> max_recent 10000 > >>>>>> max_new 1000 > >>>>>> log_file /var/log/ceph/ceph-osd.59.log > >>>>>> --- end dump of recent events --- > >>>>>> > >>>>>> > >>>>>> Greets, > >>>>>> Stefan > >>>>>> > >>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: > >>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>> Hi Sage, > >>>>>>>> > >>>>>>>> thanks for your reply. I'll do this. What i still don't understand is > >>>>>>>> the following and i hope you might have an idea: > >>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today > >>>>>>>> running luminous? 
How can they be missing in snapmapper again? > >>>>>>> > >>>>>>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so > >>>>>>> that when you hit the crash we have some context as to what is going on. > >>>>>>> > >>>>>>> Did you track down which part of get_snaps() is returning the error? > >>>>>>> > >>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to > >>>>>>>> luminous when all this happened. > >>>>>>> > >>>>>>> Did this start right after the upgrade? > >>>>>>> > >>>>>>> I started a PR that relaxes some of these assertions so that they clean up > >>>>>>> instead (unless the debug option is enabled, for our regression testing), > >>>>>>> but I'm still unclear about what the inconsistency is... any logging you > >>>>>>> can provide would help! > >>>>>>> > >>>>>>> Thanks- > >>>>>>> sage > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> > >>>>>>>> Greets, > >>>>>>>> Stefan > >>>>>>>> > >>>>>>>> > >>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: > >>>>>>>>> I'm guessing it is coming from > >>>>>>>>> > >>>>>>>>> object_snaps out; > >>>>>>>>> int r = get_snaps(oid, &out); > >>>>>>>>> if (r < 0) > >>>>>>>>> return r; > >>>>>>>>> > >>>>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to > >>>>>>>>> confirm that, and possibly also add debug lines to the various return > >>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. > >>>>>>>>> > >>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's > >>>>>>>>> hard to say without seeing what the inconsistency there is and whether we > >>>>>>>>> can tolerate it...
> >>>>>>>>> > >>>>>>>>> sage > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>> > >>>>>>>>>> Hi Sage, > >>>>>>>>>> > >>>>>>>>>> any ideas for this one, seen while doing a deep scrub: > >>>>>>>>>> > >>>>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: > >>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] > >>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f > >>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 > >>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for > >>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) > >>>>>>>>>> > >>>>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 > >>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const > >>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread > >>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 > >>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) > >>>>>>>>>> > >>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>> const*)+0x102) [0x55c0100d0372] > >>>>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>> ObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] > >>>>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, > >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > >>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] > >>>>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >>>>>>>>>> eversion_t const&, bool, 
ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] > >>>>>>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >>>>>>>>>> [0x55c00fdf6642] > >>>>>>>>>> 6: > >>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >>>>>>>>>> [0x55c00fdfc684] > >>>>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>> [0x55c00fd22030] > >>>>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] > >>>>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>> [0x55c00fb1abc7] > >>>>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>> const&)+0x57) [0x55c00fd92ad7] > >>>>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] > >>>>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>> [0x55c0100d5ffd] > >>>>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] > >>>>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] > >>>>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] > >>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>> needed to interpret this. 
> >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Stefan > >>>>>>>>>> > >>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: > >>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>> Hi Sage, > >>>>>>>>>>>> > >>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: > >>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> it also crashes in (marked with HERE): > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> int SnapMapper::get_snaps( > >>>>>>>>>>>>>> const hobject_t &oid, > >>>>>>>>>>>>>> object_snaps *out) > >>>>>>>>>>>>>> { > >>>>>>>>>>>>>> assert(check(oid)); > >>>>>>>>>>>>>> set<string> keys; > >>>>>>>>>>>>>> map<string, bufferlist> got; > >>>>>>>>>>>>>> keys.insert(to_object_key(oid)); > >>>>>>>>>>>>>> int r = backend.get_keys(keys, &got); > >>>>>>>>>>>>>> if (r < 0) > >>>>>>>>>>>>>> return r; > >>>>>>>>>>>>>> if (got.empty()) > >>>>>>>>>>>>>> return -ENOENT; > >>>>>>>>>>>>>> if (out) { > >>>>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); > >>>>>>>>>>>>>> ::decode(*out, bp); > >>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; > >>>>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### > >>>>>>>>>>>>>> } else { > >>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; > >>>>>>>>>>>>>> } > >>>>>>>>>>>>>> return 0; > >>>>>>>>>>>>>> } > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> is it safe to comment that assert? > >>>>>>>>>>>>> > >>>>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper > >>>>>>>>>>>>> state is consistent, and its only purpose is to find snaps to trim. > >>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a > >>>>>>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time > >>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in. 
The next step is > >>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or > >>>>>>>>>>>>> can't repair. > >>>>>>>>>>>> > >>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. > >>>>>>>>>>>> > >>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But > >>>>>>>>>>>> enabling / doing deep-scrubs results into this one: > >>>>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 > >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int > >>>>>>>>>>>> SnapMapper::get_snaps(const h > >>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time > >>>>>>>>>>>> 2018-01-18 13:00:54.840396 > >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) > >>>>>>>>>>> > >>>>>>>>>>> I think if you switch that assert to a warning it will repair... > >>>>>>>>>>> > >>>>>>>>>>> sage > >>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] > >>>>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, > >>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] > >>>>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, > >>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 > >>>>>>>>>>>> 697] > >>>>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] > >>>>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > >>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] > >>>>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, > >>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] > >>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] > 
>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>> [0x5561cdcd7bc7] > >>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957] > >>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] > >>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>> [0x5561ce292e7d] > >>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] > >>>>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] > >>>>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] > >>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Greets, > >>>>>>>>>>>> Stefan > >>>>>>>>>>>> > >>>>>>>>>>>>> sage > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: > >>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: > >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>>>> Hi Sage, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 > >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void > >>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l > >>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, > >>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti > >>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 > >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != > >>>>>>>>>>>>>>>>>> 
recovery_info.ss.clone_snaps.end()) > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Is this a cache tiering pool? > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded: > >>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 > >>>>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait > >>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] > >>>>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 > >>>>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then. Something like this will work > >>>>>>>>>>>>>>> around the current assert: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 > >>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( > >>>>>>>>>>>>>>> set<snapid_t> snaps; > >>>>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; > >>>>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); > >>>>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
> >>>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); > >>>>>>>>>>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { > >>>>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; > >>>>>>>>>>>>>>> + } else { > >>>>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); > >>>>>>>>>>>>>>> + } > >>>>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; > >>>>>>>>>>>>>>> snap_mapper.add_oid( > >>>>>>>>>>>>>>> recovery_info.soid, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> s > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] > >>>>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo > >>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, > >>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] > >>>>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, > >>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] > >>>>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) > >>>>>>>>>>>>>>>>>> [0x55addb30748f] > >>>>>>>>>>>>>>>>>> 5: > >>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) > >>>>>>>>>>>>>>>>>> [0x55addb317531] > >>>>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>>>>>>>> [0x55addb23cf10] > >>>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] > >>>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>>>>>>>> 
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>>>>>> [0x55addb035bc7] > >>>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] > >>>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] > >>>>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>>>>>>> [0x55addb5f0e7d] > >>>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] > >>>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] > >>>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] > >>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Greets, > >>>>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: > >>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> is there any chance to fix this instead of removing manually all the clones? > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by > >>>>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap > >>>>>>>>>>>>>>>>>>> list mapping. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> The problem is you'll probably still have a stray snapid -> object mapping, so > >>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error > >>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't > >>>>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it > >>>>>>>>>>>>>>>>>>> happens. 
> >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them > >>>>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> sage > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: > >>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG > >>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: > >>>>>>>>>>>>>>>>>>>>>> Hello, > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds > >>>>>>>>>>>>>>>>>>>>>> being down. > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> All of them fail with: > >>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 > >>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void > >>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, > >>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' > >>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 > >>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 > >>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) > >>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] > >>>>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, > >>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, > >>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) > >>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] > 
>>>>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] > >>>>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > >>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] > >>>>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] > >>>>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] > >>>>>>>>>>>>>>>>>>>>>> 7: > >>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] > >>>>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] > >>>>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] > >>>>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] > >>>>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] > >>>>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] > >>>>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned 
int)+0x88d) > >>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] > >>>>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] > >>>>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] > >>>>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] > >>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>>>>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is > >>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that > >>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's > >>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is > >>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have > >>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG > >>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object > >>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and > >>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of > >>>>>>>>>>>>>>>>>>>>> them myself. 
> >>>>>>>>>>>>>>>>>>>>> -Greg > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > >>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org > >>>>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
Am 02.02.2018 um 22:05 schrieb Sage Weil: > On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: >> Am 02.02.2018 um 20:28 schrieb Sage Weil: >>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: >>>> Hi Sage, >>>> >>>> some days ago i reverted to stock luminous with >>>> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a >>>> + >>>> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa >>>> on top. I scrubbed and deep-scrubbed all pgs before and no errors were >>>> found. >>>> >>>> this worked fine until the ceph balancer ran and started remapping >>>> some pgs. >>>> >>>> This again triggered this assert: >>>> >>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' >>>> thread 7f4289bff700 time 2018-02-02 17:46:33.249579 >>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>> >>>> Any idea how to fix it? >>> >>> Cheap workaround is to comment out that assert. I'm still trying to sort >> >> Yes i did but i feel uncomfortable with it as the snapmapper value is >> still wrong. >> >>> out how this happened, though. Not sure if you shared this already, but: >>> - bluestore or filestore? >> >> As i first reported it only filestore. Now also some bluestore OSDs. >> >>> - no cache tiering, right? nowhere on the cluster? >> Yes and Yes. Never used cache tiering at all. >> >>> It looks like the SnapMapper entry with an empty snaps vector came in via >>> add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() >>> doesn't have that check). There are a few places it could come from, >>> including the on_local_recover() call and the update_snap_map() call. >>> That last one is interesting because it pulls the snaps for a new clone >>> out of the pg log entry.. maybe those aren't getting populated properly. 
>> >> Is it also possible that they got corrupted in the past and now ceph is >> unable to detect it and repair it with a scrub or deep scrub? >> >>> Can you confirm that all of the OSDs on the cluster are running the >>> latest? (ceph versions). >> >> Yes they do - i always check this after each update and i rechecked it >> again. > > Ok thanks! > > I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that > has those commits above plus another one that adds debug messages in the > various places that the empty snap vectors might be coming from. > > This won't help it get past the rebalance issue you hit above, but > hopefully it will trigger when you create a new clone so we can see how > this is happening. > > If we need to, I think we can ignore the assert (by commenting it out), > but one downside there is that we'll see my warnings pop up when > recovering those objects so we won't be able to tell where they really > came from. Since you were seeing this on new snaps/clones, though, I'm > hoping we won't end up in that situation! compiling - it's late here. Will update tomorrow. Thanks again! > Thanks! > sage > > > > > >> Stefan >> >> >>>> >>>> Greets, >>>> Stefan >>>> >>>> Am 22.01.2018 um 20:01 schrieb Sage Weil: >>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>> Hi Sage, >>>>>> >>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops >>>>>> telling me: >>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error >>>>>> on pg 3.33e oid >>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in >>>>>> mapper: , oi: b0d7c...repaired >>>>>> >>>>>> Currently it seems it does an endless repair and never ends. >>>>> >>>>> On many objects or the same object? 
>>>>> >>>>> sage >>>>> >>>>>> >>>>>> Greets, >>>>>> Stefan >>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil: >>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>> Hi Sage, >>>>>>>> >>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So >>>>>>>> it's OK to skip the assert. >>>>>>>> >>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to >>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and >>>>>>>> needs to get killed to proceed. >>>>>>> >>>>>>> I believe this will fix it: >>>>>>> >>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper >>>>>>> >>>>>>> Cherry-pick that and let me know? >>>>>>> >>>>>>> Thanks! >>>>>>> sage >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> The log output is always: >>>>>>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) >>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at >>>>>>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 >>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 >>>>>>>> 00000157:head v 927921'526559757) currently commit_sent >>>>>>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) >>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at >>>>>>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 >>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 >>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent >>>>>>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) >>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at >>>>>>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 >>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 >>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent >>>>>>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal >>>>>>>> (Aborted) ** >>>>>>>> in thread 
7f0fc23fe700 thread_name:tp_osd_tp >>>>>>>> >>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>> 1: (()+0xa4423c) [0x561d876ab23c] >>>>>>>> 2: (()+0xf890) [0x7f100b974890] >>>>>>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] >>>>>>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, >>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran >>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54] >>>>>>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, >>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] >>>>>>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] >>>>>>>> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] >>>>>>>> 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] >>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] >>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>> [0x561d87138bc7] >>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>> const&)+0x57) [0x561d873b0db7] >>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] >>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>> [0x561d876f42bd] >>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] >>>>>>>> 15: (()+0x8064) [0x7f100b96d064] >>>>>>>> 16: (clone()+0x6d) [0x7f100aa6162d] >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>> needed to interpret this. 
>>>>>>>> >>>>>>>> --- logging levels --- >>>>>>>> 0/ 5 none >>>>>>>> 0/ 0 lockdep >>>>>>>> 0/ 0 context >>>>>>>> 0/ 0 crush >>>>>>>> 0/ 0 mds >>>>>>>> 0/ 0 mds_balancer >>>>>>>> 0/ 0 mds_locker >>>>>>>> 0/ 0 mds_log >>>>>>>> 0/ 0 mds_log_expire >>>>>>>> 0/ 0 mds_migrator >>>>>>>> 0/ 0 buffer >>>>>>>> 0/ 0 timer >>>>>>>> 0/ 0 filer >>>>>>>> 0/ 1 striper >>>>>>>> 0/ 0 objecter >>>>>>>> 0/ 0 rados >>>>>>>> 0/ 0 rbd >>>>>>>> 0/ 5 rbd_mirror >>>>>>>> 0/ 5 rbd_replay >>>>>>>> 0/ 0 journaler >>>>>>>> 0/ 0 objectcacher >>>>>>>> 0/ 0 client >>>>>>>> 0/ 0 osd >>>>>>>> 0/ 0 optracker >>>>>>>> 0/ 0 objclass >>>>>>>> 0/ 0 filestore >>>>>>>> 0/ 0 journal >>>>>>>> 0/ 0 ms >>>>>>>> 0/ 0 mon >>>>>>>> 0/ 0 monc >>>>>>>> 0/ 0 paxos >>>>>>>> 0/ 0 tp >>>>>>>> 0/ 0 auth >>>>>>>> 1/ 5 crypto >>>>>>>> 0/ 0 finisher >>>>>>>> 1/ 1 reserver >>>>>>>> 0/ 0 heartbeatmap >>>>>>>> 0/ 0 perfcounter >>>>>>>> 0/ 0 rgw >>>>>>>> 1/10 civetweb >>>>>>>> 1/ 5 javaclient >>>>>>>> 0/ 0 asok >>>>>>>> 0/ 0 throttle >>>>>>>> 0/ 0 refs >>>>>>>> 1/ 5 xio >>>>>>>> 1/ 5 compressor >>>>>>>> 1/ 5 bluestore >>>>>>>> 1/ 5 bluefs >>>>>>>> 1/ 3 bdev >>>>>>>> 1/ 5 kstore >>>>>>>> 4/ 5 rocksdb >>>>>>>> 4/ 5 leveldb >>>>>>>> 4/ 5 memdb >>>>>>>> 1/ 5 kinetic >>>>>>>> 1/ 5 fuse >>>>>>>> 1/ 5 mgr >>>>>>>> 1/ 5 mgrc >>>>>>>> 1/ 5 dpdk >>>>>>>> 1/ 5 eventtrace >>>>>>>> -2/-2 (syslog threshold) >>>>>>>> -1/-1 (stderr threshold) >>>>>>>> max_recent 10000 >>>>>>>> max_new 1000 >>>>>>>> log_file /var/log/ceph/ceph-osd.59.log >>>>>>>> --- end dump of recent events --- >>>>>>>> >>>>>>>> >>>>>>>> Greets, >>>>>>>> Stefan >>>>>>>> >>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: >>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>> Hi Sage, >>>>>>>>>> >>>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is >>>>>>>>>> the following and i hope you might have an idea: >>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today >>>>>>>>>> running luminous? 
How can they be missing in snapmapper again? >>>>>>>>> >>>>>>>>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so >>>>>>>>> that when you hit the crash we have some context as to what is going on. >>>>>>>>> >>>>>>>>> Did you track down which part of get_snaps() is returning the error? >>>>>>>>> >>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to >>>>>>>>>> luminous when all this happened. >>>>>>>>> >>>>>>>>> Did this start right after the upgrade? >>>>>>>>> >>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up >>>>>>>>> instead (unless the debug option is enabled, for our regression testing), >>>>>>>>> but I'm still unclear about what the inconsistency is... any logging you >>>>>>>>> can provide would help! >>>>>>>>> >>>>>>>>> Thanks- >>>>>>>>> sage >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Greets, >>>>>>>>>> Stefan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: >>>>>>>>>>> I'm guessing it is coming from >>>>>>>>>>> >>>>>>>>>>> object_snaps out; >>>>>>>>>>> int r = get_snaps(oid, &out); >>>>>>>>>>> if (r < 0) >>>>>>>>>>> return r; >>>>>>>>>>> >>>>>>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to >>>>>>>>>>> confirm that, and possibly also add debug lines to the various return >>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. >>>>>>>>>>> >>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's >>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we >>>>>>>>>>> can tolerate it... 
>>>>>>>>>>> >>>>>>>>>>> sage >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>> >>>>>>>>>>>> HI Sage, >>>>>>>>>>>> >>>>>>>>>>>> any ideas for this one, seen while doing a deep scrub: >>>>>>>>>>>> >>>>>>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: >>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] >>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f >>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 >>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for >>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) >>>>>>>>>>>> >>>>>>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 >>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const >>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread >>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 >>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) >>>>>>>>>>>> >>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>> const*)+0x102) [0x55c0100d0372] >>>>>>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>> ObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] >>>>>>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] >>>>>>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>> eversion_t const&, bool, 
ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] >>>>>>>>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>> [0x55c00fdf6642] >>>>>>>>>>>> 6: >>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>> [0x55c00fdfc684] >>>>>>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>> [0x55c00fd22030] >>>>>>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] >>>>>>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>> [0x55c00fb1abc7] >>>>>>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7] >>>>>>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] >>>>>>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>> [0x55c0100d5ffd] >>>>>>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] >>>>>>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] >>>>>>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] >>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>> needed to interpret this. 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Stefan >>>>>>>>>>>> >>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: >>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> it also crashes in (marked with HERE): >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> int SnapMapper::get_snaps( >>>>>>>>>>>>>>>> const hobject_t &oid, >>>>>>>>>>>>>>>> object_snaps *out) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> assert(check(oid)); >>>>>>>>>>>>>>>> set<string> keys; >>>>>>>>>>>>>>>> map<string, bufferlist> got; >>>>>>>>>>>>>>>> keys.insert(to_object_key(oid)); >>>>>>>>>>>>>>>> int r = backend.get_keys(keys, &got); >>>>>>>>>>>>>>>> if (r < 0) >>>>>>>>>>>>>>>> return r; >>>>>>>>>>>>>>>> if (got.empty()) >>>>>>>>>>>>>>>> return -ENOENT; >>>>>>>>>>>>>>>> if (out) { >>>>>>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>>>>>>>>>>>>>> ::decode(*out, bp); >>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>>>>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### >>>>>>>>>>>>>>>> } else { >>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> return 0; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> is it safe to comment that assert? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper >>>>>>>>>>>>>>> state is consistent, and its only purpose is to find snaps to trim. >>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a >>>>>>>>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time >>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in. 
The next step is >>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or >>>>>>>>>>>>>>> can't repair. >>>>>>>>>>>>>> >>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But >>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one: >>>>>>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>>>>>>>>> SnapMapper::get_snaps(const h >>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >>>>>>>>>>>>>> 2018-01-18 13:00:54.840396 >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>>>>>>>> >>>>>>>>>>>>> I think if you switch that assert to a warning it will repair... >>>>>>>>>>>>> >>>>>>>>>>>>> sage >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] >>>>>>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, >>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >>>>>>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >>>>>>>>>>>>>> 697] >>>>>>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >>>>>>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >>>>>>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] 
>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>> [0x5561cdcd7bc7] >>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957] >>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>> [0x5561ce292e7d] >>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >>>>>>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] >>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>> >>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l >>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>>>>>>>>>>>>>> 
recovery_info.ss.clone_snaps.end()) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Is this a cache tiering pool? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded: >>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hrm, no real clues on the root cause then. Something like this will work >>>>>>>>>>>>>>>>> around the current assert: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 >>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>>>>>>>>>>>>>>>> set<snapid_t> snaps; >>>>>>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; >>>>>>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>>>>>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
>>>>>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { >>>>>>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>>>>>>>>>>>>>>>> + } else { >>>>>>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; >>>>>>>>>>>>>>>>> snap_mapper.add_oid( >>>>>>>>>>>>>>>>> recovery_info.soid, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> s >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>>>>>>>>>>>>>>>> [0x55addb30748f] >>>>>>>>>>>>>>>>>>>> 5: >>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>>>>>>>>>>>>>>>>> [0x55addb317531] >>>>>>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>> [0x55addb23cf10] >>>>>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>> 
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>> [0x55addb035bc7] >>>>>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d] >>>>>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> is there any chance to fix this instead of removing manually all the clones? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>>>>>>>>>>>>>> list mapping. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still have a stray snapid -> object mapping, so >>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>>>>>>>>>>>>>>>> happens. 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>>>>>>>>>>>>>> being down. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> All of them fail with: >>>>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] 
>>>>>>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>>>>>>>>>>>>>> 7: >>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned 
int)+0x88d) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG >>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and >>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>>>>>>>>>>>>>>>>>> them myself. 
>>>>>>>>>>>>>>>>>>>>>>> -Greg >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Sage, Am 02.02.2018 um 22:05 schrieb Sage Weil: > On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: >> Am 02.02.2018 um 20:28 schrieb Sage Weil: >>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: >>>> Hi Sage, >>>> >>>> some days ago i reverted to stock luminous with >>>> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a >>>> + >>>> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa >>>> on top. I scrubbed and deep-scrubbed all pgs before and no errors were >>>> found. >>>> >>>> this worked fine until the ceph balancer ran and started remapping >>>> some pgs. >>>> >>>> This again triggered this assert: >>>> >>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' >>>> thread 7f4289bff700 time 2018-02-02 17:46:33.249579 >>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>> >>>> Any idea how to fix it? >>> >>> Cheap workaround is to comment out that assert. I'm still trying to sort >> >> Yes I did, but I feel uncomfortable with it as the snapmapper value is >> still wrong. >> >>> out how this happened, though. Not sure if you shared this already, but: >>> - bluestore or filestore? >> >> As I first reported it, only filestore. Now also some bluestore OSDs. >> >>> - no cache tiering, right? nowhere on the cluster? >> Yes and Yes. Never used cache tiering at all. >> >>> It looks like the SnapMapper entry with an empty snaps vector came in via >>> add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() >>> doesn't have that check). There are a few places it could come from, >>> including the on_local_recover() call and the update_snap_map() call. >>> That last one is interesting because it pulls the snaps for a new clone >>> out of the pg log entry.. maybe those aren't getting populated properly.
>> >> Is it also possible that they got corrupted in the past and now ceph is >> unable to detect it and repair it with a scrub or deep scrub? >> >>> Can you confirm that all of the OSDs on the cluster are running the >>> latest? (ceph versions). >> >> Yes they do - i always check this after each update and i rechecked it >> again. > > Ok thanks! > > I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that > has those commits above plus another one that adds debug messages in the > various places that the empty snap vectors might be coming from. > > This won't help it get past the rebalance issue you hit above, but > hopefully it will trigger when you create a new clone so we can see how > this is happening. > > If we need to, I think we can ignore the assert (by commenting it out), > but one downside there is that we'll see my warnings pop up when > recovering those objects so we won't be able to tell where they really > came from. Since you were seeing this on new snaps/clones, though, I'm > hoping we won't end up in that situation! thanks. New version is running. Is looking through ceph.log enough? Or are those only printed to the corresponding osd log? Greets, Stefan > > Thanks! > sage > > > > > >> Stefan >> >> >>>> >>>> Greets, >>>> Stefan >>>> >>>> Am 22.01.2018 um 20:01 schrieb Sage Weil: >>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>> Hi Sage, >>>>>> >>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops >>>>>> telling me: >>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error >>>>>> on pg 3.33e oid >>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in >>>>>> mapper: , oi: b0d7c...repaired >>>>>> >>>>>> Currently it seems it does an endless repair and never ends. >>>>> >>>>> On many objects or the same object? 
>>>>> >>>>> sage >>>>> >>>>>> >>>>>> Greets, >>>>>> Stefan >>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil: >>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>> Hi Sage, >>>>>>>> >>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So >>>>>>>> it's OK to skip the assert. >>>>>>>> >>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to >>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and >>>>>>>> needs to get killed to proceed. >>>>>>> >>>>>>> I believe this will fix it: >>>>>>> >>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper >>>>>>> >>>>>>> Cherry-pick that and let me know? >>>>>>> >>>>>>> Thanks! >>>>>>> sage >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> The log output is always: >>>>>>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) >>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at >>>>>>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 >>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 >>>>>>>> 00000157:head v 927921'526559757) currently commit_sent >>>>>>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) >>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at >>>>>>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 >>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 >>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent >>>>>>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) >>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at >>>>>>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 >>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 >>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent >>>>>>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal >>>>>>>> (Aborted) ** >>>>>>>> in thread 
7f0fc23fe700 thread_name:tp_osd_tp >>>>>>>> >>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>> 1: (()+0xa4423c) [0x561d876ab23c] >>>>>>>> 2: (()+0xf890) [0x7f100b974890] >>>>>>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] >>>>>>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, >>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran >>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54] >>>>>>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, >>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] >>>>>>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] >>>>>>>> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] >>>>>>>> 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] >>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] >>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>> [0x561d87138bc7] >>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>> const&)+0x57) [0x561d873b0db7] >>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] >>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>> [0x561d876f42bd] >>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] >>>>>>>> 15: (()+0x8064) [0x7f100b96d064] >>>>>>>> 16: (clone()+0x6d) [0x7f100aa6162d] >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>> needed to interpret this. 
>>>>>>>> >>>>>>>> --- logging levels --- >>>>>>>> 0/ 5 none >>>>>>>> 0/ 0 lockdep >>>>>>>> 0/ 0 context >>>>>>>> 0/ 0 crush >>>>>>>> 0/ 0 mds >>>>>>>> 0/ 0 mds_balancer >>>>>>>> 0/ 0 mds_locker >>>>>>>> 0/ 0 mds_log >>>>>>>> 0/ 0 mds_log_expire >>>>>>>> 0/ 0 mds_migrator >>>>>>>> 0/ 0 buffer >>>>>>>> 0/ 0 timer >>>>>>>> 0/ 0 filer >>>>>>>> 0/ 1 striper >>>>>>>> 0/ 0 objecter >>>>>>>> 0/ 0 rados >>>>>>>> 0/ 0 rbd >>>>>>>> 0/ 5 rbd_mirror >>>>>>>> 0/ 5 rbd_replay >>>>>>>> 0/ 0 journaler >>>>>>>> 0/ 0 objectcacher >>>>>>>> 0/ 0 client >>>>>>>> 0/ 0 osd >>>>>>>> 0/ 0 optracker >>>>>>>> 0/ 0 objclass >>>>>>>> 0/ 0 filestore >>>>>>>> 0/ 0 journal >>>>>>>> 0/ 0 ms >>>>>>>> 0/ 0 mon >>>>>>>> 0/ 0 monc >>>>>>>> 0/ 0 paxos >>>>>>>> 0/ 0 tp >>>>>>>> 0/ 0 auth >>>>>>>> 1/ 5 crypto >>>>>>>> 0/ 0 finisher >>>>>>>> 1/ 1 reserver >>>>>>>> 0/ 0 heartbeatmap >>>>>>>> 0/ 0 perfcounter >>>>>>>> 0/ 0 rgw >>>>>>>> 1/10 civetweb >>>>>>>> 1/ 5 javaclient >>>>>>>> 0/ 0 asok >>>>>>>> 0/ 0 throttle >>>>>>>> 0/ 0 refs >>>>>>>> 1/ 5 xio >>>>>>>> 1/ 5 compressor >>>>>>>> 1/ 5 bluestore >>>>>>>> 1/ 5 bluefs >>>>>>>> 1/ 3 bdev >>>>>>>> 1/ 5 kstore >>>>>>>> 4/ 5 rocksdb >>>>>>>> 4/ 5 leveldb >>>>>>>> 4/ 5 memdb >>>>>>>> 1/ 5 kinetic >>>>>>>> 1/ 5 fuse >>>>>>>> 1/ 5 mgr >>>>>>>> 1/ 5 mgrc >>>>>>>> 1/ 5 dpdk >>>>>>>> 1/ 5 eventtrace >>>>>>>> -2/-2 (syslog threshold) >>>>>>>> -1/-1 (stderr threshold) >>>>>>>> max_recent 10000 >>>>>>>> max_new 1000 >>>>>>>> log_file /var/log/ceph/ceph-osd.59.log >>>>>>>> --- end dump of recent events --- >>>>>>>> >>>>>>>> >>>>>>>> Greets, >>>>>>>> Stefan >>>>>>>> >>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: >>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>> Hi Sage, >>>>>>>>>> >>>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is >>>>>>>>>> the following and i hope you might have an idea: >>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today >>>>>>>>>> running luminous? 
How can they be missing in snapmapper again? >>>>>>>>> >>>>>>>>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so >>>>>>>>> that when you hit the crash we have some context as to what is going on. >>>>>>>>> >>>>>>>>> Did you track down which part of get_snaps() is returning the error? >>>>>>>>> >>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to >>>>>>>>>> luminous when all this happened. >>>>>>>>> >>>>>>>>> Did this start right after the upgrade? >>>>>>>>> >>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up >>>>>>>>> instead (unless the debug option is enabled, for our regression testing), >>>>>>>>> but I'm still unclear about what the inconsistency is... any logging you >>>>>>>>> can provide would help! >>>>>>>>> >>>>>>>>> Thanks- >>>>>>>>> sage >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Greets, >>>>>>>>>> Stefan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: >>>>>>>>>>> I'm guessing it is coming from >>>>>>>>>>> >>>>>>>>>>> object_snaps out; >>>>>>>>>>> int r = get_snaps(oid, &out); >>>>>>>>>>> if (r < 0) >>>>>>>>>>> return r; >>>>>>>>>>> >>>>>>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to >>>>>>>>>>> confirm that, and possibly also add debug lines to the various return >>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. >>>>>>>>>>> >>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's >>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we >>>>>>>>>>> can tolerate it...
>>>>>>>>>>> >>>>>>>>>>> sage >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>> >>>>>>>>>>>> HI Sage, >>>>>>>>>>>> >>>>>>>>>>>> any ideas for this one, seeing while doing a deep scrub: >>>>>>>>>>>> >>>>>>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: >>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] >>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f >>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 >>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for >>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) >>>>>>>>>>>> >>>>>>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 >>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const >>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread >>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 >>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) >>>>>>>>>>>> >>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>> const*)+0x102) [0x55c0100d0372] >>>>>>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>> ObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] >>>>>>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] >>>>>>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>> eversion_t const&, bool, 
ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] >>>>>>>>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>> [0x55c00fdf6642] >>>>>>>>>>>> 6: >>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>> [0x55c00fdfc684] >>>>>>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>> [0x55c00fd22030] >>>>>>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] >>>>>>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>> [0x55c00fb1abc7] >>>>>>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7] >>>>>>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] >>>>>>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>> [0x55c0100d5ffd] >>>>>>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] >>>>>>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] >>>>>>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] >>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>> needed to interpret this. 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Stefan >>>>>>>>>>>> >>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: >>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> it also crashes in (marked with HERE): >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> int SnapMapper::get_snaps( >>>>>>>>>>>>>>>> const hobject_t &oid, >>>>>>>>>>>>>>>> object_snaps *out) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> assert(check(oid)); >>>>>>>>>>>>>>>> set<string> keys; >>>>>>>>>>>>>>>> map<string, bufferlist> got; >>>>>>>>>>>>>>>> keys.insert(to_object_key(oid)); >>>>>>>>>>>>>>>> int r = backend.get_keys(keys, &got); >>>>>>>>>>>>>>>> if (r < 0) >>>>>>>>>>>>>>>> return r; >>>>>>>>>>>>>>>> if (got.empty()) >>>>>>>>>>>>>>>> return -ENOENT; >>>>>>>>>>>>>>>> if (out) { >>>>>>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>>>>>>>>>>>>>> ::decode(*out, bp); >>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>>>>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### >>>>>>>>>>>>>>>> } else { >>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> return 0; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> is it safe to comment that assert? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper >>>>>>>>>>>>>>> state is consistent, and its only purpose is to find snaps to trim. >>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a >>>>>>>>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time >>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in.
The next step is >>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or >>>>>>>>>>>>>>> can't repair. >>>>>>>>>>>>>> >>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But >>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one: >>>>>>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>>>>>>>>> SnapMapper::get_snaps(const h >>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >>>>>>>>>>>>>> 2018-01-18 13:00:54.840396 >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>>>>>>>> >>>>>>>>>>>>> I think if you switch that assert to a warning it will repair... >>>>>>>>>>>>> >>>>>>>>>>>>> sage >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] >>>>>>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, >>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >>>>>>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >>>>>>>>>>>>>> 697] >>>>>>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >>>>>>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >>>>>>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] 
>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>> [0x5561cdcd7bc7] >>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957] >>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>> [0x5561ce292e7d] >>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >>>>>>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] >>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>> >>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l >>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>>>>>>>>>>>>>> 
recovery_info.ss.clone_snaps.end()) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Is this a cache tiering pool? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded: >>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then. Something like this will work >>>>>>>>>>>>>>>>> around the current assert: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 >>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>>>>>>>>>>>>>>>> set<snapid_t> snaps; >>>>>>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; >>>>>>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>>>>>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
>>>>>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { >>>>>>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>>>>>>>>>>>>>>>> + } else { >>>>>>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; >>>>>>>>>>>>>>>>> snap_mapper.add_oid( >>>>>>>>>>>>>>>>> recovery_info.soid, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> s >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>>>>>>>>>>>>>>>> [0x55addb30748f] >>>>>>>>>>>>>>>>>>>> 5: >>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>>>>>>>>>>>>>>>>> [0x55addb317531] >>>>>>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>> [0x55addb23cf10] >>>>>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>>
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>> [0x55addb035bc7] >>>>>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d] >>>>>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> is there any chance to fix this instead of manually removing all the clones? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>>>>>>>>>>>>>> list mapping. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still have a stray snapid -> object mapping, so >>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>>>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>>>>>>>>>>>>>> being down. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> All of them fail with: >>>>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] 
>>>>>>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>>>>>>>>>>>>>> 7: >>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned 
int)+0x88d) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG >>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and >>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>>>>>>>>>>>>>>>>>> them myself. 
>>>>>>>>>>>>>>>>>>>>>>> -Greg >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>> >>>> >> >> -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Sage, Am 02.02.2018 um 22:05 schrieb Sage Weil: > On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: >> Am 02.02.2018 um 20:28 schrieb Sage Weil: >>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: >>>> Hi Sage, >>>> >>>> some days ago i reverted to stock luminous with >>>> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a >>>> + >>>> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa >>>> on top. I scrubbed and deep-scrubbed all pgs before and no errors were >>>> found. >>>> >>>> this worked fine until the ceph balancer ran and started remapping >>>> some pgs. >>>> >>>> This again triggered this assert: >>>> >>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' >>>> thread 7f4289bff700 time 2018-02-02 17:46:33.249579 >>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>> >>>> Any idea how to fix it? >>> >>> Cheap workaround is to comment out that assert. I'm still trying to sort >> >> Yes i did but i feel uncomfortable with it as the snapmapper value is >> still wrong. >> >>> out how this happened, though. Not sure if you shared this already, but: >>> - bluestore or filestore? >> >> As i first reported it only filestore. Now also some bluestore OSDs. >> >>> - no cache tiering, right? nowhere on the cluster? >> Yes and Yes. Never used cache tiering at all. >> >>> It looks like the SnapMapper entry with an empty snaps vector came in via >>> add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() >>> doesn't have that check). There are a few places it could come from, >>> including the on_local_recover() call and the update_snap_map() call. >>> That last one is interesting because it pulls the snaps for a new clone >>> out of the pg log entry.. maybe those aren't getting populated properly. 
>> >> Is it also possible that they got corrupted in the past and now ceph is >> unable to detect it and repair it with a scrub or deep scrub? >> >>> Can you confirm that all of the OSDs on the cluster are running the >>> latest? (ceph versions). >> >> Yes they do - i always check this after each update and i rechecked it >> again. > > Ok thanks! > > I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that > has those commits above plus another one that adds debug messages in the > various places that the empty snap vectors might be coming from. > > This won't help it get past the rebalance issue you hit above, but > hopefully it will trigger when you create a new clone so we can see how > this is happening. > > If we need to, I think we can ignore the assert (by commenting it out), > but one downside there is that we'll see my warnings pop up when > recovering those objects so we won't be able to tell where they really > came from. Since you were seeing this on new snaps/clones, though, I'm > hoping we won't end up in that situation! 
the biggest part of the logs is now this stuff: 2018-02-05 07:56:32.712366 7fb6893ff700 0 log_channel(cluster) log [DBG] : 3.740 scrub ok 2018-02-05 08:00:16.033843 7fb68a7fe700 -1 osd.0 pg_epoch: 945191 pg[3.bbf( v 945191'63020848 (945191'63019283,945191'63020848] local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620 luod=0'0 crt=945191'63020848 lcod 945191'63020847 active] _scan_snaps no head for 3:fdd87083:::rbd_data.67b0ee6b8b4567.00000000000103a1:b0d77 (have MIN) 2018-02-05 08:00:42.123804 7fb6833ff700 -1 osd.0 pg_epoch: 945191 pg[3.bbf( v 945191'63020901 (945191'63019383,945191'63020901] local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620 luod=0'0 crt=945191'63020901 lcod 945191'63020900 active] _scan_snaps no head for 3:fddb41cc:::rbd_data.9772fb6b8b4567.0000000000000380:b0d9f (have MIN) 2018-02-05 08:03:49.583574 7fb687bfe700 -1 osd.0 pg_epoch: 945191 pg[3.caa( v 945191'52090546 (945191'52088948,945191'52090546] local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 luod=0'0 crt=945191'52090546 lcod 945191'52090545 active] _scan_snaps no head for 3:5530360d:::rbd_data.52a6596b8b4567.0000000000000929:b0d55 (have MIN) 2018-02-05 08:03:50.432394 7fb687bfe700 -1 osd.0 pg_epoch: 945191 pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no head for 3:553243fa:::rbd_data.67b0ee6b8b4567.0000000000009956:b0d77 (have MIN) 2018-02-05 08:03:52.523643 7fb687bfe700 -1 osd.0 pg_epoch: 945191 pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 
944694/944694 les/c/f 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no head for 3:5536f170:::rbd_data.c1a3d66b8b4567.000000000000b729:b0dc5 (have MIN) 2018-02-05 08:03:55.471369 7fb687bfe700 -1 osd.0 pg_epoch: 945191 pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no head for 3:553d2114:::rbd_data.154a9e6b8b4567.0000000000001ea1:b0e4e (have MIN) 2018-02-05 08:08:55.440715 7fb687bfe700 -1 osd.0 pg_epoch: 945191 pg[3.7aa( v 945191'144763900 (945191'144762311,945191'144763900] local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 luod=0'0 crt=945191'144763900 lcod 945191'144763899 active] _scan_snaps no head for 3:55e14d0f:::rbd_data.67b0ee6b8b4567.00000000000025f2:b0d77 (have MIN) 2018-02-05 08:09:09.295844 7fb68efff700 -1 osd.0 pg_epoch: 945191 pg[3.7aa( v 945191'144763970 (945191'144762411,945191'144763970] local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 luod=0'0 crt=945191'144763970 lcod 945191'144763969 active] _scan_snaps no head for 3:55e2d7a9:::rbd_data.e40a416b8b4567.000000000000576f:b0df3 (have MIN) 2018-02-05 08:09:26.285493 7fb68efff700 -1 osd.0 pg_epoch: 945191 pg[3.7aa( v 945191'144764043 (945191'144762511,945191'144764043] local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 luod=0'0 crt=945191'144764043 lcod 945191'144764042 active] _scan_snaps no head for 3:55e4cbad:::rbd_data.163e9c96b8b4567.000000000000202d:b0d42 (have MIN) 2018-02-05 08:10:36.490452 7fb687bfe700 -1 osd.0 
pg_epoch: 945191 pg[3.7aa( v 945191'144764243 (945191'144762711,945191'144764243] local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 luod=0'0 crt=945191'144764243 lcod 945191'144764242 active] _scan_snaps no head for 3:55ece3c3:::rbd_data.e60f906b8b4567.00000000000000de:b0dd7 (have MIN) Stefan > Thanks! > sage > > > > > >> Stefan >> >> >>>> >>>> Greets, >>>> Stefan >>>> >>>> Am 22.01.2018 um 20:01 schrieb Sage Weil: >>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>> Hi Sage, >>>>>> >>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops >>>>>> telling me: >>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error >>>>>> on pg 3.33e oid >>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in >>>>>> mapper: , oi: b0d7c...repaired >>>>>> >>>>>> Currently it seems it does an endless repair and never ends. >>>>> >>>>> On many objects or the same object? >>>>> >>>>> sage >>>>> >>>>>> >>>>>> Greets, >>>>>> Stefan >>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil: >>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>> Hi Sage, >>>>>>>> >>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So >>>>>>>> it's OK to skip the assert. >>>>>>>> >>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to >>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and >>>>>>>> needs to get killed to proceed. >>>>>>> >>>>>>> I believe this will fix it: >>>>>>> >>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper >>>>>>> >>>>>>> Cherry-pick that and let me know? >>>>>>> >>>>>>> Thanks! 
>>>>>>> sage >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> The log output is always: >>>>>>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) >>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at >>>>>>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 >>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 >>>>>>>> 00000157:head v 927921'526559757) currently commit_sent >>>>>>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) >>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at >>>>>>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 >>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 >>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent >>>>>>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) >>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at >>>>>>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 >>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 >>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent >>>>>>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal >>>>>>>> (Aborted) ** >>>>>>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp >>>>>>>> >>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>> 1: (()+0xa4423c) [0x561d876ab23c] >>>>>>>> 2: (()+0xf890) [0x7f100b974890] >>>>>>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] >>>>>>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, >>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran >>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54] >>>>>>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, >>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] >>>>>>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] >>>>>>>> 7: 
(PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] >>>>>>>> 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] >>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] >>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>> [0x561d87138bc7] >>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>> const&)+0x57) [0x561d873b0db7] >>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] >>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>> [0x561d876f42bd] >>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] >>>>>>>> 15: (()+0x8064) [0x7f100b96d064] >>>>>>>> 16: (clone()+0x6d) [0x7f100aa6162d] >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>> needed to interpret this. 
>>>>>>>> >>>>>>>> --- logging levels --- >>>>>>>> 0/ 5 none >>>>>>>> 0/ 0 lockdep >>>>>>>> 0/ 0 context >>>>>>>> 0/ 0 crush >>>>>>>> 0/ 0 mds >>>>>>>> 0/ 0 mds_balancer >>>>>>>> 0/ 0 mds_locker >>>>>>>> 0/ 0 mds_log >>>>>>>> 0/ 0 mds_log_expire >>>>>>>> 0/ 0 mds_migrator >>>>>>>> 0/ 0 buffer >>>>>>>> 0/ 0 timer >>>>>>>> 0/ 0 filer >>>>>>>> 0/ 1 striper >>>>>>>> 0/ 0 objecter >>>>>>>> 0/ 0 rados >>>>>>>> 0/ 0 rbd >>>>>>>> 0/ 5 rbd_mirror >>>>>>>> 0/ 5 rbd_replay >>>>>>>> 0/ 0 journaler >>>>>>>> 0/ 0 objectcacher >>>>>>>> 0/ 0 client >>>>>>>> 0/ 0 osd >>>>>>>> 0/ 0 optracker >>>>>>>> 0/ 0 objclass >>>>>>>> 0/ 0 filestore >>>>>>>> 0/ 0 journal >>>>>>>> 0/ 0 ms >>>>>>>> 0/ 0 mon >>>>>>>> 0/ 0 monc >>>>>>>> 0/ 0 paxos >>>>>>>> 0/ 0 tp >>>>>>>> 0/ 0 auth >>>>>>>> 1/ 5 crypto >>>>>>>> 0/ 0 finisher >>>>>>>> 1/ 1 reserver >>>>>>>> 0/ 0 heartbeatmap >>>>>>>> 0/ 0 perfcounter >>>>>>>> 0/ 0 rgw >>>>>>>> 1/10 civetweb >>>>>>>> 1/ 5 javaclient >>>>>>>> 0/ 0 asok >>>>>>>> 0/ 0 throttle >>>>>>>> 0/ 0 refs >>>>>>>> 1/ 5 xio >>>>>>>> 1/ 5 compressor >>>>>>>> 1/ 5 bluestore >>>>>>>> 1/ 5 bluefs >>>>>>>> 1/ 3 bdev >>>>>>>> 1/ 5 kstore >>>>>>>> 4/ 5 rocksdb >>>>>>>> 4/ 5 leveldb >>>>>>>> 4/ 5 memdb >>>>>>>> 1/ 5 kinetic >>>>>>>> 1/ 5 fuse >>>>>>>> 1/ 5 mgr >>>>>>>> 1/ 5 mgrc >>>>>>>> 1/ 5 dpdk >>>>>>>> 1/ 5 eventtrace >>>>>>>> -2/-2 (syslog threshold) >>>>>>>> -1/-1 (stderr threshold) >>>>>>>> max_recent 10000 >>>>>>>> max_new 1000 >>>>>>>> log_file /var/log/ceph/ceph-osd.59.log >>>>>>>> --- end dump of recent events --- >>>>>>>> >>>>>>>> >>>>>>>> Greets, >>>>>>>> Stefan >>>>>>>> >>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: >>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>> Hi Sage, >>>>>>>>>> >>>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is >>>>>>>>>> the following and i hope you might have an idea: >>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today >>>>>>>>>> running luminous? 
How can they be missing in snapmapper again? >>>>>>>>> >>>>>>>>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so >>>>>>>>> that when you hit the crash we have some context as to what is going on. >>>>>>>>> >>>>>>>>> Did you track down which part of get_snaps() is returning the error? >>>>>>>>> >>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to >>>>>>>>>> luminous when all this happened. >>>>>>>>> >>>>>>>>> Did this start right after the upgrade? >>>>>>>>> >>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up >>>>>>>>> instead (unless the debug option is enabled, for our regression testing), >>>>>>>>> but I'm still unclear about what the inconsistency is... any loggin you >>>>>>>>> can provide would help! >>>>>>>>> >>>>>>>>> Thanks- >>>>>>>>> sage >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Greets, >>>>>>>>>> Stefan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: >>>>>>>>>>> I'm guessing it is coming from >>>>>>>>>>> >>>>>>>>>>> object_snaps out; >>>>>>>>>>> int r = get_snaps(oid, &out); >>>>>>>>>>> if (r < 0) >>>>>>>>>>> return r; >>>>>>>>>>> >>>>>>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to to >>>>>>>>>>> confirm that, and possibly also add debug lines to the various return >>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. >>>>>>>>>>> >>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's >>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we >>>>>>>>>>> can tolerate it... 
>>>>>>>>>>> >>>>>>>>>>> sage >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>> >>>>>>>>>>>> HI Sage, >>>>>>>>>>>> >>>>>>>>>>>> any ideas for this one sawing while doing a deep scrub: >>>>>>>>>>>> >>>>>>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: >>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] >>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f >>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 >>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for >>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) >>>>>>>>>>>> >>>>>>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 >>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const >>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread >>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 >>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) >>>>>>>>>>>> >>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>> const*)+0x102) [0x55c0100d0372] >>>>>>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] >>>>>>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] >>>>>>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>> eversion_t const&, bool, 
ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] >>>>>>>>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>> [0x55c00fdf6642] >>>>>>>>>>>> 6: >>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>> [0x55c00fdfc684] >>>>>>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>> [0x55c00fd22030] >>>>>>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] >>>>>>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>> [0x55c00fb1abc7] >>>>>>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7] >>>>>>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] >>>>>>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>> [0x55c0100d5ffd] >>>>>>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] >>>>>>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] >>>>>>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] >>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>> needed to interpret this. 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Stefan >>>>>>>>>>>> >>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: >>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> it also crashes in (marked with HERE): >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> int SnapMapper::get_snaps( >>>>>>>>>>>>>>>> const hobject_t &oid, >>>>>>>>>>>>>>>> object_snaps *out) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> assert(check(oid)); >>>>>>>>>>>>>>>> set<string> keys; >>>>>>>>>>>>>>>> map<string, bufferlist> got; >>>>>>>>>>>>>>>> keys.insert(to_object_key(oid)); >>>>>>>>>>>>>>>> int r = backend.get_keys(keys, &got); >>>>>>>>>>>>>>>> if (r < 0) >>>>>>>>>>>>>>>> return r; >>>>>>>>>>>>>>>> if (got.empty()) >>>>>>>>>>>>>>>> return -ENOENT; >>>>>>>>>>>>>>>> if (out) { >>>>>>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>>>>>>>>>>>>>> ::decode(*out, bp); >>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>>>>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### >>>>>>>>>>>>>>>> } else { >>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> return 0; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> is it save to comment that assert? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper >>>>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim. >>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a >>>>>>>>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time >>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in. 
The next step is >>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or >>>>>>>>>>>>>>> can't repair. >>>>>>>>>>>>>> >>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But >>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one: >>>>>>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>>>>>>>>> SnapMapper::get_snaps(const h >>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >>>>>>>>>>>>>> 2018-01-18 13:00:54.840396 >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>>>>>>>> >>>>>>>>>>>>> I think if you switch that assert to a warning it will repair... >>>>>>>>>>>>> >>>>>>>>>>>>> sage >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] >>>>>>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, >>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >>>>>>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >>>>>>>>>>>>>> 697] >>>>>>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >>>>>>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >>>>>>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] 
>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>> [0x5561cdcd7bc7] >>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957] >>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>> [0x5561ce292e7d] >>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >>>>>>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] >>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>> >>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l >>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>>>>>>>>>>>>>> 
recovery_info.ss.clone_snaps.end()) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Is this a cache tiering pool? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded: >>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then. Something like this will work >>>>>>>>>>>>>>>>> around the current assert: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 >>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>>>>>>>>>>>>>>>> set<snapid_t> snaps; >>>>>>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; >>>>>>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>>>>>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
>>>>>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { >>>>>>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>>>>>>>>>>>>>>>> + } else { >>>>>>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; >>>>>>>>>>>>>>>>> snap_mapper.add_oid( >>>>>>>>>>>>>>>>> recovery_info.soid, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> s >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>>>>>>>>>>>>>>>> [0x55addb30748f] >>>>>>>>>>>>>>>>>>>> 5: >>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>>>>>>>>>>>>>>>>> [0x55addb317531] >>>>>>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>> [0x55addb23cf10] >>>>>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>>
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>> [0x55addb035bc7] >>>>>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d] >>>>>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>>>>>>>>>>>>>> list mapping. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so >>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>>>>>>>>>>>>>>>> happens. 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>>>>>>>>>>>>>> being down. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> All of them fail with: >>>>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] 
>>>>>>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>>>>>>>>>>>>>> 7: >>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned 
int)+0x88d) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG >>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and >>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>>>>>>>>>>>>>>>>>> them myself. 
>>>>>>>>>>>>>>>>>>>>>>> -Greg -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, 3 Feb 2018, Stefan Priebe - Profihost AG wrote: > Hi Sage, > > Am 02.02.2018 um 22:05 schrieb Sage Weil: > > On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: > >> Am 02.02.2018 um 20:28 schrieb Sage Weil: > >>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: > >>>> Hi Sage, > >>>> > >>>> some days ago i reverted to stock luminous with > >>>> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a > >>>> + > >>>> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa > >>>> on top. I scrubbed and deep-scrubbed all pgs before and no errors were > >>>> found. > >>>> > >>>> this worked fine until the ceph balancer runned and started remapping > >>>> some pgs. > >>>> > >>>> This again trigged this assert: > >>>> > >>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int > >>>> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' > >>>> thread 7f4289bff700 time 2018-02-02 17:46:33.249579 > >>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) > >>>> > >>>> Any idea how to fix it? > >>> > >>> Cheap workaround is to comment out that assert. I'm still tryign to sort > >> > >> Yes i did but i feel unconforable with it as the snapmapper value is > >> still wrong. > >> > >>> out how this happened, though. Not sure if you shared this already, but: > >>> - bluestore or filestore? > >> > >> As i first reported it only filestore. Now also some bluestore OSDs. > >> > >>> - no cache tiering, right? nowhere on the cluster? > >> Yes and Yes. Never used cache tiering at all. > >> > >>> It looks like the SnapMapper entry with an empty snaps vector came in via > >>> add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() > >>> doesn't have that check). There are a few place it could come from, > >>> including the on_local_recover() call and the update_snap_map() call. 
> >>> That last one is interesting because it pulls the snaps for a new clone > >>> out of the pg log entry.. maybe those aren't getting populated properly. > >> > >> Is it also possible that they got corrupted in the past and now ceph is > >> unable to detect it and repair it with a scrub or deep scrub? > >> > >>> Can you confirm that all of the OSDs on the cluster are running the > >>> latest? (ceph versions). > >> > >> Yes they do - i always check this after each update and i rechecked it > >> again. > > > > Ok thanks! > > > > I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that > > has those commits above plus another one that adds debug messages in the > > various places that the empty snap vectors might be coming from. > > > > This won't help it get past the rebalance issue you hit above, but > > hopefully it will trigger when you create a new clone so we can see how > > this is happening. > > > > If we need to, I think we can ignore the assert (by commenting it out), > > but one downside there is that we'll see my warnings pop up when > > recovering those objects so we won't be able to tell where they really > > came from. Since you were seeing this on new snaps/clones, though, I'm > > hoping we won't end up in that situation! > > thanks. New version is running. Is looking through ceph.log enough? Or > are those only printed to the corresponding osd log? They only go to the OSD log. sage > > Greets, > Stefan > > > > Thanks! > > sage > > > > > > > > > > >> Stefan > >> > >> > >>>> > >>>> Greets, > >>>> Stefan > >>>> > >>>> Am 22.01.2018 um 20:01 schrieb Sage Weil: > >>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>> Hi Sage, > >>>>>> > >>>>>> thanks! That one fixed it! 
Now i'm just waiting if the log ever stops > >>>>>> telling me: > >>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error > >>>>>> on pg 3.33e oid > >>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in > >>>>>> mapper: , oi: b0d7c...repaired > >>>>>> > >>>>>> Currently it seems it does an endless repair and never ends. > >>>>> > >>>>> On many objects or the same object? > >>>>> > >>>>> sage > >>>>> > >>>>>> > >>>>>> Greets, > >>>>>> Stefan > >>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil: > >>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>> Hi Sage, > >>>>>>>> > >>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So > >>>>>>>> it's OK to skip the assert. > >>>>>>>> > >>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to > >>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and > >>>>>>>> needs to get killed to proceed. > >>>>>>> > >>>>>>> I believe this will fix it: > >>>>>>> > >>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper > >>>>>>> > >>>>>>> Cherry-pick that and let me know? > >>>>>>> > >>>>>>> Thanks! 
> >>>>>>> sage > >>>>>>> > >>>>>>> > >>>>>>>> > >>>>>>>> The log output is always: > >>>>>>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) > >>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at > >>>>>>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 > >>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 > >>>>>>>> 00000157:head v 927921'526559757) currently commit_sent > >>>>>>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) > >>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at > >>>>>>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 > >>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 > >>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent > >>>>>>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) > >>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at > >>>>>>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 > >>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 > >>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent > >>>>>>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal > >>>>>>>> (Aborted) ** > >>>>>>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp > >>>>>>>> > >>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>> 1: (()+0xa4423c) [0x561d876ab23c] > >>>>>>>> 2: (()+0xf890) [0x7f100b974890] > >>>>>>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] > >>>>>>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, > >>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran > >>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54] > >>>>>>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, > >>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] > >>>>>>>> 6: 
(PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] > >>>>>>>> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > >>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] > >>>>>>>> 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, > >>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] > >>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] > >>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>> [0x561d87138bc7] > >>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>> const&)+0x57) [0x561d873b0db7] > >>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] > >>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>> [0x561d876f42bd] > >>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] > >>>>>>>> 15: (()+0x8064) [0x7f100b96d064] > >>>>>>>> 16: (clone()+0x6d) [0x7f100aa6162d] > >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>> needed to interpret this. 
> >>>>>>>> > >>>>>>>> --- logging levels --- > >>>>>>>> 0/ 5 none > >>>>>>>> 0/ 0 lockdep > >>>>>>>> 0/ 0 context > >>>>>>>> 0/ 0 crush > >>>>>>>> 0/ 0 mds > >>>>>>>> 0/ 0 mds_balancer > >>>>>>>> 0/ 0 mds_locker > >>>>>>>> 0/ 0 mds_log > >>>>>>>> 0/ 0 mds_log_expire > >>>>>>>> 0/ 0 mds_migrator > >>>>>>>> 0/ 0 buffer > >>>>>>>> 0/ 0 timer > >>>>>>>> 0/ 0 filer > >>>>>>>> 0/ 1 striper > >>>>>>>> 0/ 0 objecter > >>>>>>>> 0/ 0 rados > >>>>>>>> 0/ 0 rbd > >>>>>>>> 0/ 5 rbd_mirror > >>>>>>>> 0/ 5 rbd_replay > >>>>>>>> 0/ 0 journaler > >>>>>>>> 0/ 0 objectcacher > >>>>>>>> 0/ 0 client > >>>>>>>> 0/ 0 osd > >>>>>>>> 0/ 0 optracker > >>>>>>>> 0/ 0 objclass > >>>>>>>> 0/ 0 filestore > >>>>>>>> 0/ 0 journal > >>>>>>>> 0/ 0 ms > >>>>>>>> 0/ 0 mon > >>>>>>>> 0/ 0 monc > >>>>>>>> 0/ 0 paxos > >>>>>>>> 0/ 0 tp > >>>>>>>> 0/ 0 auth > >>>>>>>> 1/ 5 crypto > >>>>>>>> 0/ 0 finisher > >>>>>>>> 1/ 1 reserver > >>>>>>>> 0/ 0 heartbeatmap > >>>>>>>> 0/ 0 perfcounter > >>>>>>>> 0/ 0 rgw > >>>>>>>> 1/10 civetweb > >>>>>>>> 1/ 5 javaclient > >>>>>>>> 0/ 0 asok > >>>>>>>> 0/ 0 throttle > >>>>>>>> 0/ 0 refs > >>>>>>>> 1/ 5 xio > >>>>>>>> 1/ 5 compressor > >>>>>>>> 1/ 5 bluestore > >>>>>>>> 1/ 5 bluefs > >>>>>>>> 1/ 3 bdev > >>>>>>>> 1/ 5 kstore > >>>>>>>> 4/ 5 rocksdb > >>>>>>>> 4/ 5 leveldb > >>>>>>>> 4/ 5 memdb > >>>>>>>> 1/ 5 kinetic > >>>>>>>> 1/ 5 fuse > >>>>>>>> 1/ 5 mgr > >>>>>>>> 1/ 5 mgrc > >>>>>>>> 1/ 5 dpdk > >>>>>>>> 1/ 5 eventtrace > >>>>>>>> -2/-2 (syslog threshold) > >>>>>>>> -1/-1 (stderr threshold) > >>>>>>>> max_recent 10000 > >>>>>>>> max_new 1000 > >>>>>>>> log_file /var/log/ceph/ceph-osd.59.log > >>>>>>>> --- end dump of recent events --- > >>>>>>>> > >>>>>>>> > >>>>>>>> Greets, > >>>>>>>> Stefan > >>>>>>>> > >>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: > >>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>> Hi Sage, > >>>>>>>>>> > >>>>>>>>>> thanks for your reply. I'll do this. 
What i still don't understand is > >>>>>>>>>> the following and i hope you might have an idea: > >>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today > >>>>>>>>>> running luminous? How can they be missing in snapmapper again? > >>>>>>>>> > >>>>>>>>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so > >>>>>>>>> that when you hit the crash we have some context as to what is going on. > >>>>>>>>> > >>>>>>>>> Did you track down which part of get_snaps() is returning the error? > >>>>>>>>> > >>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to > >>>>>>>>>> luminous when all this happened. > >>>>>>>>> > >>>>>>>>> Did this start right after the upgrade? > >>>>>>>>> > >>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up > >>>>>>>>> instead (unless the debug option is enabled, for our regression testing), > >>>>>>>>> but I'm still unclear about what the inconsistency is... any loggin you > >>>>>>>>> can provide would help! > >>>>>>>>> > >>>>>>>>> Thanks- > >>>>>>>>> sage > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Greets, > >>>>>>>>>> Stefan > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: > >>>>>>>>>>> I'm guessing it is coming from > >>>>>>>>>>> > >>>>>>>>>>> object_snaps out; > >>>>>>>>>>> int r = get_snaps(oid, &out); > >>>>>>>>>>> if (r < 0) > >>>>>>>>>>> return r; > >>>>>>>>>>> > >>>>>>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to to > >>>>>>>>>>> confirm that, and possibly also add debug lines to the various return > >>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. > >>>>>>>>>>> > >>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's > >>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we > >>>>>>>>>>> can tolerate it... 
> >>>>>>>>>>> > >>>>>>>>>>> sage > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Hi Sage, > >>>>>>>>>>>> > >>>>>>>>>>>> any ideas for this one seeing while doing a deep scrub: > >>>>>>>>>>>> > >>>>>>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: > >>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] > >>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f > >>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 > >>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for > >>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) > >>>>>>>>>>>> > >>>>>>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 > >>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const > >>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread > >>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 > >>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) > >>>>>>>>>>>> > >>>>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>> const*)+0x102) [0x55c0100d0372] > >>>>>>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>> ObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] > >>>>>>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, > >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > >>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] > >>>>>>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>
boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] > >>>>>>>>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >>>>>>>>>>>> [0x55c00fdf6642] > >>>>>>>>>>>> 6: > >>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >>>>>>>>>>>> [0x55c00fdfc684] > >>>>>>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>> [0x55c00fd22030] > >>>>>>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] > >>>>>>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>> [0x55c00fb1abc7] > >>>>>>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7] > >>>>>>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] > >>>>>>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>> [0x55c0100d5ffd] > >>>>>>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] > >>>>>>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] > >>>>>>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] > >>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>>>> needed to interpret this. 
> >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Stefan > >>>>>>>>>>>> > >>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: > >>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>> Hi Sage, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: > >>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> it also crashes in (marked with HERE): > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> int SnapMapper::get_snaps( > >>>>>>>>>>>>>>>> const hobject_t &oid, > >>>>>>>>>>>>>>>> object_snaps *out) > >>>>>>>>>>>>>>>> { > >>>>>>>>>>>>>>>> assert(check(oid)); > >>>>>>>>>>>>>>>> set<string> keys; > >>>>>>>>>>>>>>>> map<string, bufferlist> got; > >>>>>>>>>>>>>>>> keys.insert(to_object_key(oid)); > >>>>>>>>>>>>>>>> int r = backend.get_keys(keys, &got); > >>>>>>>>>>>>>>>> if (r < 0) > >>>>>>>>>>>>>>>> return r; > >>>>>>>>>>>>>>>> if (got.empty()) > >>>>>>>>>>>>>>>> return -ENOENT; > >>>>>>>>>>>>>>>> if (out) { > >>>>>>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); > >>>>>>>>>>>>>>>> ::decode(*out, bp); > >>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; > >>>>>>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### > >>>>>>>>>>>>>>>> } else { > >>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; > >>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>> return 0; > >>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> is it save to comment that assert? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper > >>>>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim. > >>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a > >>>>>>>>>>>>>>> whole lot of risk here. 
You might want to set nosnaptrim for the time > >>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in. The next step is > >>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or > >>>>>>>>>>>>>>> can't repair. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But > >>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one: > >>>>>>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 > >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int > >>>>>>>>>>>>>> SnapMapper::get_snaps(const h > >>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time > >>>>>>>>>>>>>> 2018-01-18 13:00:54.840396 > >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) > >>>>>>>>>>>>> > >>>>>>>>>>>>> I think if you switch that assert to a warning it will repair... 
> >>>>>>>>>>>>> > >>>>>>>>>>>>> sage > >>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] > >>>>>>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, > >>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] > >>>>>>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, > >>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 > >>>>>>>>>>>>>> 697] > >>>>>>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] > >>>>>>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > >>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] > >>>>>>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, > >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] > >>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] > >>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>> [0x5561cdcd7bc7] > >>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957] > >>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] > >>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>>> [0x5561ce292e7d] > >>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] > >>>>>>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] > >>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] > >>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` 
is > >>>>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Greets, > >>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> sage > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: > >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: > >>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>>>>>> Hi Sage, > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 > >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void > >>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l > >>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, > >>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti > >>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 > >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != > >>>>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end()) > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Is this a cache tiering pool? > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded: > >>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 > >>>>>>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait > >>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] > >>>>>>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 > >>>>>>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Hrm, no real clues on the root cause then.
Something like this will work > >>>>>>>>>>>>>>>>> around the current assert: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 > >>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( > >>>>>>>>>>>>>>>>> set<snapid_t> snaps; > >>>>>>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; > >>>>>>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); > >>>>>>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? > >>>>>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); > >>>>>>>>>>>>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { > >>>>>>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; > >>>>>>>>>>>>>>>>> + } else { > >>>>>>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); > >>>>>>>>>>>>>>>>> + } > >>>>>>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; > >>>>>>>>>>>>>>>>> snap_mapper.add_oid( > >>>>>>>>>>>>>>>>> recovery_info.soid, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> s > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] > >>>>>>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo > >>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, > >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] > >>>>>>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, > >>>>>>>>>>>>>>>>>>>>
PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] > >>>>>>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) > >>>>>>>>>>>>>>>>>>>> [0x55addb30748f] > >>>>>>>>>>>>>>>>>>>> 5: > >>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) > >>>>>>>>>>>>>>>>>>>> [0x55addb317531] > >>>>>>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>>>>>>>>>> [0x55addb23cf10] > >>>>>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] > >>>>>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>>>>>>>> [0x55addb035bc7] > >>>>>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] > >>>>>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] > >>>>>>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d] > >>>>>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] > >>>>>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] > >>>>>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] > >>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>>>>>>>>>>>> needed to interpret this. 
> >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Greets, > >>>>>>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: > >>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> is there any chance to fix this instead of manually removing all the clones? > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by > >>>>>>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap > >>>>>>>>>>>>>>>>>>>>> list mapping. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still have a stray snapid -> object mapping, so > >>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error > >>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't > >>>>>>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it > >>>>>>>>>>>>>>>>>>>>> happens. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them > >>>>>>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> sage > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: > >>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG > >>>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: > >>>>>>>>>>>>>>>>>>>>>>>> Hello, > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds > >>>>>>>>>>>>>>>>>>>>>>>> being down.
> >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> All of them fail with: > >>>>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 > >>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void > >>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, > >>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' > >>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 > >>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 > >>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) > >>>>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] > >>>>>>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, > >>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, > >>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) > >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] > >>>>>>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] > >>>>>>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > >>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] > >>>>>>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, 
ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] > >>>>>>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] > >>>>>>>>>>>>>>>>>>>>>>>> 7: > >>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] > >>>>>>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] > >>>>>>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] > >>>>>>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] > >>>>>>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] > >>>>>>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] > >>>>>>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] > >>>>>>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] > >>>>>>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] > >>>>>>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] > >>>>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. 
The OSD is > >>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that > >>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's > >>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is > >>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have > >>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG > >>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object > >>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and > >>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of > >>>>>>>>>>>>>>>>>>>>>>> them myself. 
> >>>>>>>>>>>>>>>>>>>>>>> -Greg > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > >>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org > >>>>>>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
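[Archive annotation] The defensive shape of the on_local_recover() patch quoted earlier in this thread — treat a missing clone_snaps entry as a loggable inconsistency rather than an abort — can be sketched in isolation like this (simplified stand-in types, not the actual Ceph sources):

```cpp
#include <iostream>
#include <map>
#include <set>
#include <vector>

// Simplified stand-ins for Ceph's snapid_t and SnapSet::clone_snaps
// (the real types live in src/osd/osd_types.h).
using snapid_t = unsigned;
using clone_snaps_t = std::map<snapid_t, std::vector<snapid_t>>;

// If the clone has no clone_snaps entry, log the inconsistency and
// return an empty snap set instead of asserting the OSD down.
std::set<snapid_t> resolve_clone_snaps(const clone_snaps_t& clone_snaps,
                                       snapid_t clone) {
  std::set<snapid_t> snaps;
  auto p = clone_snaps.find(clone);
  if (p == clone_snaps.end()) {
    std::cerr << "no clone_snaps for clone " << clone << "\n";
  } else {
    snaps.insert(p->second.begin(), p->second.end());
  }
  return snaps;
}
```

The empty result then reaches the snap mapper as a repairable inconsistency that a later scrub can flag, which matches the intent expressed in the thread: report, don't crash.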
Hi Sage, Am 02.02.2018 um 22:05 schrieb Sage Weil: > On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: >> Am 02.02.2018 um 20:28 schrieb Sage Weil: >>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: >>>> Hi Sage, >>>> >>>> some days ago i reverted to stock luminous with >>>> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a >>>> + >>>> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa >>>> on top. I scrubbed and deep-scrubbed all pgs before and no errors were >>>> found. >>>> >>>> this worked fine until the ceph balancer ran and started remapping >>>> some pgs. >>>> >>>> This again triggered this assert: >>>> >>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' >>>> thread 7f4289bff700 time 2018-02-02 17:46:33.249579 >>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>> >>>> Any idea how to fix it? >>> >>> Cheap workaround is to comment out that assert. I'm still trying to sort >> >> Yes i did but i feel uncomfortable with it as the snapmapper value is >> still wrong. >> >>> out how this happened, though. Not sure if you shared this already, but: >>> - bluestore or filestore? >> >> As i first reported it only filestore. Now also some bluestore OSDs. >> >>> - no cache tiering, right? nowhere on the cluster? >> Yes and Yes. Never used cache tiering at all. >> >>> It looks like the SnapMapper entry with an empty snaps vector came in via >>> add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() >>> doesn't have that check). There are a few places it could come from, >>> including the on_local_recover() call and the update_snap_map() call. >>> That last one is interesting because it pulls the snaps for a new clone >>> out of the pg log entry.. maybe those aren't getting populated properly.
>> >> Is it also possible that they got corrupted in the past and now ceph is >> unable to detect it and repair it with a scrub or deep scrub? >> >>> Can you confirm that all of the OSDs on the cluster are running the >>> latest? (ceph versions). >> >> Yes they do - i always check this after each update and i rechecked it >> again. > > Ok thanks! > > I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that > has those commits above plus another one that adds debug messages in the > various places that the empty snap vectors might be coming from. > > This won't help it get past the rebalance issue you hit above, but > hopefully it will trigger when you create a new clone so we can see how > this is happening. > > If we need to, I think we can ignore the assert (by commenting it out), > but one downside there is that we'll see my warnings pop up when > recovering those objects so we won't be able to tell where they really > came from. Since you were seeing this on new snaps/clones, though, I'm > hoping we won't end up in that situation! 
the biggest part of the logs is now this stuff: 2018-02-05 07:56:32.712366 7fb6893ff700 0 log_channel(cluster) log [DBG] : 3.740 scrub ok 2018-02-05 08:00:16.033843 7fb68a7fe700 -1 osd.0 pg_epoch: 945191 pg[3.bbf( v 945191'63020848 (945191'63019283,945191'63020848] local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620 luod=0'0 crt=945191'63020848 lcod 945191'63020847 active] _scan_snaps no head for 3:fdd87083:::rbd_data.67b0ee6b8b4567.00000000000103a1:b0d77 (have MIN) 2018-02-05 08:00:42.123804 7fb6833ff700 -1 osd.0 pg_epoch: 945191 pg[3.bbf( v 945191'63020901 (945191'63019383,945191'63020901] local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620 luod=0'0 crt=945191'63020901 lcod 945191'63020900 active] _scan_snaps no head for 3:fddb41cc:::rbd_data.9772fb6b8b4567.0000000000000380:b0d9f (have MIN) 2018-02-05 08:03:49.583574 7fb687bfe700 -1 osd.0 pg_epoch: 945191 pg[3.caa( v 945191'52090546 (945191'52088948,945191'52090546] local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 luod=0'0 crt=945191'52090546 lcod 945191'52090545 active] _scan_snaps no head for 3:5530360d:::rbd_data.52a6596b8b4567.0000000000000929:b0d55 (have MIN) 2018-02-05 08:03:50.432394 7fb687bfe700 -1 osd.0 pg_epoch: 945191 pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no head for 3:553243fa:::rbd_data.67b0ee6b8b4567.0000000000009956:b0d77 (have MIN) 2018-02-05 08:03:52.523643 7fb687bfe700 -1 osd.0 pg_epoch: 945191 pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 
944694/944694 les/c/f 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no head for 3:5536f170:::rbd_data.c1a3d66b8b4567.000000000000b729:b0dc5 (have MIN) 2018-02-05 08:03:55.471369 7fb687bfe700 -1 osd.0 pg_epoch: 945191 pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no head for 3:553d2114:::rbd_data.154a9e6b8b4567.0000000000001ea1:b0e4e (have MIN) 2018-02-05 08:08:55.440715 7fb687bfe700 -1 osd.0 pg_epoch: 945191 pg[3.7aa( v 945191'144763900 (945191'144762311,945191'144763900] local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 luod=0'0 crt=945191'144763900 lcod 945191'144763899 active] _scan_snaps no head for 3:55e14d0f:::rbd_data.67b0ee6b8b4567.00000000000025f2:b0d77 (have MIN) 2018-02-05 08:09:09.295844 7fb68efff700 -1 osd.0 pg_epoch: 945191 pg[3.7aa( v 945191'144763970 (945191'144762411,945191'144763970] local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 luod=0'0 crt=945191'144763970 lcod 945191'144763969 active] _scan_snaps no head for 3:55e2d7a9:::rbd_data.e40a416b8b4567.000000000000576f:b0df3 (have MIN) 2018-02-05 08:09:26.285493 7fb68efff700 -1 osd.0 pg_epoch: 945191 pg[3.7aa( v 945191'144764043 (945191'144762511,945191'144764043] local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 luod=0'0 crt=945191'144764043 lcod 945191'144764042 active] _scan_snaps no head for 3:55e4cbad:::rbd_data.163e9c96b8b4567.000000000000202d:b0d42 (have MIN) 2018-02-05 08:10:36.490452 7fb687bfe700 -1 osd.0 
pg_epoch: 945191 pg[3.7aa( v 945191'144764243 (945191'144762711,945191'144764243] local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 luod=0'0 crt=945191'144764243 lcod 945191'144764242 active] _scan_snaps no head for 3:55ece3c3:::rbd_data.e60f906b8b4567.00000000000000de:b0dd7 (have MIN) Found nothing else in the logs until now. Should i search for more OSDs or is this already something we should consider? Stefan > Thanks! > sage > > > > > >> Stefan >> >> >>>> >>>> Greets, >>>> Stefan >>>> >>>> Am 22.01.2018 um 20:01 schrieb Sage Weil: >>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>> Hi Sage, >>>>>> >>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops >>>>>> telling me: >>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error >>>>>> on pg 3.33e oid >>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in >>>>>> mapper: , oi: b0d7c...repaired >>>>>> >>>>>> Currently it seems it does an endless repair and never ends.
>>>>>>> sage >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> The log output is always: >>>>>>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) >>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at >>>>>>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 >>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 >>>>>>>> 00000157:head v 927921'526559757) currently commit_sent >>>>>>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) >>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at >>>>>>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 >>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 >>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent >>>>>>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) >>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at >>>>>>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 >>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 >>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent >>>>>>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal >>>>>>>> (Aborted) ** >>>>>>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp >>>>>>>> >>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>> 1: (()+0xa4423c) [0x561d876ab23c] >>>>>>>> 2: (()+0xf890) [0x7f100b974890] >>>>>>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] >>>>>>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, >>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran >>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54] >>>>>>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, >>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] >>>>>>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] >>>>>>>> 7: 
(PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] >>>>>>>> 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] >>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] >>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>> [0x561d87138bc7] >>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>> const&)+0x57) [0x561d873b0db7] >>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] >>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>> [0x561d876f42bd] >>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] >>>>>>>> 15: (()+0x8064) [0x7f100b96d064] >>>>>>>> 16: (clone()+0x6d) [0x7f100aa6162d] >>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>> needed to interpret this. 
>>>>>>>> >>>>>>>> --- logging levels --- >>>>>>>> 0/ 5 none >>>>>>>> 0/ 0 lockdep >>>>>>>> 0/ 0 context >>>>>>>> 0/ 0 crush >>>>>>>> 0/ 0 mds >>>>>>>> 0/ 0 mds_balancer >>>>>>>> 0/ 0 mds_locker >>>>>>>> 0/ 0 mds_log >>>>>>>> 0/ 0 mds_log_expire >>>>>>>> 0/ 0 mds_migrator >>>>>>>> 0/ 0 buffer >>>>>>>> 0/ 0 timer >>>>>>>> 0/ 0 filer >>>>>>>> 0/ 1 striper >>>>>>>> 0/ 0 objecter >>>>>>>> 0/ 0 rados >>>>>>>> 0/ 0 rbd >>>>>>>> 0/ 5 rbd_mirror >>>>>>>> 0/ 5 rbd_replay >>>>>>>> 0/ 0 journaler >>>>>>>> 0/ 0 objectcacher >>>>>>>> 0/ 0 client >>>>>>>> 0/ 0 osd >>>>>>>> 0/ 0 optracker >>>>>>>> 0/ 0 objclass >>>>>>>> 0/ 0 filestore >>>>>>>> 0/ 0 journal >>>>>>>> 0/ 0 ms >>>>>>>> 0/ 0 mon >>>>>>>> 0/ 0 monc >>>>>>>> 0/ 0 paxos >>>>>>>> 0/ 0 tp >>>>>>>> 0/ 0 auth >>>>>>>> 1/ 5 crypto >>>>>>>> 0/ 0 finisher >>>>>>>> 1/ 1 reserver >>>>>>>> 0/ 0 heartbeatmap >>>>>>>> 0/ 0 perfcounter >>>>>>>> 0/ 0 rgw >>>>>>>> 1/10 civetweb >>>>>>>> 1/ 5 javaclient >>>>>>>> 0/ 0 asok >>>>>>>> 0/ 0 throttle >>>>>>>> 0/ 0 refs >>>>>>>> 1/ 5 xio >>>>>>>> 1/ 5 compressor >>>>>>>> 1/ 5 bluestore >>>>>>>> 1/ 5 bluefs >>>>>>>> 1/ 3 bdev >>>>>>>> 1/ 5 kstore >>>>>>>> 4/ 5 rocksdb >>>>>>>> 4/ 5 leveldb >>>>>>>> 4/ 5 memdb >>>>>>>> 1/ 5 kinetic >>>>>>>> 1/ 5 fuse >>>>>>>> 1/ 5 mgr >>>>>>>> 1/ 5 mgrc >>>>>>>> 1/ 5 dpdk >>>>>>>> 1/ 5 eventtrace >>>>>>>> -2/-2 (syslog threshold) >>>>>>>> -1/-1 (stderr threshold) >>>>>>>> max_recent 10000 >>>>>>>> max_new 1000 >>>>>>>> log_file /var/log/ceph/ceph-osd.59.log >>>>>>>> --- end dump of recent events --- >>>>>>>> >>>>>>>> >>>>>>>> Greets, >>>>>>>> Stefan >>>>>>>> >>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: >>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>> Hi Sage, >>>>>>>>>> >>>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is >>>>>>>>>> the following and i hope you might have an idea: >>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today >>>>>>>>>> running luminous? 
How can they be missing in snapmapper again? >>>>>>>>> >>>>>>>>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so >>>>>>>>> that when you hit the crash we have some context as to what is going on. >>>>>>>>> >>>>>>>>> Did you track down which part of get_snaps() is returning the error? >>>>>>>>> >>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to >>>>>>>>>> luminous when all this happened. >>>>>>>>> >>>>>>>>> Did this start right after the upgrade? >>>>>>>>> >>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up >>>>>>>>> instead (unless the debug option is enabled, for our regression testing), >>>>>>>>> but I'm still unclear about what the inconsistency is... any logging you >>>>>>>>> can provide would help! >>>>>>>>> >>>>>>>>> Thanks- >>>>>>>>> sage >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Greets, >>>>>>>>>> Stefan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: >>>>>>>>>>> I'm guessing it is coming from >>>>>>>>>>> >>>>>>>>>>> object_snaps out; >>>>>>>>>>> int r = get_snaps(oid, &out); >>>>>>>>>>> if (r < 0) >>>>>>>>>>> return r; >>>>>>>>>>> >>>>>>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to >>>>>>>>>>> confirm that, and possibly also add debug lines to the various return >>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. >>>>>>>>>>> >>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's >>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we >>>>>>>>>>> can tolerate it...
>>>>>>>>>>> >>>>>>>>>>> sage >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Sage, >>>>>>>>>>>> >>>>>>>>>>>> any ideas for this one, seen while doing a deep scrub: >>>>>>>>>>>> >>>>>>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: >>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] >>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f >>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 >>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for >>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) >>>>>>>>>>>> >>>>>>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 >>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const >>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread >>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 >>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) >>>>>>>>>>>> >>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>> const*)+0x102) [0x55c0100d0372] >>>>>>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>> ObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] >>>>>>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] >>>>>>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>> eversion_t const&, bool,
ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] >>>>>>>>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>> [0x55c00fdf6642] >>>>>>>>>>>> 6: >>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>> [0x55c00fdfc684] >>>>>>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>> [0x55c00fd22030] >>>>>>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] >>>>>>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>> [0x55c00fb1abc7] >>>>>>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7] >>>>>>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] >>>>>>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>> [0x55c0100d5ffd] >>>>>>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] >>>>>>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] >>>>>>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] >>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>> needed to interpret this. 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Stefan >>>>>>>>>>>> >>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: >>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> it also crashes in (marked with HERE): >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> int SnapMapper::get_snaps( >>>>>>>>>>>>>>>> const hobject_t &oid, >>>>>>>>>>>>>>>> object_snaps *out) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> assert(check(oid)); >>>>>>>>>>>>>>>> set<string> keys; >>>>>>>>>>>>>>>> map<string, bufferlist> got; >>>>>>>>>>>>>>>> keys.insert(to_object_key(oid)); >>>>>>>>>>>>>>>> int r = backend.get_keys(keys, &got); >>>>>>>>>>>>>>>> if (r < 0) >>>>>>>>>>>>>>>> return r; >>>>>>>>>>>>>>>> if (got.empty()) >>>>>>>>>>>>>>>> return -ENOENT; >>>>>>>>>>>>>>>> if (out) { >>>>>>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>>>>>>>>>>>>>> ::decode(*out, bp); >>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>>>>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### >>>>>>>>>>>>>>>> } else { >>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> return 0; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> is it safe to comment out that assert? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper >>>>>>>>>>>>>>> state is consistent, and its only purpose is to find snaps to trim. >>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a >>>>>>>>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time >>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in.
The next step is >>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or >>>>>>>>>>>>>>> can't repair. >>>>>>>>>>>>>> >>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But >>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one: >>>>>>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>>>>>>>>> SnapMapper::get_snaps(const h >>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >>>>>>>>>>>>>> 2018-01-18 13:00:54.840396 >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>>>>>>>> >>>>>>>>>>>>> I think if you switch that assert to a warning it will repair... >>>>>>>>>>>>> >>>>>>>>>>>>> sage >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] >>>>>>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, >>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >>>>>>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >>>>>>>>>>>>>> 697] >>>>>>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >>>>>>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >>>>>>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] 
>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>> [0x5561cdcd7bc7] >>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957] >>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>> [0x5561ce292e7d] >>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >>>>>>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] >>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>> >>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l >>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>>>>>>>>>>>>>> 
recovery_info.ss.clone_snaps.end()) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Is this a cache tiering pool? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> no, normal 3 replica but the pg is degraded: >>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hrm, no real clues on the root cause then. Something like this will work >>>>>>>>>>>>>>>>> around the current assert: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 >>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>>>>>>>>>>>>>>>> set<snapid_t> snaps; >>>>>>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; >>>>>>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>>>>>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn?
>>>>>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { >>>>>>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>>>>>>>>>>>>>>>> + } else { >>>>>>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; >>>>>>>>>>>>>>>>> snap_mapper.add_oid( >>>>>>>>>>>>>>>>> recovery_info.soid, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> s >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>>>>>>>>>>>>>>>> [0x55addb30748f] >>>>>>>>>>>>>>>>>>>> 5: >>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>>>>>>>>>>>>>>>>> [0x55addb317531] >>>>>>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>> [0x55addb23cf10] >>>>>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>> [0x55addb035bc7] >>>>>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d] >>>>>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> is there any chance to fix this instead of removing manually all the clones? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>>>>>>>>>>>>>> list mapping. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still have a stray snapid -> object mapping, so >>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>>>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>>>>>>>>>>>>>> being down. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> All of them fail with: >>>>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] 
>>>>>>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>>>>>>>>>>>>>> 7: >>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned 
int)+0x88d) >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG >>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and >>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>>>>>>>>>>>>>>>>>> them myself. 
>>>>>>>>>>>>>>>>>>>>>>> -Greg >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Sage, did you have any time to look at the logs? Stefan Am 05.02.2018 um 14:39 schrieb Stefan Priebe - Profihost AG: > Hi Sage, > > Am 02.02.2018 um 22:05 schrieb Sage Weil: >> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: >>> Am 02.02.2018 um 20:28 schrieb Sage Weil: >>>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: >>>>> Hi Sage, >>>>> >>>>> some days ago i reverted to stock luminous with >>>>> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a >>>>> + >>>>> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa >>>>> on top. I scrubbed and deep-scrubbed all pgs before and no errors were >>>>> found. >>>>> >>>>> this worked fine until the ceph balancer ran and started remapping >>>>> some pgs. >>>>> >>>>> This again triggered this assert: >>>>> >>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' >>>>> thread 7f4289bff700 time 2018-02-02 17:46:33.249579 >>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>> >>>>> Any idea how to fix it? >>>> >>>> Cheap workaround is to comment out that assert. I'm still trying to sort >>> >>> Yes i did but i feel uncomfortable with it as the snapmapper value is >>> still wrong. >>> >>>> out how this happened, though. Not sure if you shared this already, but: >>>> - bluestore or filestore? >>> >>> As i first reported it only filestore. Now also some bluestore OSDs. >>> >>>> - no cache tiering, right? nowhere on the cluster? >>> Yes and Yes. Never used cache tiering at all. >>> >>>> It looks like the SnapMapper entry with an empty snaps vector came in via >>>> add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() >>>> doesn't have that check). There are a few places it could come from, >>>> including the on_local_recover() call and the update_snap_map() call.
>>>> That last one is interesting because it pulls the snaps for a new clone >>>> out of the pg log entry.. maybe those aren't getting populated properly. >>> >>> Is it also possible that they got corrupted in the past and now ceph is >>> unable to detect it and repair it with a scrub or deep scrub? >>> >>>> Can you confirm that all of the OSDs on the cluster are running the >>>> latest? (ceph versions). >>> >>> Yes they do - i always check this after each update and i rechecked it >>> again. >> >> Ok thanks! >> >> I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that >> has those commits above plus another one that adds debug messages in the >> various places that the empty snap vectors might be coming from. >> >> This won't help it get past the rebalance issue you hit above, but >> hopefully it will trigger when you create a new clone so we can see how >> this is happening. >> >> If we need to, I think we can ignore the assert (by commenting it out), >> but one downside there is that we'll see my warnings pop up when >> recovering those objects so we won't be able to tell where they really >> came from. Since you were seeing this on new snaps/clones, though, I'm >> hoping we won't end up in that situation! 
> > the biggest part of the logs is now this stuff: > 2018-02-05 07:56:32.712366 7fb6893ff700 0 log_channel(cluster) log > [DBG] : 3.740 scrub ok > 2018-02-05 08:00:16.033843 7fb68a7fe700 -1 osd.0 pg_epoch: 945191 > pg[3.bbf( v 945191'63020848 (945191'63019283,945191'63020848] > local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f > 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620 > luod=0'0 crt=945191'63020848 lcod 945191'63020847 active] _scan_snaps no > head for 3:fdd87083:::rbd_data.67b0ee6b8b4567.00000000000103a1:b0d77 > (have MIN) > 2018-02-05 08:00:42.123804 7fb6833ff700 -1 osd.0 pg_epoch: 945191 > pg[3.bbf( v 945191'63020901 (945191'63019383,945191'63020901] > local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f > 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620 > luod=0'0 crt=945191'63020901 lcod 945191'63020900 active] _scan_snaps no > head for 3:fddb41cc:::rbd_data.9772fb6b8b4567.0000000000000380:b0d9f > (have MIN) > 2018-02-05 08:03:49.583574 7fb687bfe700 -1 osd.0 pg_epoch: 945191 > pg[3.caa( v 945191'52090546 (945191'52088948,945191'52090546] > local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f > 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 > luod=0'0 crt=945191'52090546 lcod 945191'52090545 active] _scan_snaps no > head for 3:5530360d:::rbd_data.52a6596b8b4567.0000000000000929:b0d55 > (have MIN) > 2018-02-05 08:03:50.432394 7fb687bfe700 -1 osd.0 pg_epoch: 945191 > pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] > local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f > 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 > luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no > head for 3:553243fa:::rbd_data.67b0ee6b8b4567.0000000000009956:b0d77 > (have MIN) > 2018-02-05 08:03:52.523643 7fb687bfe700 -1 osd.0 pg_epoch: 945191 > pg[3.caa( v 945191'52090549 
(945191'52089048,945191'52090549] > local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f > 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 > luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no > head for 3:5536f170:::rbd_data.c1a3d66b8b4567.000000000000b729:b0dc5 > (have MIN) > 2018-02-05 08:03:55.471369 7fb687bfe700 -1 osd.0 pg_epoch: 945191 > pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] > local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f > 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 > luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no > head for 3:553d2114:::rbd_data.154a9e6b8b4567.0000000000001ea1:b0e4e > (have MIN) > 2018-02-05 08:08:55.440715 7fb687bfe700 -1 osd.0 pg_epoch: 945191 > pg[3.7aa( v 945191'144763900 (945191'144762311,945191'144763900] > local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f > 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 > luod=0'0 crt=945191'144763900 lcod 945191'144763899 active] _scan_snaps > no head for 3:55e14d0f:::rbd_data.67b0ee6b8b4567.00000000000025f2:b0d77 > (have MIN) > 2018-02-05 08:09:09.295844 7fb68efff700 -1 osd.0 pg_epoch: 945191 > pg[3.7aa( v 945191'144763970 (945191'144762411,945191'144763970] > local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f > 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 > luod=0'0 crt=945191'144763970 lcod 945191'144763969 active] _scan_snaps > no head for 3:55e2d7a9:::rbd_data.e40a416b8b4567.000000000000576f:b0df3 > (have MIN) > 2018-02-05 08:09:26.285493 7fb68efff700 -1 osd.0 pg_epoch: 945191 > pg[3.7aa( v 945191'144764043 (945191'144762511,945191'144764043] > local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f > 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 > luod=0'0 crt=945191'144764043 lcod 945191'144764042 active] 
_scan_snaps > no head for 3:55e4cbad:::rbd_data.163e9c96b8b4567.000000000000202d:b0d42 > (have MIN) > 2018-02-05 08:10:36.490452 7fb687bfe700 -1 osd.0 pg_epoch: 945191 > pg[3.7aa( v 945191'144764243 (945191'144762711,945191'144764243] > local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f > 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 > luod=0'0 crt=945191'144764243 lcod 945191'144764242 active] _scan_snaps > no head for 3:55ece3c3:::rbd_data.e60f906b8b4567.00000000000000de:b0dd7 > (have MIN) > > Found nothing else in the logs until now. Should i search for more OSDs > or is this already something we should consider? > > Stefan > >> Thanks! >> sage >> >> >> >> > >>> Stefan >>> >>> >>>>> >>>>> Greets, >>>>> Stefan >>>>> >>>>> Am 22.01.2018 um 20:01 schrieb Sage Weil: >>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>> Hi Sage, >>>>>>> >>>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops >>>>>>> telling me: >>>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error >>>>>>> on pg 3.33e oid >>>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in >>>>>>> mapper: , oi: b0d7c...repaired >>>>>>> >>>>>>> Currently it seems it does an endless repair and never ends. >>>>>> >>>>>> On many objects or the same object? >>>>>> >>>>>> sage >>>>>> >>>>>>> >>>>>>> Greets, >>>>>>> Stefan >>>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil: >>>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>> Hi Sage, >>>>>>>>> >>>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So >>>>>>>>> it's OK to skip the assert. >>>>>>>>> >>>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to >>>>>>>>> fix all those errors because in around 1-3% the osd seems to deadlock and >>>>>>>>> needs to get killed to proceed.
>>>>>>>> >>>>>>>> I believe this will fix it: >>>>>>>> >>>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper >>>>>>>> >>>>>>>> Cherry-pick that and let me know? >>>>>>>> >>>>>>>> Thanks! >>>>>>>> sage >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> The log output is always: >>>>>>>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) >>>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at >>>>>>>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 >>>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 >>>>>>>>> 00000157:head v 927921'526559757) currently commit_sent >>>>>>>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) >>>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at >>>>>>>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 >>>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 >>>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent >>>>>>>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) >>>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at >>>>>>>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 >>>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 >>>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent >>>>>>>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal >>>>>>>>> (Aborted) ** >>>>>>>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp >>>>>>>>> >>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>> 1: (()+0xa4423c) [0x561d876ab23c] >>>>>>>>> 2: (()+0xf890) [0x7f100b974890] >>>>>>>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] >>>>>>>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, >>>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran >>>>>>>>> saction> >&, Context*)+0x254) 
[0x561d8746ac54] >>>>>>>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, >>>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] >>>>>>>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] >>>>>>>>> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] >>>>>>>>> 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] >>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] >>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>> [0x561d87138bc7] >>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>> const&)+0x57) [0x561d873b0db7] >>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] >>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>> [0x561d876f42bd] >>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] >>>>>>>>> 15: (()+0x8064) [0x7f100b96d064] >>>>>>>>> 16: (clone()+0x6d) [0x7f100aa6162d] >>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>> needed to interpret this. 
>>>>>>>>> >>>>>>>>> --- logging levels --- >>>>>>>>> 0/ 5 none >>>>>>>>> 0/ 0 lockdep >>>>>>>>> 0/ 0 context >>>>>>>>> 0/ 0 crush >>>>>>>>> 0/ 0 mds >>>>>>>>> 0/ 0 mds_balancer >>>>>>>>> 0/ 0 mds_locker >>>>>>>>> 0/ 0 mds_log >>>>>>>>> 0/ 0 mds_log_expire >>>>>>>>> 0/ 0 mds_migrator >>>>>>>>> 0/ 0 buffer >>>>>>>>> 0/ 0 timer >>>>>>>>> 0/ 0 filer >>>>>>>>> 0/ 1 striper >>>>>>>>> 0/ 0 objecter >>>>>>>>> 0/ 0 rados >>>>>>>>> 0/ 0 rbd >>>>>>>>> 0/ 5 rbd_mirror >>>>>>>>> 0/ 5 rbd_replay >>>>>>>>> 0/ 0 journaler >>>>>>>>> 0/ 0 objectcacher >>>>>>>>> 0/ 0 client >>>>>>>>> 0/ 0 osd >>>>>>>>> 0/ 0 optracker >>>>>>>>> 0/ 0 objclass >>>>>>>>> 0/ 0 filestore >>>>>>>>> 0/ 0 journal >>>>>>>>> 0/ 0 ms >>>>>>>>> 0/ 0 mon >>>>>>>>> 0/ 0 monc >>>>>>>>> 0/ 0 paxos >>>>>>>>> 0/ 0 tp >>>>>>>>> 0/ 0 auth >>>>>>>>> 1/ 5 crypto >>>>>>>>> 0/ 0 finisher >>>>>>>>> 1/ 1 reserver >>>>>>>>> 0/ 0 heartbeatmap >>>>>>>>> 0/ 0 perfcounter >>>>>>>>> 0/ 0 rgw >>>>>>>>> 1/10 civetweb >>>>>>>>> 1/ 5 javaclient >>>>>>>>> 0/ 0 asok >>>>>>>>> 0/ 0 throttle >>>>>>>>> 0/ 0 refs >>>>>>>>> 1/ 5 xio >>>>>>>>> 1/ 5 compressor >>>>>>>>> 1/ 5 bluestore >>>>>>>>> 1/ 5 bluefs >>>>>>>>> 1/ 3 bdev >>>>>>>>> 1/ 5 kstore >>>>>>>>> 4/ 5 rocksdb >>>>>>>>> 4/ 5 leveldb >>>>>>>>> 4/ 5 memdb >>>>>>>>> 1/ 5 kinetic >>>>>>>>> 1/ 5 fuse >>>>>>>>> 1/ 5 mgr >>>>>>>>> 1/ 5 mgrc >>>>>>>>> 1/ 5 dpdk >>>>>>>>> 1/ 5 eventtrace >>>>>>>>> -2/-2 (syslog threshold) >>>>>>>>> -1/-1 (stderr threshold) >>>>>>>>> max_recent 10000 >>>>>>>>> max_new 1000 >>>>>>>>> log_file /var/log/ceph/ceph-osd.59.log >>>>>>>>> --- end dump of recent events --- >>>>>>>>> >>>>>>>>> >>>>>>>>> Greets, >>>>>>>>> Stefan >>>>>>>>> >>>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: >>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>> Hi Sage, >>>>>>>>>>> >>>>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is >>>>>>>>>>> the following and i hope you might have an idea: >>>>>>>>>>> 1.) 
All the snaps mentioned here are fresh - i created them today >>>>>>>>>>> running luminous? How can they be missing in snapmapper again? >>>>>>>>>> >>>>>>>>>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so >>>>>>>>>> that when you hit the crash we have some context as to what is going on. >>>>>>>>>> >>>>>>>>>> Did you track down which part of get_snaps() is returning the error? >>>>>>>>>> >>>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to >>>>>>>>>>> luminous when all this happened. >>>>>>>>>> >>>>>>>>>> Did this start right after the upgrade? >>>>>>>>>> >>>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up >>>>>>>>>> instead (unless the debug option is enabled, for our regression testing), >>>>>>>>>> but I'm still unclear about what the inconsistency is... any logging you >>>>>>>>>> can provide would help! >>>>>>>>>> >>>>>>>>>> Thanks- >>>>>>>>>> sage >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Greets, >>>>>>>>>>> Stefan >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: >>>>>>>>>>>> I'm guessing it is coming from >>>>>>>>>>>> >>>>>>>>>>>> object_snaps out; >>>>>>>>>>>> int r = get_snaps(oid, &out); >>>>>>>>>>>> if (r < 0) >>>>>>>>>>>> return r; >>>>>>>>>>>> >>>>>>>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to >>>>>>>>>>>> confirm that, and possibly also add debug lines to the various return >>>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. >>>>>>>>>>>> >>>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's >>>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we >>>>>>>>>>>> can tolerate it...
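Sage's suggestion above — debug lines at each return point of get_snaps() so the log shows *why* it failed — can be sketched against a simplified stand-in store. This is illustrative only: the `SnapStore` type, messages, and numeric error codes are assumptions, not the Ceph source.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <set>
#include <string>

// Simplified stand-in for the snapmapper backend: object key -> snap ids.
using SnapStore = std::map<std::string, std::set<int>>;

// get_snaps()-style lookup with every return point tagged, so a log grep
// distinguishes a missing key (-ENOENT) from an entry that decodes to an
// empty snap set (warned about here instead of asserted on).
int get_snaps_traced(const SnapStore& store, const std::string& oid,
                     std::set<int>* out) {
  auto it = store.find(oid);
  if (it == store.end()) {
    fprintf(stderr, "get_snaps %s: no key in backend -> -ENOENT\n",
            oid.c_str());
    return -2;
  }
  if (it->second.empty()) {
    // This is the state the FAILED assert(!out->snaps.empty()) trips on;
    // downgraded to a warning, as discussed in the thread.
    fprintf(stderr, "get_snaps %s: entry exists but snap set is empty\n",
            oid.c_str());
  }
  if (out)
    *out = it->second;
  return 0;
}
```

With tracing like this, the empty-entry case shows up in the log with the object name, instead of only surfacing later as an assertion in a caller such as update_snaps().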
>>>>>>>>>>>> >>>>>>>>>>>> sage >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>> >>>>>>>>>>>>> any ideas for this one, seen while doing a deep scrub: >>>>>>>>>>>>> >>>>>>>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: >>>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] >>>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f >>>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 >>>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for >>>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) >>>>>>>>>>>>> >>>>>>>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 >>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const >>>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread >>>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 >>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) >>>>>>>>>>>>> >>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>> const*)+0x102) [0x55c0100d0372] >>>>>>>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>> ObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] >>>>>>>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] >>>>>>>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] >>>>>>>>>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>> [0x55c00fdf6642] >>>>>>>>>>>>> 6: >>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>> [0x55c00fdfc684] >>>>>>>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>> [0x55c00fd22030] >>>>>>>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] >>>>>>>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>> [0x55c00fb1abc7] >>>>>>>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7] >>>>>>>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] >>>>>>>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>> [0x55c0100d5ffd] >>>>>>>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] >>>>>>>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] >>>>>>>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] >>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>> needed to interpret this. 
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Stefan >>>>>>>>>>>>> >>>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: >>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> it also crashes in (marked with HERE): >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> int SnapMapper::get_snaps( >>>>>>>>>>>>>>>>> const hobject_t &oid, >>>>>>>>>>>>>>>>> object_snaps *out) >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> assert(check(oid)); >>>>>>>>>>>>>>>>> set<string> keys; >>>>>>>>>>>>>>>>> map<string, bufferlist> got; >>>>>>>>>>>>>>>>> keys.insert(to_object_key(oid)); >>>>>>>>>>>>>>>>> int r = backend.get_keys(keys, &got); >>>>>>>>>>>>>>>>> if (r < 0) >>>>>>>>>>>>>>>>> return r; >>>>>>>>>>>>>>>>> if (got.empty()) >>>>>>>>>>>>>>>>> return -ENOENT; >>>>>>>>>>>>>>>>> if (out) { >>>>>>>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>>>>>>>>>>>>>>> ::decode(*out, bp); >>>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>>>>>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### >>>>>>>>>>>>>>>>> } else { >>>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> return 0; >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> is it safe to comment out that assert? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper >>>>>>>>>>>>>>>> state is consistent, and its only purpose is to find snaps to trim. >>>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a >>>>>>>>>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time >>>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in.
The next step is >>>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or >>>>>>>>>>>>>>>> can't repair. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But >>>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one: >>>>>>>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>>>>>>>>>> SnapMapper::get_snaps(const h >>>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >>>>>>>>>>>>>>> 2018-01-18 13:00:54.840396 >>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think if you switch that assert to a warning it will repair... >>>>>>>>>>>>>> >>>>>>>>>>>>>> sage >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] >>>>>>>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, >>>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >>>>>>>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >>>>>>>>>>>>>>> 697] >>>>>>>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >>>>>>>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >>>>>>>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>> 
ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] >>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>> [0x5561cdcd7bc7] >>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957] >>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>> [0x5561ce292e7d] >>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >>>>>>>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] >>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l >>>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>>>>>>>>>>>>>>> 
/build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end()) >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Is this a cache tiering pool? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> no, normal 3 replica but the pg is degraded: >>>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>>>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>>>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>>>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hrm, no real clues on the root cause then. Something like this will work >>>>>>>>>>>>>>>>>> around the current assert: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 >>>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>>>>>>>>>>>>>>>>> set<snapid_t> snaps; >>>>>>>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; >>>>>>>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>>>>>>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn?
>>>>>>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>>>>> + if (p == recovery_info.ss.clone_snaps.end()) { >>>>>>>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>>>>>>>>>>>>>>>>> + } else { >>>>>>>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; >>>>>>>>>>>>>>>>>> snap_mapper.add_oid( >>>>>>>>>>>>>>>>>> recovery_info.soid, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> s >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>>>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>>>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>>>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>>>>>>>>>>>>>>>>> [0x55addb30748f] >>>>>>>>>>>>>>>>>>>>> 5: >>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>>>>>>>>>>>>>>>>>> [0x55addb317531] >>>>>>>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>>> [0x55addb23cf10] >>>>>>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>>> [0x55addb035bc7] >>>>>>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d] >>>>>>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones? >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>>>>>>>>>>>>>>> list mapping. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so >>>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>>>>>>>>>>>>>>>>> happens. 
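The "stray snapid -> object mapping" Sage describes can be illustrated with a two-sided index. This is a simplified model, not the Ceph SnapMapper itself: overwriting only the object -> snaps side (as set_snaps() is said to do) leaves reverse entries behind for the trimmer to stumble over later.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Two-sided index modeled on the snapmapper described above (stand-in types):
// a forward map object -> snaps, plus the reverse snap -> objects index that
// snap trimming walks.
struct Mapper {
  std::map<std::string, std::set<int>> obj_to_snaps;
  std::map<int, std::set<std::string>> snap_to_objs;

  // set_snaps() semantics as described in the thread: the forward mapping is
  // overwritten, but old reverse entries are not cleaned up here -- these are
  // the "stray snapid -> object" mappings that can later confuse trimming.
  void set_snaps(const std::string& oid, const std::set<int>& snaps) {
    obj_to_snaps[oid] = snaps;
    for (int s : snaps)
      snap_to_objs[s].insert(oid);
  }

  // Count reverse entries that no longer agree with the forward map.
  int stray_entries() const {
    int strays = 0;
    for (const auto& [snap, objs] : snap_to_objs)
      for (const auto& oid : objs) {
        auto it = obj_to_snaps.find(oid);
        if (it == obj_to_snaps.end() || !it->second.count(snap))
          ++strays;
      }
    return strays;
  }
};
```

A single overwrite is enough to create a stray: the forward side forgets a snap while the reverse side still lists the object under it, which matches the prediction that the PG may later land in the snaptrim_error state rather than crash.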
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>>>>>>>>>>>>>>> being down. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> All of them fail with: >>>>>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) 
>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] >>>>>>>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>>>>>>>>>>>>>>> 7: >>>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] 
>>>>>>>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG >>>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and >>>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>>>>>>>>>>>>>>>>>>> them myself. 
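Greg's description — add_oid() discovering entries it thinks shouldn't exist — corresponds to a precondition like the following sketch. Simplified, assumed semantics: the real code asserts that the prior lookup returned -ENOENT (the FAILED assert(r == -2) in the trace above); here the same condition is modeled as an -EEXIST error return instead of an abort.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Stand-in for the snapmapper store: object key -> snap ids.
using SnapStore = std::map<std::string, std::set<int>>;

// add_oid() expects the object to be unmapped. If a stale entry is already
// present (the crash case above), return -EEXIST rather than asserting.
int add_oid(SnapStore& store, const std::string& oid,
            const std::set<int>& snaps) {
  if (store.count(oid))
    return -17;  // -EEXIST: an entry already exists -- assert(r == -2) fires here
  store[oid] = snaps;
  return 0;
}
```

The interesting question in the thread is not the assert itself but how the first, stale entry survived to collide with the new clone's registration.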
>>>>>>>>>>>>>>>>>>>>>>>> -Greg >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 12 Feb 2018, Stefan Priebe - Profihost AG wrote: > Hi Sage, > > did you have any time to look at the logs? Sorry, I thought I replied earlier. See below > Am 05.02.2018 um 14:39 schrieb Stefan Priebe - Profihost AG: > > Hi Sage, > > > > Am 02.02.2018 um 22:05 schrieb Sage Weil: > >> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: > >>> Am 02.02.2018 um 20:28 schrieb Sage Weil: > >>>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: > >>>>> Hi Sage, > >>>>> > >>>>> some days ago i reverted to stock luminous with > >>>>> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a > >>>>> + > >>>>> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa > >>>>> on top. I scrubbed and deep-scrubbed all pgs before and no errors were > >>>>> found. > >>>>> > >>>>> this worked fine until the ceph balancer ran and started remapping > >>>>> some pgs. > >>>>> > >>>>> This again triggered this assert: > >>>>> > >>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int > >>>>> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' > >>>>> thread 7f4289bff700 time 2018-02-02 17:46:33.249579 > >>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) > >>>>> > >>>>> Any idea how to fix it? > >>>> > >>>> Cheap workaround is to comment out that assert. I'm still trying to sort > >>> > >>> Yes i did but i feel uncomfortable with it as the snapmapper value is > >>> still wrong. > >>> > >>>> out how this happened, though. Not sure if you shared this already, but: > >>>> - bluestore or filestore? > >>> > >>> As i first reported it only filestore. Now also some bluestore OSDs. > >>> > >>>> - no cache tiering, right? nowhere on the cluster? > >>> Yes and Yes. Never used cache tiering at all. > >>> > >>>> It looks like the SnapMapper entry with an empty snaps vector came in via > >>>> add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() > >>>> doesn't have that check).
There are a few places it could come from, > >>>> including the on_local_recover() call and the update_snap_map() call. > >>>> That last one is interesting because it pulls the snaps for a new clone > >>>> out of the pg log entry... maybe those aren't getting populated properly. > >>> > >>> Is it also possible that they got corrupted in the past and now ceph is > >>> unable to detect it and repair it with a scrub or deep scrub? > >>> > >>>> Can you confirm that all of the OSDs on the cluster are running the > >>>> latest? (ceph versions). > >>> > >>> Yes they do - i always check this after each update and i rechecked it > >>> again. > >> > >> Ok thanks! > >> > >> I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that > >> has those commits above plus another one that adds debug messages in the > >> various places that the empty snap vectors might be coming from. > >> > >> This won't help it get past the rebalance issue you hit above, but > >> hopefully it will trigger when you create a new clone so we can see how > >> this is happening. > >> > >> If we need to, I think we can ignore the assert (by commenting it out), > >> but one downside there is that we'll see my warnings pop up when > >> recovering those objects so we won't be able to tell where they really > >> came from. Since you were seeing this on new snaps/clones, though, I'm > >> hoping we won't end up in that situation!
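The asymmetry Sage points out — update_oid() rejects an empty snap set while add_oid() historically did not — can be sketched as follows. Illustrative only (stand-in types; the guard in the checked variant is the one the thread says was missing):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Stand-in for the snapmapper store: object key -> snap ids.
using SnapStore = std::map<std::string, std::set<int>>;

// Models the historical add_oid(): happily records an empty snap set, which
// is the only path an empty mapping could take into the store.
bool add_oid_unchecked(SnapStore& s, const std::string& oid,
                       const std::set<int>& snaps) {
  s[oid] = snaps;
  return true;
}

// Checked variant that mirrors update_oid(): refuse empty mappings so the
// inconsistency is caught at the writer, not later in get_snaps().
bool add_oid_checked(SnapStore& s, const std::string& oid,
                     const std::set<int>& snaps) {
  if (snaps.empty())
    return false;
  s[oid] = snaps;
  return true;
}
```

Rejecting the write at add time would turn the mystery "empty snaps in mapper" state into an immediate, attributable error at whichever caller supplied the empty vector.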
> > > > the biggest part of the logs is now this stuff: > > 2018-02-05 07:56:32.712366 7fb6893ff700 0 log_channel(cluster) log > > [DBG] : 3.740 scrub ok > > 2018-02-05 08:00:16.033843 7fb68a7fe700 -1 osd.0 pg_epoch: 945191 > > pg[3.bbf( v 945191'63020848 (945191'63019283,945191'63020848] > > local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f > > 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620 > > luod=0'0 crt=945191'63020848 lcod 945191'63020847 active] _scan_snaps no > > head for 3:fdd87083:::rbd_data.67b0ee6b8b4567.00000000000103a1:b0d77 > > (have MIN) > > 2018-02-05 08:00:42.123804 7fb6833ff700 -1 osd.0 pg_epoch: 945191 > > pg[3.bbf( v 945191'63020901 (945191'63019383,945191'63020901] > > local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f > > 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620 > > luod=0'0 crt=945191'63020901 lcod 945191'63020900 active] _scan_snaps no > > head for 3:fddb41cc:::rbd_data.9772fb6b8b4567.0000000000000380:b0d9f > > (have MIN) > > 2018-02-05 08:03:49.583574 7fb687bfe700 -1 osd.0 pg_epoch: 945191 > > pg[3.caa( v 945191'52090546 (945191'52088948,945191'52090546] > > local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f > > 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 > > luod=0'0 crt=945191'52090546 lcod 945191'52090545 active] _scan_snaps no > > head for 3:5530360d:::rbd_data.52a6596b8b4567.0000000000000929:b0d55 > > (have MIN) > > 2018-02-05 08:03:50.432394 7fb687bfe700 -1 osd.0 pg_epoch: 945191 > > pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] > > local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f > > 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 > > luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no > > head for 3:553243fa:::rbd_data.67b0ee6b8b4567.0000000000009956:b0d77 > > (have MIN) > > 2018-02-05 08:03:52.523643 7fb687bfe700 -1 
osd.0 pg_epoch: 945191 > > pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] > > local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f > > 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 > > luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no > > head for 3:5536f170:::rbd_data.c1a3d66b8b4567.000000000000b729:b0dc5 > > (have MIN) > > 2018-02-05 08:03:55.471369 7fb687bfe700 -1 osd.0 pg_epoch: 945191 > > pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] > > local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f > > 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 > > luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no > > head for 3:553d2114:::rbd_data.154a9e6b8b4567.0000000000001ea1:b0e4e > > (have MIN) > > 2018-02-05 08:08:55.440715 7fb687bfe700 -1 osd.0 pg_epoch: 945191 > > pg[3.7aa( v 945191'144763900 (945191'144762311,945191'144763900] > > local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f > > 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 > > luod=0'0 crt=945191'144763900 lcod 945191'144763899 active] _scan_snaps > > no head for 3:55e14d0f:::rbd_data.67b0ee6b8b4567.00000000000025f2:b0d77 > > (have MIN) > > 2018-02-05 08:09:09.295844 7fb68efff700 -1 osd.0 pg_epoch: 945191 > > pg[3.7aa( v 945191'144763970 (945191'144762411,945191'144763970] > > local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f > > 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 > > luod=0'0 crt=945191'144763970 lcod 945191'144763969 active] _scan_snaps > > no head for 3:55e2d7a9:::rbd_data.e40a416b8b4567.000000000000576f:b0df3 > > (have MIN) > > 2018-02-05 08:09:26.285493 7fb68efff700 -1 osd.0 pg_epoch: 945191 > > pg[3.7aa( v 945191'144764043 (945191'144762511,945191'144764043] > > local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f > > 944523/944551/937357 
944522/944522/944522) [39,37,0] r=2 lpr=944522 > > luod=0'0 crt=945191'144764043 lcod 945191'144764042 active] _scan_snaps > > no head for 3:55e4cbad:::rbd_data.163e9c96b8b4567.000000000000202d:b0d42 > > (have MIN) > > 2018-02-05 08:10:36.490452 7fb687bfe700 -1 osd.0 pg_epoch: 945191 > > pg[3.7aa( v 945191'144764243 (945191'144762711,945191'144764243] > > local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f > > 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 > > luod=0'0 crt=945191'144764243 lcod 945191'144764242 active] _scan_snaps > > no head for 3:55ece3c3:::rbd_data.e60f906b8b4567.00000000000000de:b0dd7 > > (have MIN) > > > > Found nothing else in the logs until now. Should I search for more OSDs > > or is this already something we should consider? Yeah, those messages aren't super informative. What would be useful is to check for any of those (or, more importantly, the other new messages that branch adds) on *new* object clones for snapshots that have just been created. (Earlier, you mentioned that the errors you were seeing referred to snapshots that had just been created, right? The key question is how those new clones are getting registered in snapmapper with an empty list of snaps.) Thanks! sage > > Stefan > > > >> Thanks! > >> sage > >> > >> > >> > >> > > >>> Stefan > >>> > >>> > >>>>> > >>>>> Greets, > >>>>> Stefan > >>>>> > >>>>> On 22.01.2018 at 20:01, Sage Weil wrote: > >>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>> Hi Sage, > >>>>>>> > >>>>>>> thanks! That one fixed it! Now I'm just waiting to see if the log ever stops > >>>>>>> telling me: > >>>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error > >>>>>>> on pg 3.33e oid > >>>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in > >>>>>>> mapper: , oi: b0d7c...repaired > >>>>>>> > >>>>>>> Currently it seems it does an endless repair and never ends.
> >>>>>> > >>>>>> On many objects or the same object? > >>>>>> > >>>>>> sage > >>>>>> > >>>>>>> > >>>>>>> Greets, > >>>>>>> Stefan > >>>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil: > >>>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>> Hi Sage, > >>>>>>>>> > >>>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So > >>>>>>>>> it's OK to skip the assert. > >>>>>>>>> > >>>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to > >>>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and > >>>>>>>>> needs to get killed to proceed. > >>>>>>>> > >>>>>>>> I believe this will fix it: > >>>>>>>> > >>>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper > >>>>>>>> > >>>>>>>> Cherry-pick that and let me know? > >>>>>>>> > >>>>>>>> Thanks! > >>>>>>>> sage > >>>>>>>> > >>>>>>>> > >>>>>>>>> > >>>>>>>>> The log output is always: > >>>>>>>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) > >>>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at > >>>>>>>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 > >>>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 > >>>>>>>>> 00000157:head v 927921'526559757) currently commit_sent > >>>>>>>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) > >>>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at > >>>>>>>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 > >>>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 > >>>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent > >>>>>>>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) > >>>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at > >>>>>>>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 > >>>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 > 
>>>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent > >>>>>>>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal > >>>>>>>>> (Aborted) ** > >>>>>>>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp > >>>>>>>>> > >>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>> 1: (()+0xa4423c) [0x561d876ab23c] > >>>>>>>>> 2: (()+0xf890) [0x7f100b974890] > >>>>>>>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] > >>>>>>>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, > >>>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran > >>>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54] > >>>>>>>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, > >>>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] > >>>>>>>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] > >>>>>>>>> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > >>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] > >>>>>>>>> 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, > >>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] > >>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] > >>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>> [0x561d87138bc7] > >>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>> const&)+0x57) [0x561d873b0db7] > >>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] > >>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>> [0x561d876f42bd] > >>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] > >>>>>>>>> 15: (()+0x8064) [0x7f100b96d064] > 
>>>>>>>>> 16: (clone()+0x6d) [0x7f100aa6162d] > >>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>> needed to interpret this. > >>>>>>>>> > >>>>>>>>> --- logging levels --- > >>>>>>>>> 0/ 5 none > >>>>>>>>> 0/ 0 lockdep > >>>>>>>>> 0/ 0 context > >>>>>>>>> 0/ 0 crush > >>>>>>>>> 0/ 0 mds > >>>>>>>>> 0/ 0 mds_balancer > >>>>>>>>> 0/ 0 mds_locker > >>>>>>>>> 0/ 0 mds_log > >>>>>>>>> 0/ 0 mds_log_expire > >>>>>>>>> 0/ 0 mds_migrator > >>>>>>>>> 0/ 0 buffer > >>>>>>>>> 0/ 0 timer > >>>>>>>>> 0/ 0 filer > >>>>>>>>> 0/ 1 striper > >>>>>>>>> 0/ 0 objecter > >>>>>>>>> 0/ 0 rados > >>>>>>>>> 0/ 0 rbd > >>>>>>>>> 0/ 5 rbd_mirror > >>>>>>>>> 0/ 5 rbd_replay > >>>>>>>>> 0/ 0 journaler > >>>>>>>>> 0/ 0 objectcacher > >>>>>>>>> 0/ 0 client > >>>>>>>>> 0/ 0 osd > >>>>>>>>> 0/ 0 optracker > >>>>>>>>> 0/ 0 objclass > >>>>>>>>> 0/ 0 filestore > >>>>>>>>> 0/ 0 journal > >>>>>>>>> 0/ 0 ms > >>>>>>>>> 0/ 0 mon > >>>>>>>>> 0/ 0 monc > >>>>>>>>> 0/ 0 paxos > >>>>>>>>> 0/ 0 tp > >>>>>>>>> 0/ 0 auth > >>>>>>>>> 1/ 5 crypto > >>>>>>>>> 0/ 0 finisher > >>>>>>>>> 1/ 1 reserver > >>>>>>>>> 0/ 0 heartbeatmap > >>>>>>>>> 0/ 0 perfcounter > >>>>>>>>> 0/ 0 rgw > >>>>>>>>> 1/10 civetweb > >>>>>>>>> 1/ 5 javaclient > >>>>>>>>> 0/ 0 asok > >>>>>>>>> 0/ 0 throttle > >>>>>>>>> 0/ 0 refs > >>>>>>>>> 1/ 5 xio > >>>>>>>>> 1/ 5 compressor > >>>>>>>>> 1/ 5 bluestore > >>>>>>>>> 1/ 5 bluefs > >>>>>>>>> 1/ 3 bdev > >>>>>>>>> 1/ 5 kstore > >>>>>>>>> 4/ 5 rocksdb > >>>>>>>>> 4/ 5 leveldb > >>>>>>>>> 4/ 5 memdb > >>>>>>>>> 1/ 5 kinetic > >>>>>>>>> 1/ 5 fuse > >>>>>>>>> 1/ 5 mgr > >>>>>>>>> 1/ 5 mgrc > >>>>>>>>> 1/ 5 dpdk > >>>>>>>>> 1/ 5 eventtrace > >>>>>>>>> -2/-2 (syslog threshold) > >>>>>>>>> -1/-1 (stderr threshold) > >>>>>>>>> max_recent 10000 > >>>>>>>>> max_new 1000 > >>>>>>>>> log_file /var/log/ceph/ceph-osd.59.log > >>>>>>>>> --- end dump of recent events --- > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Greets, > >>>>>>>>> Stefan > >>>>>>>>> > >>>>>>>>> Am 21.01.2018 um 21:27 
schrieb Sage Weil: > >>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>> Hi Sage, > >>>>>>>>>>> > >>>>>>>>>>> thanks for your reply. I'll do this. What I still don't understand is > >>>>>>>>>>> the following and I hope you might have an idea: > >>>>>>>>>>> 1.) All the snaps mentioned here are fresh - I created them today > >>>>>>>>>>> running luminous? How can they be missing in snapmapper again? > >>>>>>>>>> > >>>>>>>>>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so > >>>>>>>>>> that when you hit the crash we have some context as to what is going on. > >>>>>>>>>> > >>>>>>>>>> Did you track down which part of get_snaps() is returning the error? > >>>>>>>>>> > >>>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to > >>>>>>>>>>> luminous when all this happened. > >>>>>>>>>> > >>>>>>>>>> Did this start right after the upgrade? > >>>>>>>>>> > >>>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up > >>>>>>>>>> instead (unless the debug option is enabled, for our regression testing), > >>>>>>>>>> but I'm still unclear about what the inconsistency is... any logging you > >>>>>>>>>> can provide would help! > >>>>>>>>>> > >>>>>>>>>> Thanks- > >>>>>>>>>> sage > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Greets, > >>>>>>>>>>> Stefan > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On 19.01.2018 at 21:19, Sage Weil wrote: > >>>>>>>>>>>> I'm guessing it is coming from > >>>>>>>>>>>> > >>>>>>>>>>>> object_snaps out; > >>>>>>>>>>>> int r = get_snaps(oid, &out); > >>>>>>>>>>>> if (r < 0) > >>>>>>>>>>>> return r; > >>>>>>>>>>>> > >>>>>>>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to > >>>>>>>>>>>> confirm that, and possibly also add debug lines to the various return > >>>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
> >>>>>>>>>>>> > >>>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's > >>>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we > >>>>>>>>>>>> can tolerate it... > >>>>>>>>>>>> > >>>>>>>>>>>> sage > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> HI Sage, > >>>>>>>>>>>>> > >>>>>>>>>>>>> any ideas for this one sawing while doing a deep scrub: > >>>>>>>>>>>>> > >>>>>>>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: > >>>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] > >>>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f > >>>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 > >>>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for > >>>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) > >>>>>>>>>>>>> > >>>>>>>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 > >>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const > >>>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread > >>>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 > >>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) > >>>>>>>>>>>>> > >>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>> const*)+0x102) [0x55c0100d0372] > >>>>>>>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] > >>>>>>>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, > >>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, 
eversion_t, > >>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] > >>>>>>>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] > >>>>>>>>>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >>>>>>>>>>>>> [0x55c00fdf6642] > >>>>>>>>>>>>> 6: > >>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >>>>>>>>>>>>> [0x55c00fdfc684] > >>>>>>>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>>> [0x55c00fd22030] > >>>>>>>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] > >>>>>>>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>> [0x55c00fb1abc7] > >>>>>>>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7] > >>>>>>>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] > >>>>>>>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>> [0x55c0100d5ffd] > >>>>>>>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] > >>>>>>>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] > >>>>>>>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] > >>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>>>>> needed to interpret this. 
> >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Stefan > >>>>>>>>>>>>> > >>>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: > >>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>> Hi Sage, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: > >>>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> it also crashes in (marked with HERE): > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> int SnapMapper::get_snaps( > >>>>>>>>>>>>>>>>> const hobject_t &oid, > >>>>>>>>>>>>>>>>> object_snaps *out) > >>>>>>>>>>>>>>>>> { > >>>>>>>>>>>>>>>>> assert(check(oid)); > >>>>>>>>>>>>>>>>> set<string> keys; > >>>>>>>>>>>>>>>>> map<string, bufferlist> got; > >>>>>>>>>>>>>>>>> keys.insert(to_object_key(oid)); > >>>>>>>>>>>>>>>>> int r = backend.get_keys(keys, &got); > >>>>>>>>>>>>>>>>> if (r < 0) > >>>>>>>>>>>>>>>>> return r; > >>>>>>>>>>>>>>>>> if (got.empty()) > >>>>>>>>>>>>>>>>> return -ENOENT; > >>>>>>>>>>>>>>>>> if (out) { > >>>>>>>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); > >>>>>>>>>>>>>>>>> ::decode(*out, bp); > >>>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; > >>>>>>>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### > >>>>>>>>>>>>>>>>> } else { > >>>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; > >>>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>> return 0; > >>>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> is it save to comment that assert? > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper > >>>>>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim. > >>>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a > >>>>>>>>>>>>>>>> whole lot of risk here. 
You might want to set nosnaptrim for the time > >>>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in. The next step is > >>>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or > >>>>>>>>>>>>>>>> can't repair. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But > >>>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one: > >>>>>>>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 > >>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int > >>>>>>>>>>>>>>> SnapMapper::get_snaps(const h > >>>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time > >>>>>>>>>>>>>>> 2018-01-18 13:00:54.840396 > >>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I think if you switch that assert to a warning it will repair... 
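[Editor's note: for readers following along, here is a minimal, self-contained sketch of what "switch that assert to a warning" amounts to. The names (`backend`, the toy `object_snaps`) are illustrative stand-ins, not the real Ceph source: the empty-snap-set case is reported to the caller as -ENOENT instead of aborting the OSD, so scrub can repair the mapping.]

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <map>
#include <set>
#include <string>

// Toy stand-ins for Ceph's types so the sketch is self-contained; the real
// SnapMapper decodes object_snaps from a key/value backend.
struct object_snaps {
  std::set<unsigned> snaps;
};

std::map<std::string, object_snaps> backend;  // object key -> snaps

// get_snaps() with the failing assert relaxed: instead of aborting when the
// decoded snap set is empty, warn and report the entry as missing so the
// caller (e.g. scrub) can treat it as an inconsistency to repair.
int get_snaps(const std::string& oid, object_snaps* out) {
  auto it = backend.find(oid);
  if (it == backend.end())
    return -ENOENT;  // no object->snaps key at all
  if (out) {
    *out = it->second;
    if (out->snaps.empty()) {  // was: assert(!out->snaps.empty());
      std::fprintf(stderr, "get_snaps: %s has empty snap set in mapper\n",
                   oid.c_str());
      return -ENOENT;
    }
  }
  return 0;
}
```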
> >>>>>>>>>>>>>> > >>>>>>>>>>>>>> sage > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] > >>>>>>>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, > >>>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] > >>>>>>>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, > >>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 > >>>>>>>>>>>>>>> 697] > >>>>>>>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] > >>>>>>>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > >>>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] > >>>>>>>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, > >>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] > >>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] > >>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>>> [0x5561cdcd7bc7] > >>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957] > >>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] > >>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>>>> [0x5561ce292e7d] > >>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] > >>>>>>>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] > >>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] > >>>>>>>>>>>>>>> NOTE: a copy of the executable, 
or `objdump -rdS <executable>` is > >>>>>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Greets, > >>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> sage > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: > >>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: > >>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>>>>>>> Hi Sage, > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 > >>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void > >>>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l > >>>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, > >>>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti > >>>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 > >>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != > >>>>>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end()) > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Is this a cache tiering pool? 
> >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded: > >>>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 > >>>>>>>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait > >>>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] > >>>>>>>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 > >>>>>>>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then. Something like this will work > >>>>>>>>>>>>>>>>>> around the current assert: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 > >>>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( > >>>>>>>>>>>>>>>>>> set<snapid_t> snaps; > >>>>>>>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; > >>>>>>>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); > >>>>>>>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? 
> >>>>>>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); > >>>>>>>>>>>>>>>>>> + if (p != recovery_info.ss.clone_snaps.end()) { > >>>>>>>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; > >>>>>>>>>>>>>>>>>> + } else { > >>>>>>>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); > >>>>>>>>>>>>>>>>>> + } > >>>>>>>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; > >>>>>>>>>>>>>>>>>> snap_mapper.add_oid( > >>>>>>>>>>>>>>>>>> recovery_info.soid, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> s > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] > >>>>>>>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo > >>>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, > >>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] > >>>>>>>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, > >>>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] > >>>>>>>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) > >>>>>>>>>>>>>>>>>>>>> [0x55addb30748f] > >>>>>>>>>>>>>>>>>>>>> 5: > >>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) > >>>>>>>>>>>>>>>>>>>>> [0x55addb317531] > >>>>>>>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>>>>>>>>>>> [0x55addb23cf10] > >>>>>>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] > 
>>>>>>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>>>>>>>>> [0x55addb035bc7] > >>>>>>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] > >>>>>>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] > >>>>>>>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d] > >>>>>>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] > >>>>>>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] > >>>>>>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] > >>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>>>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Greets, > >>>>>>>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: > >>>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones? > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by > >>>>>>>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap > >>>>>>>>>>>>>>>>>>>>>> list mapping. > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so > >>>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error > >>>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't > >>>>>>>>>>>>>>>>>>>>>> crash). 
I'd probably remove the assert and deal with that if/when it > >>>>>>>>>>>>>>>>>>>>>> happens. > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them > >>>>>>>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> sage > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: > >>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG > >>>>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>> Hello, > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds > >>>>>>>>>>>>>>>>>>>>>>>>> being down. > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> All of them fail with: > >>>>>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 > >>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void > >>>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, > >>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' > >>>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 > >>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 > >>>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) > >>>>>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] > >>>>>>>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, > >>>>>>>>>>>>>>>>>>>>>>>>> 
std::less<snapid_t>, std::allocator<snapid_t> > const&, > >>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) > >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] > >>>>>>>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] > >>>>>>>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > >>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] > >>>>>>>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >>>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] > >>>>>>>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) > >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] > >>>>>>>>>>>>>>>>>>>>>>>>> 7: > >>>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) > >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] > >>>>>>>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] > >>>>>>>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] > >>>>>>>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] > >>>>>>>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>>>>>>>>>>>>>>> 
const&)+0x57) [0x561f9fbcd947] > >>>>>>>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] > >>>>>>>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] > >>>>>>>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] > >>>>>>>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] > >>>>>>>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] > >>>>>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is > >>>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that > >>>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's > >>>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is > >>>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have > >>>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG > >>>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object > >>>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and > >>>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of > >>>>>>>>>>>>>>>>>>>>>>>> them myself. 
> >>>>>>>>>>>>>>>>>>>>>>>> -Greg > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > >>>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org > >>>>>>>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
-- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, see below. Am 12.02.2018 um 20:31 schrieb Sage Weil: > On Mon, 12 Feb 2018, Stefan Priebe - Profihost AG wrote: >> Hi Sage, >> >> did you have any time to look at the logs? > > Sorry, I thought I replied earlier. See below > >> Am 05.02.2018 um 14:39 schrieb Stefan Priebe - Profihost AG: >>> Hi Sage, >>> >>> Am 02.02.2018 um 22:05 schrieb Sage Weil: >>>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: >>>>> Am 02.02.2018 um 20:28 schrieb Sage Weil: >>>>>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: >>>>>>> Hi Sage, >>>>>>> >>>>>>> some days ago i reverted to stock luminous with >>>>>>> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a >>>>>>> + >>>>>>> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa >>>>>>> on top. I scrubbed and deep-scrubbed all pgs before and no errors were >>>>>>> found. >>>>>>> >>>>>>> this worked fine until the ceph balancer ran and started remapping >>>>>>> some pgs. >>>>>>> >>>>>>> This again triggered this assert: >>>>>>> >>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' >>>>>>> thread 7f4289bff700 time 2018-02-02 17:46:33.249579 >>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>> >>>>>>> Any idea how to fix it? >>>>>> >>>>>> Cheap workaround is to comment out that assert. I'm still trying to sort >>>>> >>>>> Yes i did but i feel uncomfortable with it as the snapmapper value is >>>>> still wrong. >>>>> >>>>>> out how this happened, though. Not sure if you shared this already, but: >>>>>> - bluestore or filestore? >>>>> >>>>> As i first reported it only filestore. Now also some bluestore OSDs. >>>>> >>>>>> - no cache tiering, right? nowhere on the cluster? >>>>> Yes and Yes. Never used cache tiering at all.
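The workaround discussed here (commenting out the assert) can also be modeled as returning a distinct error instead of crashing, so the caller can log and repair. The following is a simplified toy sketch, not the real Ceph code: object_snaps and the map-backed store are stand-ins for SnapMapper's types and key-value backend.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for SnapMapper::get_snaps: instead of
// assert(!out->snaps.empty()), which aborts the OSD thread, report an
// empty snap vector as a distinct error so the caller can warn and repair.
struct object_snaps {
  std::vector<uint64_t> snaps;
};

int get_snaps_tolerant(const std::map<std::string, object_snaps>& store,
                       const std::string& oid, object_snaps* out) {
  auto it = store.find(oid);
  if (it == store.end())
    return -ENOENT;  // no snapmapper entry at all
  if (out) {
    *out = it->second;
    if (out->snaps.empty()) {
      // was: assert(!out->snaps.empty())
      std::cerr << "get_snaps: empty snap vector for " << oid << "\n";
      return -EIO;
    }
  }
  return 0;
}
```

This keeps Stefan's concern visible: the snapmapper value is still wrong, but the error now surfaces as a loggable return code rather than a crash.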
>>>>> >>>>>> It looks like the SnapMapper entry with an empty snaps vector came in via >>>>>> add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() >>>>>> doesn't have that check). There are a few places it could come from, >>>>>> including the on_local_recover() call and the update_snap_map() call. >>>>>> That last one is interesting because it pulls the snaps for a new clone >>>>>> out of the pg log entry.. maybe those aren't getting populated properly. >>>>> >>>>> Is it also possible that they got corrupted in the past and now ceph is >>>>> unable to detect it and repair it with a scrub or deep scrub? >>>>> >>>>>> Can you confirm that all of the OSDs on the cluster are running the >>>>>> latest? (ceph versions). >>>>> >>>>> Yes they do - i always check this after each update and i rechecked it >>>>> again. >>>> >>>> Ok thanks! >>>> >>>> I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that >>>> has those commits above plus another one that adds debug messages in the >>>> various places that the empty snap vectors might be coming from. >>>> >>>> This won't help it get past the rebalance issue you hit above, but >>>> hopefully it will trigger when you create a new clone so we can see how >>>> this is happening. >>>> >>>> If we need to, I think we can ignore the assert (by commenting it out), >>>> but one downside there is that we'll see my warnings pop up when >>>> recovering those objects so we won't be able to tell where they really >>>> came from. Since you were seeing this on new snaps/clones, though, I'm >>>> hoping we won't end up in that situation!
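The asymmetry Sage describes (update_oid() rejects an empty snap set, add_oid() did not) can be sketched as the missing guard below. Types and signatures are simplified stand-ins for the real SnapMapper, purely for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Sketch of the guard add_oid() is missing: without the empty check, an
// empty snap vector can be persisted and trip the asserts much later.
// The map-backed store models SnapMapper's key-value backend.
using SnapStore = std::map<std::string, std::set<uint64_t>>;

int add_oid_checked(SnapStore& store, const std::string& oid,
                    const std::set<uint64_t>& new_snaps) {
  if (new_snaps.empty())
    return -EINVAL;  // refuse to persist an empty snap vector
  if (store.count(oid))
    return -EEXIST;  // "already has entries": the case the r == 0 assert hits
  store[oid] = new_snaps;
  return 0;
}
```

Rejecting the bad write at insertion time moves the failure to the caller that produced the empty set, which is exactly the provenance the wip-snapmapper-debug branch is trying to capture.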
>>> >>> the biggest part of the logs is now this stuff: >>> 2018-02-05 07:56:32.712366 7fb6893ff700 0 log_channel(cluster) log >>> [DBG] : 3.740 scrub ok >>> 2018-02-05 08:00:16.033843 7fb68a7fe700 -1 osd.0 pg_epoch: 945191 >>> pg[3.bbf( v 945191'63020848 (945191'63019283,945191'63020848] >>> local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f >>> 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620 >>> luod=0'0 crt=945191'63020848 lcod 945191'63020847 active] _scan_snaps no >>> head for 3:fdd87083:::rbd_data.67b0ee6b8b4567.00000000000103a1:b0d77 >>> (have MIN) >>> 2018-02-05 08:00:42.123804 7fb6833ff700 -1 osd.0 pg_epoch: 945191 >>> pg[3.bbf( v 945191'63020901 (945191'63019383,945191'63020901] >>> local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f >>> 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620 >>> luod=0'0 crt=945191'63020901 lcod 945191'63020900 active] _scan_snaps no >>> head for 3:fddb41cc:::rbd_data.9772fb6b8b4567.0000000000000380:b0d9f >>> (have MIN) >>> 2018-02-05 08:03:49.583574 7fb687bfe700 -1 osd.0 pg_epoch: 945191 >>> pg[3.caa( v 945191'52090546 (945191'52088948,945191'52090546] >>> local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f >>> 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 >>> luod=0'0 crt=945191'52090546 lcod 945191'52090545 active] _scan_snaps no >>> head for 3:5530360d:::rbd_data.52a6596b8b4567.0000000000000929:b0d55 >>> (have MIN) >>> 2018-02-05 08:03:50.432394 7fb687bfe700 -1 osd.0 pg_epoch: 945191 >>> pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] >>> local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f >>> 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 >>> luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no >>> head for 3:553243fa:::rbd_data.67b0ee6b8b4567.0000000000009956:b0d77 >>> (have MIN) >>> 2018-02-05 08:03:52.523643 7fb687bfe700 -1 
osd.0 pg_epoch: 945191 >>> pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] >>> local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f >>> 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 >>> luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no >>> head for 3:5536f170:::rbd_data.c1a3d66b8b4567.000000000000b729:b0dc5 >>> (have MIN) >>> 2018-02-05 08:03:55.471369 7fb687bfe700 -1 osd.0 pg_epoch: 945191 >>> pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] >>> local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f >>> 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 >>> luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no >>> head for 3:553d2114:::rbd_data.154a9e6b8b4567.0000000000001ea1:b0e4e >>> (have MIN) >>> 2018-02-05 08:08:55.440715 7fb687bfe700 -1 osd.0 pg_epoch: 945191 >>> pg[3.7aa( v 945191'144763900 (945191'144762311,945191'144763900] >>> local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f >>> 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 >>> luod=0'0 crt=945191'144763900 lcod 945191'144763899 active] _scan_snaps >>> no head for 3:55e14d0f:::rbd_data.67b0ee6b8b4567.00000000000025f2:b0d77 >>> (have MIN) >>> 2018-02-05 08:09:09.295844 7fb68efff700 -1 osd.0 pg_epoch: 945191 >>> pg[3.7aa( v 945191'144763970 (945191'144762411,945191'144763970] >>> local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f >>> 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 >>> luod=0'0 crt=945191'144763970 lcod 945191'144763969 active] _scan_snaps >>> no head for 3:55e2d7a9:::rbd_data.e40a416b8b4567.000000000000576f:b0df3 >>> (have MIN) >>> 2018-02-05 08:09:26.285493 7fb68efff700 -1 osd.0 pg_epoch: 945191 >>> pg[3.7aa( v 945191'144764043 (945191'144762511,945191'144764043] >>> local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f >>> 944523/944551/937357 
944522/944522/944522) [39,37,0] r=2 lpr=944522 >>> luod=0'0 crt=945191'144764043 lcod 945191'144764042 active] _scan_snaps >>> no head for 3:55e4cbad:::rbd_data.163e9c96b8b4567.000000000000202d:b0d42 >>> (have MIN) >>> 2018-02-05 08:10:36.490452 7fb687bfe700 -1 osd.0 pg_epoch: 945191 >>> pg[3.7aa( v 945191'144764243 (945191'144762711,945191'144764243] >>> local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f >>> 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 >>> luod=0'0 crt=945191'144764243 lcod 945191'144764242 active] _scan_snaps >>> no head for 3:55ece3c3:::rbd_data.e60f906b8b4567.00000000000000de:b0dd7 >>> (have MIN) >>> >>> Found nothing else in the logs until now. Should i search for more OSDs >>> or is this already something we should consider? > > Yeah, those messages aren't super informative. What would be useful is to > check for any of those (or, more importantly, the other new messages that > branch adds) on *new* object clones for snapshots that have just been > created. (Earlier, you mentioned that the errors you were seeing referred > to snapshots that had just been created, right? The key question is how > those new clones are getting registered in snapmapper with an empty list > of snaps.) Yes i checked this twice - but i cannot find anything in the logs now.
I checked the logs with: grep 'snap' /var/log/ceph/ceph-osd.[0-9]*.log | grep -v ' _scan_snaps no head for ' and the output is just: /var/log/ceph/ceph-osd.2.log:2018-02-12 07:44:29.169600 7f9f6abff700 -1 osd.2 pg_epoch: 949525 pg[3.7ae( v 949525'24972392 (949524'24970891,949525'24972392] local-lis/les=946216/946218 n=1608 ec=15/15 lis/c 946216/946216 les/c/f 946218/946223/937357 946216/946216/945695) [2,26,43] r=0 lpr=946216 luod=949524'24972391 lua=949524'24972391 crt=949525'24972392 lcod 949524'24972390 mlcod 949524'24972390 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.39.log:2018-02-12 07:44:28.829474 7fbaa5bff700 -1 osd.39 pg_epoch: 949525 pg[3.864( v 949525'60058835 (949524'60057284,949525'60058835] local-lis/les=944678/944679 n=1621 ec=15/15 lis/c 944678/944678 les/c/f 944679/944684/937357 944678/944678/944522) [39,80,31] r=0 lpr=944678 crt=949525'60058835 lcod 949524'60058834 mlcod 949524'60058834 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.19.log:2018-02-12 07:44:28.661329 7f5f9bfec700 -1 osd.19 pg_epoch: 949525 pg[3.d78( v 949525'32475960 (949495'32474451,949525'32475960] local-lis/les=944690/944691 n=1497 ec=15/15 lis/c 944690/944690 les/c/f 944691/944697/937357 944690/944690/944542) [19,75,78] r=0 lpr=944690 luod=949525'32475958 lua=949525'32475958 crt=949525'32475960 lcod 949524'32475957 mlcod 949524'32475957 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.33.log:2018-02-12 07:44:28.899062 7f0282fff700 -1 osd.33 pg_epoch: 949525 pg[3.298( v 949525'82343947 (949524'82342349,949525'82343947] local-lis/les=944603/944604 n=1654 ec=15/15 lis/c 944603/944603 les/c/f 944604/944612/937357 944603/944603/944603) [33,13,18] r=0 lpr=944603 crt=949525'82343947 lcod 949524'82343946 mlcod 949524'82343946 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.33.log:2018-02-12 
07:44:29.313090 7f0281fff700 -1 osd.33 pg_epoch: 949525 pg[3.2c1( v 949525'58513621 (949524'58512073,949525'58513621] local-lis/les=944678/944679 n=1660 ec=15/15 lis/c 944678/944678 les/c/f 944679/944687/937357 944678/944678/944603) [33,80,56] r=0 lpr=944678 luod=949525'58513619 lua=949525'58513619 crt=949525'58513621 lcod 949525'58513618 mlcod 949525'58513618 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.45.log:2018-02-12 07:44:28.916617 7fea14ffd700 -1 osd.45 pg_epoch: 949525 pg[3.bf9( v 949525'71145615 (949524'71144094,949525'71145615] local-lis/les=944620/944621 n=1639 ec=15/15 lis/c 944620/944620 les/c/f 944621/944624/937357 944620/944620/944608) [45,28,58] r=0 lpr=944620 crt=949525'71145615 lcod 949525'71145614 mlcod 949525'71145614 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.62.log:2018-02-12 07:44:28.696833 7f13073fe700 -1 osd.62 pg_epoch: 949525 pg[3.3e8( v 949525'30577716 (949524'30576140,949525'30577716] local-lis/les=946273/946274 n=1609 ec=15/15 lis/c 946273/946273 les/c/f 946274/946277/937357 946273/946273/944634) [62,80,37] r=0 lpr=946273 luod=949524'30577712 lua=949524'30577712 crt=949525'30577716 lcod 949524'30577711 mlcod 949524'30577711 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.62.log:2018-02-12 07:44:28.801300 7f13073fe700 -1 osd.62 pg_epoch: 949525 pg[3.5d0( v 949525'90315972 (949524'90314417,949525'90315972] local-lis/les=944677/944678 n=1631 ec=15/15 lis/c 944677/944677 les/c/f 944678/944684/937357 944677/944677/944634) [62,58,88] r=0 lpr=944677 crt=949525'90315972 lcod 949524'90315971 mlcod 949524'90315971 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.81.log:2018-02-12 07:44:29.465068 7f9f9c7ff700 -1 osd.81 pg_epoch: 949525 pg[3.3cc( v 949525'46127568 (949524'46125993,949525'46127568] local-lis/les=944676/944677 n=1547 ec=15/15 lis/c 944676/944676 
les/c/f 944677/944684/937357 944676/944676/944676) [81,59,66] r=0 lpr=944676 crt=949525'46127568 lcod 949525'46127567 mlcod 949525'46127567 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.91.log:2018-02-12 07:44:29.258414 7f6eebffd700 -1 osd.91 pg_epoch: 949525 pg[3.efd( v 949525'24155822 (949524'24154262,949525'24155822] local-lis/les=945695/945696 n=1645 ec=15/15 lis/c 945695/945695 les/c/f 945696/945697/937357 945695/945695/944694) [91,2,41] r=0 lpr=945695 crt=949525'24155822 lcod 949525'24155821 mlcod 949525'24155821 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head Thanks! Stefan > > Thanks! > sage > > > >>> Stefan >>> >>>> Thanks! >>>> sage >>>> >>>> >>>> >>>> > >>>>> Stefan >>>>> >>>>> >>>>>>> >>>>>>> Greets, >>>>>>> Stefan >>>>>>> >>>>>>> Am 22.01.2018 um 20:01 schrieb Sage Weil: >>>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>> Hi Sage, >>>>>>>>> >>>>>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops >>>>>>>>> telling me: >>>>>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error >>>>>>>>> on pg 3.33e oid >>>>>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in >>>>>>>>> mapper: , oi: b0d7c...repaired >>>>>>>>> >>>>>>>>> Currently it seems it does an endless repair and never ends. >>>>>>>> >>>>>>>> On many objects or the same object? >>>>>>>> >>>>>>>> sage >>>>>>>> >>>>>>>>> >>>>>>>>> Greets, >>>>>>>>> Stefan >>>>>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil: >>>>>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>> Hi Sage, >>>>>>>>>>> >>>>>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So >>>>>>>>>>> it's OK to skip the assert. 
>>>>>>>>>>> >>>>>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to >>>>>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and >>>>>>>>>>> needs to get killed to proceed. >>>>>>>>>> >>>>>>>>>> I believe this will fix it: >>>>>>>>>> >>>>>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper >>>>>>>>>> >>>>>>>>>> Cherry-pick that and let me know? >>>>>>>>>> >>>>>>>>>> Thanks! >>>>>>>>>> sage >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> The log output is always: >>>>>>>>>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) >>>>>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at >>>>>>>>>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 >>>>>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 >>>>>>>>>>> 00000157:head v 927921'526559757) currently commit_sent >>>>>>>>>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) >>>>>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at >>>>>>>>>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 >>>>>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 >>>>>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent >>>>>>>>>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) >>>>>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at >>>>>>>>>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 >>>>>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 >>>>>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent >>>>>>>>>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal >>>>>>>>>>> (Aborted) ** >>>>>>>>>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp >>>>>>>>>>> >>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>> 1: (()+0xa4423c) [0x561d876ab23c] 
>>>>>>>>>>> 2: (()+0xf890) [0x7f100b974890] >>>>>>>>>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] >>>>>>>>>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, >>>>>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran >>>>>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54] >>>>>>>>>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, >>>>>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] >>>>>>>>>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] >>>>>>>>>>> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] >>>>>>>>>>> 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] >>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] >>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>> [0x561d87138bc7] >>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>> const&)+0x57) [0x561d873b0db7] >>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] >>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>> [0x561d876f42bd] >>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] >>>>>>>>>>> 15: (()+0x8064) [0x7f100b96d064] >>>>>>>>>>> 16: (clone()+0x6d) [0x7f100aa6162d] >>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>> needed to interpret this. 
>>>>>>>>>>> >>>>>>>>>>> --- logging levels --- >>>>>>>>>>> 0/ 5 none >>>>>>>>>>> 0/ 0 lockdep >>>>>>>>>>> 0/ 0 context >>>>>>>>>>> 0/ 0 crush >>>>>>>>>>> 0/ 0 mds >>>>>>>>>>> 0/ 0 mds_balancer >>>>>>>>>>> 0/ 0 mds_locker >>>>>>>>>>> 0/ 0 mds_log >>>>>>>>>>> 0/ 0 mds_log_expire >>>>>>>>>>> 0/ 0 mds_migrator >>>>>>>>>>> 0/ 0 buffer >>>>>>>>>>> 0/ 0 timer >>>>>>>>>>> 0/ 0 filer >>>>>>>>>>> 0/ 1 striper >>>>>>>>>>> 0/ 0 objecter >>>>>>>>>>> 0/ 0 rados >>>>>>>>>>> 0/ 0 rbd >>>>>>>>>>> 0/ 5 rbd_mirror >>>>>>>>>>> 0/ 5 rbd_replay >>>>>>>>>>> 0/ 0 journaler >>>>>>>>>>> 0/ 0 objectcacher >>>>>>>>>>> 0/ 0 client >>>>>>>>>>> 0/ 0 osd >>>>>>>>>>> 0/ 0 optracker >>>>>>>>>>> 0/ 0 objclass >>>>>>>>>>> 0/ 0 filestore >>>>>>>>>>> 0/ 0 journal >>>>>>>>>>> 0/ 0 ms >>>>>>>>>>> 0/ 0 mon >>>>>>>>>>> 0/ 0 monc >>>>>>>>>>> 0/ 0 paxos >>>>>>>>>>> 0/ 0 tp >>>>>>>>>>> 0/ 0 auth >>>>>>>>>>> 1/ 5 crypto >>>>>>>>>>> 0/ 0 finisher >>>>>>>>>>> 1/ 1 reserver >>>>>>>>>>> 0/ 0 heartbeatmap >>>>>>>>>>> 0/ 0 perfcounter >>>>>>>>>>> 0/ 0 rgw >>>>>>>>>>> 1/10 civetweb >>>>>>>>>>> 1/ 5 javaclient >>>>>>>>>>> 0/ 0 asok >>>>>>>>>>> 0/ 0 throttle >>>>>>>>>>> 0/ 0 refs >>>>>>>>>>> 1/ 5 xio >>>>>>>>>>> 1/ 5 compressor >>>>>>>>>>> 1/ 5 bluestore >>>>>>>>>>> 1/ 5 bluefs >>>>>>>>>>> 1/ 3 bdev >>>>>>>>>>> 1/ 5 kstore >>>>>>>>>>> 4/ 5 rocksdb >>>>>>>>>>> 4/ 5 leveldb >>>>>>>>>>> 4/ 5 memdb >>>>>>>>>>> 1/ 5 kinetic >>>>>>>>>>> 1/ 5 fuse >>>>>>>>>>> 1/ 5 mgr >>>>>>>>>>> 1/ 5 mgrc >>>>>>>>>>> 1/ 5 dpdk >>>>>>>>>>> 1/ 5 eventtrace >>>>>>>>>>> -2/-2 (syslog threshold) >>>>>>>>>>> -1/-1 (stderr threshold) >>>>>>>>>>> max_recent 10000 >>>>>>>>>>> max_new 1000 >>>>>>>>>>> log_file /var/log/ceph/ceph-osd.59.log >>>>>>>>>>> --- end dump of recent events --- >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Greets, >>>>>>>>>>> Stefan >>>>>>>>>>> >>>>>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: >>>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>> >>>>>>>>>>>>> thanks for your 
reply. I'll do this. What i still don't understand is >>>>>>>>>>>>> the following and i hope you might have an idea: >>>>>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today >>>>>>>>>>>>> running luminous? How can they be missing in snapmapper again? >>>>>>>>>>>> >>>>>>>>>>>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so >>>>>>>>>>>> that when you hit the crash we have some context as to what is going on. >>>>>>>>>>>> >>>>>>>>>>>> Did you track down which part of get_snaps() is returning the error? >>>>>>>>>>>> >>>>>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to >>>>>>>>>>>>> luminous when all this happened. >>>>>>>>>>>> >>>>>>>>>>>> Did this start right after the upgrade? >>>>>>>>>>>> >>>>>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up >>>>>>>>>>>> instead (unless the debug option is enabled, for our regression testing), >>>>>>>>>>>> but I'm still unclear about what the inconsistency is... any logging you >>>>>>>>>>>> can provide would help! >>>>>>>>>>>> >>>>>>>>>>>> Thanks- >>>>>>>>>>>> sage >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Greets, >>>>>>>>>>>>> Stefan >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: >>>>>>>>>>>>>> I'm guessing it is coming from >>>>>>>>>>>>>> >>>>>>>>>>>>>> object_snaps out; >>>>>>>>>>>>>> int r = get_snaps(oid, &out); >>>>>>>>>>>>>> if (r < 0) >>>>>>>>>>>>>> return r; >>>>>>>>>>>>>> >>>>>>>>>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to >>>>>>>>>>>>>> confirm that, and possibly also add debug lines to the various return >>>>>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's >>>>>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we >>>>>>>>>>>>>> can tolerate it...
>>>>>>>>>>>>>> >>>>>>>>>>>>>> sage >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> any ideas for this one, seen while doing a deep scrub: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: >>>>>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] >>>>>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f >>>>>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 >>>>>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for >>>>>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 >>>>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const >>>>>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread >>>>>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 >>>>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>> const*)+0x102) [0x55c0100d0372] >>>>>>>>>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] >>>>>>>>>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] >>>>>>>>>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] >>>>>>>>>>>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>> [0x55c00fdf6642] >>>>>>>>>>>>>>> 6: >>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>> [0x55c00fdfc684] >>>>>>>>>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>> [0x55c00fd22030] >>>>>>>>>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] >>>>>>>>>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>> [0x55c00fb1abc7] >>>>>>>>>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7] >>>>>>>>>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] >>>>>>>>>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>> [0x55c0100d5ffd] >>>>>>>>>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] >>>>>>>>>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] >>>>>>>>>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] >>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>> needed to interpret this. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: >>>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>>>>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> it also crashes in (marked with HERE): >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> int SnapMapper::get_snaps( >>>>>>>>>>>>>>>>>>> const hobject_t &oid, >>>>>>>>>>>>>>>>>>> object_snaps *out) >>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>> assert(check(oid)); >>>>>>>>>>>>>>>>>>> set<string> keys; >>>>>>>>>>>>>>>>>>> map<string, bufferlist> got; >>>>>>>>>>>>>>>>>>> keys.insert(to_object_key(oid)); >>>>>>>>>>>>>>>>>>> int r = backend.get_keys(keys, &got); >>>>>>>>>>>>>>>>>>> if (r < 0) >>>>>>>>>>>>>>>>>>> return r; >>>>>>>>>>>>>>>>>>> if (got.empty()) >>>>>>>>>>>>>>>>>>> return -ENOENT; >>>>>>>>>>>>>>>>>>> if (out) { >>>>>>>>>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>>>>>>>>>>>>>>>>> ::decode(*out, bp); >>>>>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>>>>>>>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### >>>>>>>>>>>>>>>>>>> } else { >>>>>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> return 0; >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> is it safe to comment that assert? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper >>>>>>>>>>>>>>>>>> state is consistent, and its only purpose is to find snaps to trim. >>>>>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a >>>>>>>>>>>>>>>>>> whole lot of risk here.
You might want to set nosnaptrim for the time >>>>>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in. The next step is >>>>>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or >>>>>>>>>>>>>>>>>> can't repair. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But >>>>>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one: >>>>>>>>>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>>>>>>>>>>>> SnapMapper::get_snaps(const h >>>>>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >>>>>>>>>>>>>>>>> 2018-01-18 13:00:54.840396 >>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I think if you switch that assert to a warning it will repair... 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] >>>>>>>>>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, >>>>>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >>>>>>>>>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >>>>>>>>>>>>>>>>> 697] >>>>>>>>>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >>>>>>>>>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >>>>>>>>>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] >>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>> [0x5561cdcd7bc7] >>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957] >>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >>>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>> [0x5561ce292e7d] >>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] >>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >>>>>>>>>>>>>>>>> NOTE: a copy of the executable, 
or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l >>>>>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end()) >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Is this a cache tiering pool? 
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> no, a normal 3-replica pool, but the pg is degraded:
>>>>>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 10014 10014 active+undersized+degraded+remapped+backfill_wait 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] 50 [39] 39 907737'69776430 2018-01-14 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hrm, no real clues on the root cause then.  Something like this will work
>>>>>>>>>>>>>>>>>>>> around the current assert:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
>>>>>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>>>>>>>>>>>>>>>>      set<snapid_t> snaps;
>>>>>>>>>>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>>>>>>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>>>>>>>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>>>>>>>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>>>>>>> +    if (p == recovery_info.ss.clone_snaps.end()) {
>>>>>>>>>>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>>>>>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>>>>>>>>>>>>>>>>      snap_mapper.add_oid(
>>>>>>>>>>>>>>>>>>>>        recovery_info.soid,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> s
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo const&, std::shared_ptr<ObjectContext>, bool, ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) [0x55addb30748f]
>>>>>>>>>>>>>>>>>>>>>>> 5: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) [0x55addb317531]
>>>>>>>>>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x55addb23cf10]
>>>>>>>>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) [0x55addb035bc7]
>>>>>>>>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x55addb2ad957]
>>>>>>>>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) [0x55addb5f0e7d]
>>>>>>>>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On 17.01.2018 at 15:28, Sage Weil wrote:
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> is there any chance to fix this instead of removing all the clones manually?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by
>>>>>>>>>>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap
>>>>>>>>>>>>>>>>>>>>>>>> list mapping.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still have a stray snapid -> object mapping, so
>>>>>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error
>>>>>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't
>>>>>>>>>>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it
>>>>>>>>>>>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them
>>>>>>>>>>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On 16.01.2018 at 23:24, Gregory Farnum wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>>>>>>>>>>>>>>>>> being down.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 /build/ceph/src/osd/SnapMapper.cc: In function 'void SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) [0x561f9fb76f3b]
>>>>>>>>>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t> const&, eversion_t const&, eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) [0x561f9fc314b2]
>>>>>>>>>>>>>>>>>>>>>>>>>>> 7: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) [0x561f9fc374f4]
>>>>>>>>>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x561f9fb5cf10]
>>>>>>>>>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) [0x561f9f955bc7]
>>>>>>>>>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) [0x561f9ff10e6d]
>>>>>>>>>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>>>>>>>>>>>>>>>>> any experience with this, but you'd probably have to export the PG
>>>>>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>>>>>>>>>>>>>>>>> them myself.
>>>>>>>>>>>>>>>>>>>>>>>>>> -Greg
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
index d42f3a401b..0f76134f74 100644
--- a/src/osd/PrimaryLogPG.cc
+++ b/src/osd/PrimaryLogPG.cc
@@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
     set<snapid_t> snaps;
     dout(20) << " snapset " << recovery_info.ss << dendl;
     auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
-    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
-    snaps.insert(p->second.begin(), p->second.end());
+    if (p == recovery_info.ss.clone_snaps.end()) {
+      derr << __func__ << " no clone_snaps for " << hoid << dendl;
+    } else {
+      snaps.insert(p->second.begin(), p->second.end());
+    }
     dout(20) << " snaps " << snaps << dendl;
     snap_mapper.add_oid(
       recovery_info.soid,