diff mbox

Ceph Luminous - pg is down due to src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)

Message ID alpine.DEB.2.11.1801171856230.24931@piezo.novalocal (mailing list archive)
State New, archived
Headers show

Commit Message

Sage Weil Jan. 17, 2018, 6:56 p.m. UTC
On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> 
> Am 17.01.2018 um 19:48 schrieb Sage Weil:
> > On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >> Hi Sage,
> >>
> >> this gives me another crash while that pg is recovering:
> >>
> >>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
> >> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
> >> PrimaryLogPG::on_l
> >> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
> >> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
> >> me 2018-01-17 19:25:09.322287
> >> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
> >> recovery_info.ss.clone_snaps.end())
> > 
> > Is this a cache tiering pool?
> 
> no normal 3 replica but the pg is degraded:
> ceph pg dump | grep 3.80e
> 
> 3.80e      1709                  0     1579         0       0 6183674368
> 10014    10014 active+undersized+degraded+remapped+backfill_wait
> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>        50       [39]             39  907737'69776430 2018-01-14
> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864

Hrm, no real clues on teh root cause then.  Something like this will work 
around the current assert:



> 
> Stefan
> 
> > 
> > s
> >>
> >>  ceph version 12.2.2-94-g92923ef
> >> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >> const*)+0x102) [0x55addb5eb1f2]
> >>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
> >> const&, std::shared_ptr<ObjectContext>, bool,
> >> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
> >>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
> >> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
> >>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
> >> [0x55addb30748f]
> >>  5:
> >> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
> >> [0x55addb317531]
> >>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >> [0x55addb23cf10]
> >>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
> >>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >> [0x55addb035bc7]
> >>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >> const&)+0x57) [0x55addb2ad957]
> >>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
> >>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >> [0x55addb5f0e7d]
> >>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
> >>  13: (()+0x8064) [0x7f4955b68064]
> >>  14: (clone()+0x6d) [0x7f4954c5c62d]
> >>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >> needed to interpret this.
> >>
> >> Greets,
> >> Stefan
> >>
> >> Am 17.01.2018 um 15:28 schrieb Sage Weil:
> >>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>> Hi,
> >>>>
> >>>> i there any chance to fix this instead of removing manually all the clones?
> >>>
> >>> I believe you can avoid the immediate problem and get the PG up by 
> >>> commenting out the assert.  set_snaps() will overwrite the object->snap 
> >>> list mapping.
> >>>
> >>> The problem is you'll probably still a stray snapid -> object mapping, so 
> >>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
> >>> state that won't trim (although from a quick look at the code it won't 
> >>> crash).  I'd probably remove the assert and deal with that if/when it 
> >>> happens.
> >>>
> >>> I'm adding a ticket to relax these asserts for production but keep them 
> >>> enabled for qa.  This isn't something that needs to take down the OSD!
> >>>
> >>> sage
> >>>
> >>>
> >>>  > 
> >>>
> >>>> Stefan
> >>>>
> >>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
> >>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
> >>>>> <s.priebe@profihost.ag> wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
> >>>>>> being down.
> >>>>>>
> >>>>>> All of them fail with:
> >>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
> >>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
> >>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
> >>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
> >>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
> >>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
> >>>>>>
> >>>>>>  ceph version 12.2.2-93-gd6da8d7
> >>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
> >>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>> const*)+0x102) [0x561f9ff0b1e2]
> >>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
> >>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
> >>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
> >>>>>> [0x561f9fb76f3b]
> >>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
> >>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
> >>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
> >>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
> >>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>>>> [0x561f9fc314b2]
> >>>>>>  7:
> >>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>>>> [0x561f9fc374f4]
> >>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>> [0x561f9fb5cf10]
> >>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
> >>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>> [0x561f9f955bc7]
> >>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>> const&)+0x57) [0x561f9fbcd947]
> >>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
> >>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>> [0x561f9ff10e6d]
> >>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
> >>>>>>  15: (()+0x8064) [0x7f949afcb064]
> >>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
> >>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>> needed to interpret this.
> >>>>>
> >>>>> By the time it gets there, something else has gone wrong. The OSD is
> >>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
> >>>>> there are already entries (which it thinks there shouldn't be).
> >>>>>
> >>>>> You'll need to post more of a log, along with background, if anybody's
> >>>>> going to diagnose it: is there cache tiering on the cluster? What is
> >>>>> this pool used for? Were there other errors on this PG in the past?
> >>>>>
> >>>>> I also notice a separate email about deleting the data; I don't have
> >>>>> any experience with this but you'd probably have to export the PG
> >>>>> using ceph-objectstore-tool and then find a way to delete the object
> >>>>> out of it. I see options to remove both an object and
> >>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
> >>>>> them myself.
> >>>>> -Greg
> >>>>>
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>> the body of a message to majordomo@vger.kernel.org
> >>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>
> >>>>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Stefan Priebe - Profihost AG Jan. 17, 2018, 8:45 p.m. UTC | #1
Hi Sage,

thanks again but this results into:
     0> 2018-01-17 21:43:09.949980 7f29c9fff700 -1
/build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
std::vecto
r<pg_log_entry_t>&, ObjectStore::Transaction&)' thread 7f29c9fff700 time
2018-01-17 21:43:09.946989
/build/ceph/src/osd/PG.cc: 3395: FAILED assert(r == 0)

 ceph version 12.2.2-94-g92923ef
(92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x5609d3a182a2]
 2: (PG::update_snap_map(std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&,
ObjectStore::Transaction&)+0x257) [0x5
609d3517d07]
 3: (PG::append_log(std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
ObjectStore::Transa
ction&, bool)+0x538) [0x5609d353e018]
 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_his
tory_t> const&, eversion_t const&, eversion_t const&, bool,
ObjectStore::Transaction&)+0x64) [0x5609d3632e14]
 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
[0x5609d373e572]
 6:
(ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
[0x5609d37445b4]
 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
[0x5609d3669fc0]
 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0x77b) [0x5609d35d62ab]
 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
[0x5609d3462bc7]
 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
const&)+0x57) [0x5609d36daa07]
 11: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x108c) [0x5609d3491d1c]
 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
[0x5609d3a1df2d]
 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5609d3a1fef0]
 14: (()+0x8064) [0x7f2a25b6b064]
 15: (clone()+0x6d) [0x7f2a24c5f62d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.


Stefan

Am 17.01.2018 um 19:56 schrieb Sage Weil:
> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>
>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi Sage,
>>>>
>>>> this gives me another crash while that pg is recovering:
>>>>
>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>> PrimaryLogPG::on_l
>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>> me 2018-01-17 19:25:09.322287
>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>> recovery_info.ss.clone_snaps.end())
>>>
>>> Is this a cache tiering pool?
>>
>> no normal 3 replica but the pg is degraded:
>> ceph pg dump | grep 3.80e
>>
>> 3.80e      1709                  0     1579         0       0 6183674368
>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>        50       [39]             39  907737'69776430 2018-01-14
>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
> 
> Hrm, no real clues on teh root cause then.  Something like this will work 
> around the current assert:
> 
> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
> index d42f3a401b..0f76134f74 100644
> --- a/src/osd/PrimaryLogPG.cc
> +++ b/src/osd/PrimaryLogPG.cc
> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>      set<snapid_t> snaps;
>      dout(20) << " snapset " << recovery_info.ss << dendl;
>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
> -    snaps.insert(p->second.begin(), p->second.end());
> +    if (p != recovery_info.ss.clone_snaps.end()) {
> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
> +    } else {
> +      snaps.insert(p->second.begin(), p->second.end());
> +    }
>      dout(20) << " snaps " << snaps << dendl;
>      snap_mapper.add_oid(
>        recovery_info.soid,
> 
> 
>>
>> Stefan
>>
>>>
>>> s
>>>>
>>>>  ceph version 12.2.2-94-g92923ef
>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>> [0x55addb30748f]
>>>>  5:
>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>> [0x55addb317531]
>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>> [0x55addb23cf10]
>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>> [0x55addb035bc7]
>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>> const&)+0x57) [0x55addb2ad957]
>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>> [0x55addb5f0e7d]
>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>> needed to interpret this.
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi,
>>>>>>
>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>
>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>> list mapping.
>>>>>
>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>> happens.
>>>>>
>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>
>>>>> sage
>>>>>
>>>>>
>>>>>  > 
>>>>>
>>>>>> Stefan
>>>>>>
>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>> being down.
>>>>>>>>
>>>>>>>> All of them fail with:
>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>
>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>> [0x561f9fc314b2]
>>>>>>>>  7:
>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>> [0x561f9fc374f4]
>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>> [0x561f9f955bc7]
>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>> needed to interpret this.
>>>>>>>
>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>
>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>
>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>> out of it. I see options to remove both an object and
>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>> them myself.
>>>>>>> -Greg
>>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Jan. 17, 2018, 9:16 p.m. UTC | #2
Hi,

OK as this one is also just an asser in case a snap cannot be removed i
will comment that assert as well.

Hopefully that's it.

Greets,
Stefan
Am 17.01.2018 um 21:45 schrieb Stefan Priebe - Profihost AG:
> Hi Sage,
> 
> thanks again but this results into:
>      0> 2018-01-17 21:43:09.949980 7f29c9fff700 -1
> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
> std::vecto
> r<pg_log_entry_t>&, ObjectStore::Transaction&)' thread 7f29c9fff700 time
> 2018-01-17 21:43:09.946989
> /build/ceph/src/osd/PG.cc: 3395: FAILED assert(r == 0)
> 
>  ceph version 12.2.2-94-g92923ef
> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x102) [0x5609d3a182a2]
>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
> std::allocator<pg_log_entry_t> > const&,
> ObjectStore::Transaction&)+0x257) [0x5
> 609d3517d07]
>  3: (PG::append_log(std::vector<pg_log_entry_t,
> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> ObjectStore::Transa
> ction&, bool)+0x538) [0x5609d353e018]
>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_his
> tory_t> const&, eversion_t const&, eversion_t const&, bool,
> ObjectStore::Transaction&)+0x64) [0x5609d3632e14]
>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> [0x5609d373e572]
>  6:
> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> [0x5609d37445b4]
>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> [0x5609d3669fc0]
>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> ThreadPool::TPHandle&)+0x77b) [0x5609d35d62ab]
>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> [0x5609d3462bc7]
>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> const&)+0x57) [0x5609d36daa07]
>  11: (OSD::ShardedOpWQ::_process(unsigned int,
> ceph::heartbeat_handle_d*)+0x108c) [0x5609d3491d1c]
>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> [0x5609d3a1df2d]
>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5609d3a1fef0]
>  14: (()+0x8064) [0x7f2a25b6b064]
>  15: (clone()+0x6d) [0x7f2a24c5f62d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> 
> 
> Stefan
> 
> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>
>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>> Hi Sage,
>>>>>
>>>>> this gives me another crash while that pg is recovering:
>>>>>
>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>> PrimaryLogPG::on_l
>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>> me 2018-01-17 19:25:09.322287
>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>> recovery_info.ss.clone_snaps.end())
>>>>
>>>> Is this a cache tiering pool?
>>>
>>> no normal 3 replica but the pg is degraded:
>>> ceph pg dump | grep 3.80e
>>>
>>> 3.80e      1709                  0     1579         0       0 6183674368
>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>        50       [39]             39  907737'69776430 2018-01-14
>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>
>> Hrm, no real clues on teh root cause then.  Something like this will work 
>> around the current assert:
>>
>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>> index d42f3a401b..0f76134f74 100644
>> --- a/src/osd/PrimaryLogPG.cc
>> +++ b/src/osd/PrimaryLogPG.cc
>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>      set<snapid_t> snaps;
>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>> -    snaps.insert(p->second.begin(), p->second.end());
>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>> +    } else {
>> +      snaps.insert(p->second.begin(), p->second.end());
>> +    }
>>      dout(20) << " snaps " << snaps << dendl;
>>      snap_mapper.add_oid(
>>        recovery_info.soid,
>>
>>
>>>
>>> Stefan
>>>
>>>>
>>>> s
>>>>>
>>>>>  ceph version 12.2.2-94-g92923ef
>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>> [0x55addb30748f]
>>>>>  5:
>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>> [0x55addb317531]
>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>> [0x55addb23cf10]
>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>> [0x55addb035bc7]
>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>> [0x55addb5f0e7d]
>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>> needed to interpret this.
>>>>>
>>>>> Greets,
>>>>> Stefan
>>>>>
>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>
>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>> list mapping.
>>>>>>
>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>> happens.
>>>>>>
>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>
>>>>>> sage
>>>>>>
>>>>>>
>>>>>>  > 
>>>>>>
>>>>>>> Stefan
>>>>>>>
>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>> being down.
>>>>>>>>>
>>>>>>>>> All of them fail with:
>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>
>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>  7:
>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>> needed to interpret this.
>>>>>>>>
>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>
>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>
>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>> them myself.
>>>>>>>> -Greg
>>>>>>>>
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>
>>>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Jan. 17, 2018, 10:07 p.m. UTC | #3
Hi Sage,

if i set nobackfill - the cluster stays up even though some pgs have
only one replica.

The last one i'm struggling is this one:
    0> 2018-01-17 22:58:20.855380 7f8c73bff700 -1 *** Caught signal
(Segmentation fault) **
 in thread 7f8c73bff700 thread_name:tp_osd_tp

 ceph version 12.2.2-94-g92923ef
(92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
 1: (()+0xa43ebc) [0x56180c34cebc]
 2: (()+0xf890) [0x7f8cbfc3e890]
 3: (std::_Rb_tree_iterator<snapid_t> std::_Rb_tree<snapid_t, snapid_t,
std::_Identity<snapid_t>, std::less<snapid_t>, std::allocato
r<snapid_t>
>::_M_insert_unique_<snapid_t&>(std::_Rb_tree_const_iterator<snapid_t>,
snapid_t&)+0x40) [0x56180bee4720]
 4: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
const&, std::shared_ptr<ObjectContext>, bool, ObjectStore::
Transaction*)+0x114e) [0x56180bf3a72e]
 5: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x56180c0ac2cd]
 6: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
[0x56180c0ac56f]
 7:
(ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
[0x56180c0bc611]
 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
[0x56180bfe1ff0]
 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0x77b) [0x56180bf4e2db]
 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
[0x56180bddabc7]
 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
const&)+0x57) [0x56180c052a37]
 12: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x108c) [0x56180be09d1c]
 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
[0x56180c395f3d]
 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x56180c397f00]
 15: (()+0x8064) [0x7f8cbfc37064]
 16: (clone()+0x6d) [0x7f8cbed2b62d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

Greets,
Stefan


Am 17.01.2018 um 19:56 schrieb Sage Weil:
> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>
>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi Sage,
>>>>
>>>> this gives me another crash while that pg is recovering:
>>>>
>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>> PrimaryLogPG::on_l
>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>> me 2018-01-17 19:25:09.322287
>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>> recovery_info.ss.clone_snaps.end())
>>>
>>> Is this a cache tiering pool?
>>
>> no normal 3 replica but the pg is degraded:
>> ceph pg dump | grep 3.80e
>>
>> 3.80e      1709                  0     1579         0       0 6183674368
>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>        50       [39]             39  907737'69776430 2018-01-14
>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
> 
> Hrm, no real clues on teh root cause then.  Something like this will work 
> around the current assert:
> 
> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
> index d42f3a401b..0f76134f74 100644
> --- a/src/osd/PrimaryLogPG.cc
> +++ b/src/osd/PrimaryLogPG.cc
> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>      set<snapid_t> snaps;
>      dout(20) << " snapset " << recovery_info.ss << dendl;
>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
> -    snaps.insert(p->second.begin(), p->second.end());
> +    if (p != recovery_info.ss.clone_snaps.end()) {
> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
> +    } else {
> +      snaps.insert(p->second.begin(), p->second.end());
> +    }
>      dout(20) << " snaps " << snaps << dendl;
>      snap_mapper.add_oid(
>        recovery_info.soid,
> 
> 
>>
>> Stefan
>>
>>>
>>> s
>>>>
>>>>  ceph version 12.2.2-94-g92923ef
>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>> [0x55addb30748f]
>>>>  5:
>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>> [0x55addb317531]
>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>> [0x55addb23cf10]
>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>> [0x55addb035bc7]
>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>> const&)+0x57) [0x55addb2ad957]
>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>> [0x55addb5f0e7d]
>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>> needed to interpret this.
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi,
>>>>>>
>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>
>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>> list mapping.
>>>>>
>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>> happens.
>>>>>
>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>
>>>>> sage
>>>>>
>>>>>
>>>>>  > 
>>>>>
>>>>>> Stefan
>>>>>>
>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>> being down.
>>>>>>>>
>>>>>>>> All of them fail with:
>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>
>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>> [0x561f9fc314b2]
>>>>>>>>  7:
>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>> [0x561f9fc374f4]
>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>> [0x561f9f955bc7]
>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>> needed to interpret this.
>>>>>>>
>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>
>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>
>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>> out of it. I see options to remove both an object and
>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>> them myself.
>>>>>>> -Greg
>>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Jan. 18, 2018, 8:08 a.m. UTC | #4
Hi,

it also crashes in (marked with HERE):

int SnapMapper::get_snaps(
  const hobject_t &oid,
  object_snaps *out)
{
  assert(check(oid));
  set<string> keys;
  map<string, bufferlist> got;
  keys.insert(to_object_key(oid));
  int r = backend.get_keys(keys, &got);
  if (r < 0)
    return r;
  if (got.empty())
    return -ENOENT;
  if (out) {
    bufferlist::iterator bp = got.begin()->second.begin();
    ::decode(*out, bp);
    dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
    assert(!out->snaps.empty());            ########### HERE ###########
  } else {
    dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
  }
  return 0;
}

is it save to comment that assert?

Stefan
Am 17.01.2018 um 19:56 schrieb Sage Weil:
> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>
>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi Sage,
>>>>
>>>> this gives me another crash while that pg is recovering:
>>>>
>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>> PrimaryLogPG::on_l
>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>> me 2018-01-17 19:25:09.322287
>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>> recovery_info.ss.clone_snaps.end())
>>>
>>> Is this a cache tiering pool?
>>
>> no normal 3 replica but the pg is degraded:
>> ceph pg dump | grep 3.80e
>>
>> 3.80e      1709                  0     1579         0       0 6183674368
>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>        50       [39]             39  907737'69776430 2018-01-14
>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
> 
> Hrm, no real clues on teh root cause then.  Something like this will work 
> around the current assert:
> 
> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
> index d42f3a401b..0f76134f74 100644
> --- a/src/osd/PrimaryLogPG.cc
> +++ b/src/osd/PrimaryLogPG.cc
> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>      set<snapid_t> snaps;
>      dout(20) << " snapset " << recovery_info.ss << dendl;
>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
> -    snaps.insert(p->second.begin(), p->second.end());
> +    if (p != recovery_info.ss.clone_snaps.end()) {
> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
> +    } else {
> +      snaps.insert(p->second.begin(), p->second.end());
> +    }
>      dout(20) << " snaps " << snaps << dendl;
>      snap_mapper.add_oid(
>        recovery_info.soid,
> 
> 
>>
>> Stefan
>>
>>>
>>> s
>>>>
>>>>  ceph version 12.2.2-94-g92923ef
>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>> [0x55addb30748f]
>>>>  5:
>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>> [0x55addb317531]
>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>> [0x55addb23cf10]
>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>> [0x55addb035bc7]
>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>> const&)+0x57) [0x55addb2ad957]
>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>> [0x55addb5f0e7d]
>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>> needed to interpret this.
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi,
>>>>>>
>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>
>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>> list mapping.
>>>>>
>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>> happens.
>>>>>
>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>
>>>>> sage
>>>>>
>>>>>
>>>>>  > 
>>>>>
>>>>>> Stefan
>>>>>>
>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>> being down.
>>>>>>>>
>>>>>>>> All of them fail with:
>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>
>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>> [0x561f9fc314b2]
>>>>>>>>  7:
>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>> [0x561f9fc374f4]
>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>> [0x561f9f955bc7]
>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>> needed to interpret this.
>>>>>>>
>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>
>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>
>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>> out of it. I see options to remove both an object and
>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>> them myself.
>>>>>>> -Greg
>>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Jan. 18, 2018, 12:02 p.m. UTC | #5
Hi,

OK the cluster has now fully recovered the only problem is now that i
can't enable scrubs as they fail with:
     0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
/build/ceph/src/osd/SnapMapper.cc: In function 'int
SnapMapper::get_snaps(const h
object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
2018-01-18 13:00:54.840396
/build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())

 ceph version 12.2.2-94-g92923ef
(92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x5561ce28d1f2]
 2: (SnapMapper::get_snaps(hobject_t const&,
SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
697]
 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
[0x5561cdcd7bc7]
 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
const&)+0x57) [0x5561cdf4f957]
 10: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
[0x5561ce292e7d]
 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
 13: (()+0x8064) [0x7fbd70d86064]
 14: (clone()+0x6d) [0x7fbd6fe7a62d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.


Greets,
Stefan

Am 17.01.2018 um 19:56 schrieb Sage Weil:
> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>
>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi Sage,
>>>>
>>>> this gives me another crash while that pg is recovering:
>>>>
>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>> PrimaryLogPG::on_l
>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>> me 2018-01-17 19:25:09.322287
>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>> recovery_info.ss.clone_snaps.end())
>>>
>>> Is this a cache tiering pool?
>>
>> no normal 3 replica but the pg is degraded:
>> ceph pg dump | grep 3.80e
>>
>> 3.80e      1709                  0     1579         0       0 6183674368
>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>        50       [39]             39  907737'69776430 2018-01-14
>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
> 
> Hrm, no real clues on teh root cause then.  Something like this will work 
> around the current assert:
> 
> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
> index d42f3a401b..0f76134f74 100644
> --- a/src/osd/PrimaryLogPG.cc
> +++ b/src/osd/PrimaryLogPG.cc
> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>      set<snapid_t> snaps;
>      dout(20) << " snapset " << recovery_info.ss << dendl;
>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
> -    snaps.insert(p->second.begin(), p->second.end());
> +    if (p != recovery_info.ss.clone_snaps.end()) {
> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
> +    } else {
> +      snaps.insert(p->second.begin(), p->second.end());
> +    }
>      dout(20) << " snaps " << snaps << dendl;
>      snap_mapper.add_oid(
>        recovery_info.soid,
> 
> 
>>
>> Stefan
>>
>>>
>>> s
>>>>
>>>>  ceph version 12.2.2-94-g92923ef
>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>> [0x55addb30748f]
>>>>  5:
>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>> [0x55addb317531]
>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>> [0x55addb23cf10]
>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>> [0x55addb035bc7]
>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>> const&)+0x57) [0x55addb2ad957]
>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>> [0x55addb5f0e7d]
>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>> needed to interpret this.
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi,
>>>>>>
>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>
>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>> list mapping.
>>>>>
>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>> happens.
>>>>>
>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>
>>>>> sage
>>>>>
>>>>>
>>>>>  > 
>>>>>
>>>>>> Stefan
>>>>>>
>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>> being down.
>>>>>>>>
>>>>>>>> All of them fail with:
>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>
>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>> [0x561f9fc314b2]
>>>>>>>>  7:
>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>> [0x561f9fc374f4]
>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>> [0x561f9f955bc7]
>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>> needed to interpret this.
>>>>>>>
>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>
>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>
>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>> out of it. I see options to remove both an object and
>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>> them myself.
>>>>>>> -Greg
>>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sage Weil Jan. 18, 2018, 1:16 p.m. UTC | #6
On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> Hi,
> 
> it also crashes in (marked with HERE):
> 
> int SnapMapper::get_snaps(
>   const hobject_t &oid,
>   object_snaps *out)
> {
>   assert(check(oid));
>   set<string> keys;
>   map<string, bufferlist> got;
>   keys.insert(to_object_key(oid));
>   int r = backend.get_keys(keys, &got);
>   if (r < 0)
>     return r;
>   if (got.empty())
>     return -ENOENT;
>   if (out) {
>     bufferlist::iterator bp = got.begin()->second.begin();
>     ::decode(*out, bp);
>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>     assert(!out->snaps.empty());            ########### HERE ###########
>   } else {
>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>   }
>   return 0;
> }
> 
> is it save to comment that assert?

I think so.  All of these asserts are just about ensuring the snapmapper 
state is consistent, and it's only purpose is to find snaps to trim.  
Since your snapmapper metadata is clearly not consistent, there isn't a 
whole lot of risk here.  You might want to set nosnaptrim for the time 
being so you don't get a surprise if trimming kicks in.  The next step is 
probably going to be to do a scrub and see what it finds, repairs, or 
can't repair.

sage


> Stefan
> Am 17.01.2018 um 19:56 schrieb Sage Weil:
> > On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>
> >> Am 17.01.2018 um 19:48 schrieb Sage Weil:
> >>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>> Hi Sage,
> >>>>
> >>>> this gives me another crash while that pg is recovering:
> >>>>
> >>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
> >>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
> >>>> PrimaryLogPG::on_l
> >>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
> >>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
> >>>> me 2018-01-17 19:25:09.322287
> >>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
> >>>> recovery_info.ss.clone_snaps.end())
> >>>
> >>> Is this a cache tiering pool?
> >>
> >> no normal 3 replica but the pg is degraded:
> >> ceph pg dump | grep 3.80e
> >>
> >> 3.80e      1709                  0     1579         0       0 6183674368
> >> 10014    10014 active+undersized+degraded+remapped+backfill_wait
> >> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
> >>        50       [39]             39  907737'69776430 2018-01-14
> >> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
> > 
> > Hrm, no real clues on teh root cause then.  Something like this will work 
> > around the current assert:
> > 
> > diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
> > index d42f3a401b..0f76134f74 100644
> > --- a/src/osd/PrimaryLogPG.cc
> > +++ b/src/osd/PrimaryLogPG.cc
> > @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
> >      set<snapid_t> snaps;
> >      dout(20) << " snapset " << recovery_info.ss << dendl;
> >      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
> > -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
> > -    snaps.insert(p->second.begin(), p->second.end());
> > +    if (p != recovery_info.ss.clone_snaps.end()) {
> > +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
> > +    } else {
> > +      snaps.insert(p->second.begin(), p->second.end());
> > +    }
> >      dout(20) << " snaps " << snaps << dendl;
> >      snap_mapper.add_oid(
> >        recovery_info.soid,
> > 
> > 
> >>
> >> Stefan
> >>
> >>>
> >>> s
> >>>>
> >>>>  ceph version 12.2.2-94-g92923ef
> >>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>> const*)+0x102) [0x55addb5eb1f2]
> >>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
> >>>> const&, std::shared_ptr<ObjectContext>, bool,
> >>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
> >>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
> >>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
> >>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
> >>>> [0x55addb30748f]
> >>>>  5:
> >>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
> >>>> [0x55addb317531]
> >>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>> [0x55addb23cf10]
> >>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
> >>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>> [0x55addb035bc7]
> >>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>> const&)+0x57) [0x55addb2ad957]
> >>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
> >>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>> [0x55addb5f0e7d]
> >>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
> >>>>  13: (()+0x8064) [0x7f4955b68064]
> >>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
> >>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>> needed to interpret this.
> >>>>
> >>>> Greets,
> >>>> Stefan
> >>>>
> >>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
> >>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> i there any chance to fix this instead of removing manually all the clones?
> >>>>>
> >>>>> I believe you can avoid the immediate problem and get the PG up by 
> >>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
> >>>>> list mapping.
> >>>>>
> >>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
> >>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
> >>>>> state that won't trim (although from a quick look at the code it won't 
> >>>>> crash).  I'd probably remove the assert and deal with that if/when it 
> >>>>> happens.
> >>>>>
> >>>>> I'm adding a ticket to relax these asserts for production but keep them 
> >>>>> enabled for qa.  This isn't something that needs to take down the OSD!
> >>>>>
> >>>>> sage
> >>>>>
> >>>>>
> >>>>>  > 
> >>>>>
> >>>>>> Stefan
> >>>>>>
> >>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
> >>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
> >>>>>>> <s.priebe@profihost.ag> wrote:
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
> >>>>>>>> being down.
> >>>>>>>>
> >>>>>>>> All of them fail with:
> >>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
> >>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
> >>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
> >>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
> >>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
> >>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
> >>>>>>>>
> >>>>>>>>  ceph version 12.2.2-93-gd6da8d7
> >>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
> >>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
> >>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
> >>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
> >>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
> >>>>>>>> [0x561f9fb76f3b]
> >>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
> >>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
> >>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
> >>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
> >>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>>>>>> [0x561f9fc314b2]
> >>>>>>>>  7:
> >>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>>>>>> [0x561f9fc374f4]
> >>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>> [0x561f9fb5cf10]
> >>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
> >>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>> [0x561f9f955bc7]
> >>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>> const&)+0x57) [0x561f9fbcd947]
> >>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
> >>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>> [0x561f9ff10e6d]
> >>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
> >>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
> >>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
> >>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>> needed to interpret this.
> >>>>>>>
> >>>>>>> By the time it gets there, something else has gone wrong. The OSD is
> >>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
> >>>>>>> there are already entries (which it thinks there shouldn't be).
> >>>>>>>
> >>>>>>> You'll need to post more of a log, along with background, if anybody's
> >>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
> >>>>>>> this pool used for? Were there other errors on this PG in the past?
> >>>>>>>
> >>>>>>> I also notice a separate email about deleting the data; I don't have
> >>>>>>> any experience with this but you'd probably have to export the PG
> >>>>>>> using ceph-objectstore-tool and then find a way to delete the object
> >>>>>>> out of it. I see options to remove both an object and
> >>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
> >>>>>>> them myself.
> >>>>>>> -Greg
> >>>>>>>
> >>>>>> --
> >>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>
> >>>>>>
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>> the body of a message to majordomo@vger.kernel.org
> >>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>
> >>>>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Jan. 18, 2018, 1:26 p.m. UTC | #7
Hi Sage,

Am 18.01.2018 um 14:16 schrieb Sage Weil:
> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>> Hi,
>>
>> it also crashes in (marked with HERE):
>>
>> int SnapMapper::get_snaps(
>>   const hobject_t &oid,
>>   object_snaps *out)
>> {
>>   assert(check(oid));
>>   set<string> keys;
>>   map<string, bufferlist> got;
>>   keys.insert(to_object_key(oid));
>>   int r = backend.get_keys(keys, &got);
>>   if (r < 0)
>>     return r;
>>   if (got.empty())
>>     return -ENOENT;
>>   if (out) {
>>     bufferlist::iterator bp = got.begin()->second.begin();
>>     ::decode(*out, bp);
>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>     assert(!out->snaps.empty());            ########### HERE ###########
>>   } else {
>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>   }
>>   return 0;
>> }
>>
>> is it save to comment that assert?
> 
> I think so.  All of these asserts are just about ensuring the snapmapper 
> state is consistent, and it's only purpose is to find snaps to trim.  
> Since your snapmapper metadata is clearly not consistent, there isn't a 
> whole lot of risk here.  You might want to set nosnaptrim for the time 
> being so you don't get a surprise if trimming kicks in.  The next step is 
> probably going to be to do a scrub and see what it finds, repairs, or 
> can't repair.

snap trimming works fine i already trimmed and removed some snaps.

I was able to get the cluster into a state where all is backfilled. But
enabling / doing deep-scrubs results into this one:
     0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
/build/ceph/src/osd/SnapMapper.cc: In function 'int
SnapMapper::get_snaps(const h
object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
2018-01-18 13:00:54.840396
/build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())

 ceph version 12.2.2-94-g92923ef
(92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x5561ce28d1f2]
 2: (SnapMapper::get_snaps(hobject_t const&,
SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
697]
 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
[0x5561cdcd7bc7]
 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
const&)+0x57) [0x5561cdf4f957]
 10: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
[0x5561ce292e7d]
 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
 13: (()+0x8064) [0x7fbd70d86064]
 14: (clone()+0x6d) [0x7fbd6fe7a62d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.


Greets,
Stefan

> sage
> 
> 
>> Stefan
>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Sage,
>>>>>>
>>>>>> this gives me another crash while that pg is recovering:
>>>>>>
>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>> PrimaryLogPG::on_l
>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>> me 2018-01-17 19:25:09.322287
>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>
>>>>> Is this a cache tiering pool?
>>>>
>>>> no normal 3 replica but the pg is degraded:
>>>> ceph pg dump | grep 3.80e
>>>>
>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>        50       [39]             39  907737'69776430 2018-01-14
>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>
>>> Hrm, no real clues on teh root cause then.  Something like this will work 
>>> around the current assert:
>>>
>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>> index d42f3a401b..0f76134f74 100644
>>> --- a/src/osd/PrimaryLogPG.cc
>>> +++ b/src/osd/PrimaryLogPG.cc
>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>      set<snapid_t> snaps;
>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>> -    snaps.insert(p->second.begin(), p->second.end());
>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>> +    } else {
>>> +      snaps.insert(p->second.begin(), p->second.end());
>>> +    }
>>>      dout(20) << " snaps " << snaps << dendl;
>>>      snap_mapper.add_oid(
>>>        recovery_info.soid,
>>>
>>>
>>>>
>>>> Stefan
>>>>
>>>>>
>>>>> s
>>>>>>
>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>> [0x55addb30748f]
>>>>>>  5:
>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>> [0x55addb317531]
>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>> [0x55addb23cf10]
>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>> [0x55addb035bc7]
>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>> [0x55addb5f0e7d]
>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>> needed to interpret this.
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>>
>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>
>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>>> list mapping.
>>>>>>>
>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>>> happens.
>>>>>>>
>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>
>>>>>>> sage
>>>>>>>
>>>>>>>
>>>>>>>  > 
>>>>>>>
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>> being down.
>>>>>>>>>>
>>>>>>>>>> All of them fail with:
>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>
>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>  7:
>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>> needed to interpret this.
>>>>>>>>>
>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>
>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>
>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>> them myself.
>>>>>>>>> -Greg
>>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sage Weil Jan. 18, 2018, 2:24 p.m. UTC | #8
On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> Hi Sage,
> 
> Am 18.01.2018 um 14:16 schrieb Sage Weil:
> > On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >> Hi,
> >>
> >> it also crashes in (marked with HERE):
> >>
> >> int SnapMapper::get_snaps(
> >>   const hobject_t &oid,
> >>   object_snaps *out)
> >> {
> >>   assert(check(oid));
> >>   set<string> keys;
> >>   map<string, bufferlist> got;
> >>   keys.insert(to_object_key(oid));
> >>   int r = backend.get_keys(keys, &got);
> >>   if (r < 0)
> >>     return r;
> >>   if (got.empty())
> >>     return -ENOENT;
> >>   if (out) {
> >>     bufferlist::iterator bp = got.begin()->second.begin();
> >>     ::decode(*out, bp);
> >>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
> >>     assert(!out->snaps.empty());            ########### HERE ###########
> >>   } else {
> >>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
> >>   }
> >>   return 0;
> >> }
> >>
> >> is it save to comment that assert?
> > 
> > I think so.  All of these asserts are just about ensuring the snapmapper 
> > state is consistent, and it's only purpose is to find snaps to trim.  
> > Since your snapmapper metadata is clearly not consistent, there isn't a 
> > whole lot of risk here.  You might want to set nosnaptrim for the time 
> > being so you don't get a surprise if trimming kicks in.  The next step is 
> > probably going to be to do a scrub and see what it finds, repairs, or 
> > can't repair.
> 
> snap trimming works fine i already trimmed and removed some snaps.
> 
> I was able to get the cluster into a state where all is backfilled. But
> enabling / doing deep-scrubs results into this one:
>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
> /build/ceph/src/osd/SnapMapper.cc: In function 'int
> SnapMapper::get_snaps(const h
> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
> 2018-01-18 13:00:54.840396
> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())

I think if you switch that assert to a warning it will repair...

sage

> 
>  ceph version 12.2.2-94-g92923ef
> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x102) [0x5561ce28d1f2]
>  2: (SnapMapper::get_snaps(hobject_t const&,
> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
> 697]
>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> [0x5561cdcd7bc7]
>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> const&)+0x57) [0x5561cdf4f957]
>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> [0x5561ce292e7d]
>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>  13: (()+0x8064) [0x7fbd70d86064]
>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> 
> 
> Greets,
> Stefan
> 
> > sage
> > 
> > 
> >> Stefan
> >> Am 17.01.2018 um 19:56 schrieb Sage Weil:
> >>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>
> >>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
> >>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>> Hi Sage,
> >>>>>>
> >>>>>> this gives me another crash while that pg is recovering:
> >>>>>>
> >>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
> >>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
> >>>>>> PrimaryLogPG::on_l
> >>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
> >>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
> >>>>>> me 2018-01-17 19:25:09.322287
> >>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
> >>>>>> recovery_info.ss.clone_snaps.end())
> >>>>>
> >>>>> Is this a cache tiering pool?
> >>>>
> >>>> no normal 3 replica but the pg is degraded:
> >>>> ceph pg dump | grep 3.80e
> >>>>
> >>>> 3.80e      1709                  0     1579         0       0 6183674368
> >>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
> >>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
> >>>>        50       [39]             39  907737'69776430 2018-01-14
> >>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
> >>>
> >>> Hrm, no real clues on teh root cause then.  Something like this will work 
> >>> around the current assert:
> >>>
> >>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
> >>> index d42f3a401b..0f76134f74 100644
> >>> --- a/src/osd/PrimaryLogPG.cc
> >>> +++ b/src/osd/PrimaryLogPG.cc
> >>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
> >>>      set<snapid_t> snaps;
> >>>      dout(20) << " snapset " << recovery_info.ss << dendl;
> >>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
> >>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
> >>> -    snaps.insert(p->second.begin(), p->second.end());
> >>> +    if (p != recovery_info.ss.clone_snaps.end()) {
> >>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
> >>> +    } else {
> >>> +      snaps.insert(p->second.begin(), p->second.end());
> >>> +    }
> >>>      dout(20) << " snaps " << snaps << dendl;
> >>>      snap_mapper.add_oid(
> >>>        recovery_info.soid,
> >>>
> >>>
> >>>>
> >>>> Stefan
> >>>>
> >>>>>
> >>>>> s
> >>>>>>
> >>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>> const*)+0x102) [0x55addb5eb1f2]
> >>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
> >>>>>> const&, std::shared_ptr<ObjectContext>, bool,
> >>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
> >>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
> >>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
> >>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
> >>>>>> [0x55addb30748f]
> >>>>>>  5:
> >>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
> >>>>>> [0x55addb317531]
> >>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>> [0x55addb23cf10]
> >>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
> >>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>> [0x55addb035bc7]
> >>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>> const&)+0x57) [0x55addb2ad957]
> >>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
> >>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>> [0x55addb5f0e7d]
> >>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
> >>>>>>  13: (()+0x8064) [0x7f4955b68064]
> >>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
> >>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>> needed to interpret this.
> >>>>>>
> >>>>>> Greets,
> >>>>>> Stefan
> >>>>>>
> >>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
> >>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> i there any chance to fix this instead of removing manually all the clones?
> >>>>>>>
> >>>>>>> I believe you can avoid the immediate problem and get the PG up by 
> >>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
> >>>>>>> list mapping.
> >>>>>>>
> >>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
> >>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
> >>>>>>> state that won't trim (although from a quick look at the code it won't 
> >>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
> >>>>>>> happens.
> >>>>>>>
> >>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
> >>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
> >>>>>>>
> >>>>>>> sage
> >>>>>>>
> >>>>>>>
> >>>>>>>  > 
> >>>>>>>
> >>>>>>>> Stefan
> >>>>>>>>
> >>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
> >>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
> >>>>>>>>> <s.priebe@profihost.ag> wrote:
> >>>>>>>>>> Hello,
> >>>>>>>>>>
> >>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
> >>>>>>>>>> being down.
> >>>>>>>>>>
> >>>>>>>>>> All of them fail with:
> >>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
> >>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
> >>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
> >>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
> >>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
> >>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
> >>>>>>>>>>
> >>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
> >>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
> >>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
> >>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
> >>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
> >>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
> >>>>>>>>>> [0x561f9fb76f3b]
> >>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
> >>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
> >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
> >>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
> >>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>>>>>>>> [0x561f9fc314b2]
> >>>>>>>>>>  7:
> >>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>>>>>>>> [0x561f9fc374f4]
> >>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>> [0x561f9fb5cf10]
> >>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
> >>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>> [0x561f9f955bc7]
> >>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
> >>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
> >>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>> [0x561f9ff10e6d]
> >>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
> >>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
> >>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
> >>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>> needed to interpret this.
> >>>>>>>>>
> >>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
> >>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
> >>>>>>>>> there are already entries (which it thinks there shouldn't be).
> >>>>>>>>>
> >>>>>>>>> You'll need to post more of a log, along with background, if anybody's
> >>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
> >>>>>>>>> this pool used for? Were there other errors on this PG in the past?
> >>>>>>>>>
> >>>>>>>>> I also notice a separate email about deleting the data; I don't have
> >>>>>>>>> any experience with this but you'd probably have to export the PG
> >>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
> >>>>>>>>> out of it. I see options to remove both an object and
> >>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
> >>>>>>>>> them myself.
> >>>>>>>>> -Greg
> >>>>>>>>>
> >>>>>>>> --
> >>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>
> >>>>>>>>
> >>>>>> --
> >>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>
> >>>>>>
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>> the body of a message to majordomo@vger.kernel.org
> >>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>
> >>>>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Igor Fedotov Jan. 18, 2018, 2:50 p.m. UTC | #9
On 1/18/2018 5:24 PM, Sage Weil wrote:
> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>> Hi Sage,
>>
>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi,
>>>>
>>>> it also crashes in (marked with HERE):
>>>>
>>>> int SnapMapper::get_snaps(
>>>>    const hobject_t &oid,
>>>>    object_snaps *out)
>>>> {
>>>>    assert(check(oid));
>>>>    set<string> keys;
>>>>    map<string, bufferlist> got;
>>>>    keys.insert(to_object_key(oid));
>>>>    int r = backend.get_keys(keys, &got);
>>>>    if (r < 0)
>>>>      return r;
>>>>    if (got.empty())
>>>>      return -ENOENT;
>>>>    if (out) {
>>>>      bufferlist::iterator bp = got.begin()->second.begin();
>>>>      ::decode(*out, bp);
>>>>      dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>>>      assert(!out->snaps.empty());            ########### HERE ###########
>>>>    } else {
>>>>      dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>>>    }
>>>>    return 0;
>>>> }
>>>>
>>>> is it save to comment that assert?
>>> I think so.  All of these asserts are just about ensuring the snapmapper
>>> state is consistent, and it's only purpose is to find snaps to trim.
>>> Since your snapmapper metadata is clearly not consistent, there isn't a
>>> whole lot of risk here.  You might want to set nosnaptrim for the time
>>> being so you don't get a surprise if trimming kicks in.  The next step is
>>> probably going to be to do a scrub and see what it finds, repairs, or
>>> can't repair.
>> snap trimming works fine i already trimmed and removed some snaps.
>>
>> I was able to get the cluster into a state where all is backfilled. But
>> enabling / doing deep-scrubs results into this one:
>>       0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>> SnapMapper::get_snaps(const h
>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
>> 2018-01-18 13:00:54.840396
>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
> I think if you switch that
We observed similar issue with that assert as well.
Removing the assert partially resolves the issue.
Another segmentation fault was observed in 
PrimaryLogPG::find_object_assert() when accessing empty snapshot list on 
object removal (specificall p->second.back() in the code below).
...
auto p = obc->ssc->snapset.clone_snaps.find(soid.snap);
assert(p != obc->ssc->snapset.clone_snaps.end());
first = p->second.back();
^^^^^^

This indicates (and confirmed by our manual omap/attrs verification) 
that both SnapMapper's and per-object snapshot lists are empty.
Current workaround for that was to verify p->second.empty() and return 
an error for the function. This still prevents from object removal 
(cache tier flush in our case) but fixed frequent OSD crashes for now.
> assert to a warning it will repair...
Removing an assert partially help
>
> sage
>
>>   ceph version 12.2.2-94-g92923ef
>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x102) [0x5561ce28d1f2]
>>   2: (SnapMapper::get_snaps(hobject_t const&,
>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>>   3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
>> 697]
>>   4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>>   5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>>   6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>>   7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>>   8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>> [0x5561cdcd7bc7]
>>   9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>> const&)+0x57) [0x5561cdf4f957]
>>   10: (OSD::ShardedOpWQ::_process(unsigned int,
>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>>   11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>> [0x5561ce292e7d]
>>   12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>>   13: (()+0x8064) [0x7fbd70d86064]
>>   14: (clone()+0x6d) [0x7fbd6fe7a62d]
>>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>>
>> Greets,
>> Stefan
>>
>>> sage
>>>
>>>
>>>> Stefan
>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hi Sage,
>>>>>>>>
>>>>>>>> this gives me another crash while that pg is recovering:
>>>>>>>>
>>>>>>>>       0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>>>> PrimaryLogPG::on_l
>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>>>> me 2018-01-17 19:25:09.322287
>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>>> Is this a cache tiering pool?
>>>>>> no normal 3 replica but the pg is degraded:
>>>>>> ceph pg dump | grep 3.80e
>>>>>>
>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>>>         50       [39]             39  907737'69776430 2018-01-14
>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>>> Hrm, no real clues on teh root cause then.  Something like this will work
>>>>> around the current assert:
>>>>>
>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>> index d42f3a401b..0f76134f74 100644
>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>       set<snapid_t> snaps;
>>>>>       dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>       auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>> +    } else {
>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>> +    }
>>>>>       dout(20) << " snaps " << snaps << dendl;
>>>>>       snap_mapper.add_oid(
>>>>>         recovery_info.soid,
>>>>>
>>>>>
>>>>>> Stefan
>>>>>>
>>>>>>> s
>>>>>>>>   ceph version 12.2.2-94-g92923ef
>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>   2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>   3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>   4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>> [0x55addb30748f]
>>>>>>>>   5:
>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>> [0x55addb317531]
>>>>>>>>   6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>> [0x55addb23cf10]
>>>>>>>>   7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>   8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>> [0x55addb035bc7]
>>>>>>>>   9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>>>   10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>   11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>> [0x55addb5f0e7d]
>>>>>>>>   12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>   13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>   14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>> needed to interpret this.
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by
>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap
>>>>>>>>> list mapping.
>>>>>>>>>
>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so
>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error
>>>>>>>>> state that won't trim (although from a quick look at the code it won't
>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it
>>>>>>>>> happens.
>>>>>>>>>
>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them
>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>
>>>>>>>>> sage
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>   >
>>>>>>>>>
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>> being down.
>>>>>>>>>>>>
>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>      0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>
>>>>>>>>>>>>   ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>   2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>>>   3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>   4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>   5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>   6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>>>   7:
>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>>>   8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>>>   9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>   10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>>>   11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>   12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>   13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>>>   14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>   15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>   16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>
>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>
>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>> them myself.
>>>>>>>>>>> -Greg
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Jan. 18, 2018, 8:01 p.m. UTC | #10
OK hopefully last one.

If i unset nodeep-scrub it immediatly starts around 19-22 deep scrubs
which then results in massive slow requests and even OSDs marked as
down. is there any way to limit this?

Greets,
Stefan


Am 18.01.2018 um 15:24 schrieb Sage Weil:
> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>> Hi Sage,
>>
>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi,
>>>>
>>>> it also crashes in (marked with HERE):
>>>>
>>>> int SnapMapper::get_snaps(
>>>>   const hobject_t &oid,
>>>>   object_snaps *out)
>>>> {
>>>>   assert(check(oid));
>>>>   set<string> keys;
>>>>   map<string, bufferlist> got;
>>>>   keys.insert(to_object_key(oid));
>>>>   int r = backend.get_keys(keys, &got);
>>>>   if (r < 0)
>>>>     return r;
>>>>   if (got.empty())
>>>>     return -ENOENT;
>>>>   if (out) {
>>>>     bufferlist::iterator bp = got.begin()->second.begin();
>>>>     ::decode(*out, bp);
>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>>>     assert(!out->snaps.empty());            ########### HERE ###########
>>>>   } else {
>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>>>   }
>>>>   return 0;
>>>> }
>>>>
>>>> is it save to comment that assert?
>>>
>>> I think so.  All of these asserts are just about ensuring the snapmapper 
>>> state is consistent, and it's only purpose is to find snaps to trim.  
>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
>>> being so you don't get a surprise if trimming kicks in.  The next step is 
>>> probably going to be to do a scrub and see what it finds, repairs, or 
>>> can't repair.
>>
>> snap trimming works fine i already trimmed and removed some snaps.
>>
>> I was able to get the cluster into a state where all is backfilled. But
>> enabling / doing deep-scrubs results into this one:
>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>> SnapMapper::get_snaps(const h
>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
>> 2018-01-18 13:00:54.840396
>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
> 
> I think if you switch that assert to a warning it will repair...
> 
> sage
> 
>>
>>  ceph version 12.2.2-94-g92923ef
>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x102) [0x5561ce28d1f2]
>>  2: (SnapMapper::get_snaps(hobject_t const&,
>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
>> 697]
>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>> [0x5561cdcd7bc7]
>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>> const&)+0x57) [0x5561cdf4f957]
>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>> [0x5561ce292e7d]
>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>>  13: (()+0x8064) [0x7fbd70d86064]
>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>>
>> Greets,
>> Stefan
>>
>>> sage
>>>
>>>
>>>> Stefan
>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>
>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hi Sage,
>>>>>>>>
>>>>>>>> this gives me another crash while that pg is recovering:
>>>>>>>>
>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>>>> PrimaryLogPG::on_l
>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>>>> me 2018-01-17 19:25:09.322287
>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>>>
>>>>>>> Is this a cache tiering pool?
>>>>>>
>>>>>> no normal 3 replica but the pg is degraded:
>>>>>> ceph pg dump | grep 3.80e
>>>>>>
>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>>>        50       [39]             39  907737'69776430 2018-01-14
>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>>>
>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
>>>>> around the current assert:
>>>>>
>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>> index d42f3a401b..0f76134f74 100644
>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>      set<snapid_t> snaps;
>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>> +    } else {
>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>> +    }
>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>      snap_mapper.add_oid(
>>>>>        recovery_info.soid,
>>>>>
>>>>>
>>>>>>
>>>>>> Stefan
>>>>>>
>>>>>>>
>>>>>>> s
>>>>>>>>
>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>> [0x55addb30748f]
>>>>>>>>  5:
>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>> [0x55addb317531]
>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>> [0x55addb23cf10]
>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>> [0x55addb035bc7]
>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>> [0x55addb5f0e7d]
>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>> needed to interpret this.
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>>>
>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>>>>> list mapping.
>>>>>>>>>
>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>>>>> happens.
>>>>>>>>>
>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>
>>>>>>>>> sage
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  > 
>>>>>>>>>
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>> being down.
>>>>>>>>>>>>
>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>
>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>>>  7:
>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>
>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>
>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>
>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>> them myself.
>>>>>>>>>>> -Greg
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Jan. 18, 2018, 10:17 p.m. UTC | #11
mhm it doesn't seem to have something todo with deep scrubs and disk i/o
even in scrubs produce the following log and signal (there were no slow
requests or other impacts it just got aborted after the monitor set it
to down):

   -20> 2018-01-18 23:05:45.717224 7fb955fff700  0 --
10.255.0.88:6801/18362 >> 10.255.0.92:6813/25732 conn(0x7fb8f5d90000 :-1
s=STA
TE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=48 cs=1
l=0).handle_connect_reply connect got RESETSESSION
   -19> 2018-01-18 23:05:45.717289 7fb955fff700  0 --
10.255.0.88:6801/18362 >> 10.255.0.86:6813/29823 conn(0x7fb91e668000
:6801 s=S
TATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=105 cs=1
l=0).handle_connect_reply connect got RESETSESSION
   -18> 2018-01-18 23:05:48.021476 7fb92a3f6700  0 log_channel(cluster)
log [WRN] : Monitor daemon marked osd.16 down, but it is sti
ll running
   -17> 2018-01-18 23:05:48.021484 7fb92a3f6700  0 log_channel(cluster)
log [DBG] : map e924379 wrongly marked me down at e924378
     0> 2018-01-18 23:07:53.251128 7fb90cbff700 -1 *** Caught signal
(Aborted) **
 in thread 7fb90cbff700 thread_name:tp_osd_tp

 ceph version 12.2.2-94-g92923ef
(92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
 1: (()+0xa4433c) [0x55fa1e2c033c]
 2: (()+0xf890) [0x7fb959c00890]
 3: (pthread_cond_wait()+0xbf) [0x7fb959bfd04f]
 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
std::vector<ObjectStore::Transaction,
std::allocator<ObjectStore::Transaction> >&, Context*)+0x254)
[0x55fa1e07fd54]
 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
ObjectStore::Transaction&&, Context*)+0x5c) [0x55fa1ddaeb2c]
 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x55fa1de00fa7]
 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
unsigned int, ThreadPool::TPHandle&)+0x22f) [0x55fa1de01c7f]
 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
ThreadPool::TPHandle&)+0x624) [0x55fa1de025a4]
 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0x8b6) [0x55fa1dec1766]
 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
[0x55fa1dd4dbc7]
 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
const&)+0x57) [0x55fa1dfc5eb7]
 12: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x108c) [0x55fa1dd7cd1c]
 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
[0x55fa1e3093bd]
 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55fa1e30b380]
 15: (()+0x8064) [0x7fb959bf9064]
 16: (clone()+0x6d) [0x7fb958ced62d]

Greets,
Stefan

Am 18.01.2018 um 15:24 schrieb Sage Weil:
> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>> Hi Sage,
>>
>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi,
>>>>
>>>> it also crashes in (marked with HERE):
>>>>
>>>> int SnapMapper::get_snaps(
>>>>   const hobject_t &oid,
>>>>   object_snaps *out)
>>>> {
>>>>   assert(check(oid));
>>>>   set<string> keys;
>>>>   map<string, bufferlist> got;
>>>>   keys.insert(to_object_key(oid));
>>>>   int r = backend.get_keys(keys, &got);
>>>>   if (r < 0)
>>>>     return r;
>>>>   if (got.empty())
>>>>     return -ENOENT;
>>>>   if (out) {
>>>>     bufferlist::iterator bp = got.begin()->second.begin();
>>>>     ::decode(*out, bp);
>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>>>     assert(!out->snaps.empty());            ########### HERE ###########
>>>>   } else {
>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>>>   }
>>>>   return 0;
>>>> }
>>>>
>>>> is it save to comment that assert?
>>>
>>> I think so.  All of these asserts are just about ensuring the snapmapper 
>>> state is consistent, and it's only purpose is to find snaps to trim.  
>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
>>> being so you don't get a surprise if trimming kicks in.  The next step is 
>>> probably going to be to do a scrub and see what it finds, repairs, or 
>>> can't repair.
>>
>> snap trimming works fine i already trimmed and removed some snaps.
>>
>> I was able to get the cluster into a state where all is backfilled. But
>> enabling / doing deep-scrubs results into this one:
>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>> SnapMapper::get_snaps(const h
>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
>> 2018-01-18 13:00:54.840396
>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
> 
> I think if you switch that assert to a warning it will repair...
> 
> sage
> 
>>
>>  ceph version 12.2.2-94-g92923ef
>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x102) [0x5561ce28d1f2]
>>  2: (SnapMapper::get_snaps(hobject_t const&,
>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
>> 697]
>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>> [0x5561cdcd7bc7]
>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>> const&)+0x57) [0x5561cdf4f957]
>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>> [0x5561ce292e7d]
>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>>  13: (()+0x8064) [0x7fbd70d86064]
>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>>
>> Greets,
>> Stefan
>>
>>> sage
>>>
>>>
>>>> Stefan
>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>
>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hi Sage,
>>>>>>>>
>>>>>>>> this gives me another crash while that pg is recovering:
>>>>>>>>
>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>>>> PrimaryLogPG::on_l
>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>>>> me 2018-01-17 19:25:09.322287
>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>>>
>>>>>>> Is this a cache tiering pool?
>>>>>>
>>>>>> no normal 3 replica but the pg is degraded:
>>>>>> ceph pg dump | grep 3.80e
>>>>>>
>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>>>        50       [39]             39  907737'69776430 2018-01-14
>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>>>
>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
>>>>> around the current assert:
>>>>>
>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>> index d42f3a401b..0f76134f74 100644
>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>      set<snapid_t> snaps;
>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>> +    } else {
>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>> +    }
>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>      snap_mapper.add_oid(
>>>>>        recovery_info.soid,
>>>>>
>>>>>
>>>>>>
>>>>>> Stefan
>>>>>>
>>>>>>>
>>>>>>> s
>>>>>>>>
>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>> [0x55addb30748f]
>>>>>>>>  5:
>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>> [0x55addb317531]
>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>> [0x55addb23cf10]
>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>> [0x55addb035bc7]
>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>> [0x55addb5f0e7d]
>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>> needed to interpret this.
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>>>
>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>>>>> list mapping.
>>>>>>>>>
>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>>>>> happens.
>>>>>>>>>
>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>
>>>>>>>>> sage
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  > 
>>>>>>>>>
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>> being down.
>>>>>>>>>>>>
>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>
>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>>>  7:
>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>
>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>
>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>
>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>> them myself.
>>>>>>>>>>> -Greg
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Jan. 19, 2018, 8:16 p.m. UTC | #12
HI Sage,

any ideas for this one sawing while doing a deep scrub:

    -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)

     0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
/build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
7fbcd3bfe700 time 2018-01-19 21:14:20.474752
/build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)

 ceph version 12.2.2-94-g92923ef
(92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x55c0100d0372]
 2: (PG::update_snap_map(std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&,
anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
 3: (PG::append_log(std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&,
boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
[0x55c00fdf6642]
 6:
(ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
[0x55c00fdfc684]
 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
[0x55c00fd22030]
 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
[0x55c00fb1abc7]
 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
const&)+0x57) [0x55c00fd92ad7]
 11: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
[0x55c0100d5ffd]
 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
 14: (()+0x8064) [0x7fbd284fb064]
 15: (clone()+0x6d) [0x7fbd275ef62d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.


Stefan

Am 18.01.2018 um 15:24 schrieb Sage Weil:
> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>> Hi Sage,
>>
>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi,
>>>>
>>>> it also crashes in (marked with HERE):
>>>>
>>>> int SnapMapper::get_snaps(
>>>>   const hobject_t &oid,
>>>>   object_snaps *out)
>>>> {
>>>>   assert(check(oid));
>>>>   set<string> keys;
>>>>   map<string, bufferlist> got;
>>>>   keys.insert(to_object_key(oid));
>>>>   int r = backend.get_keys(keys, &got);
>>>>   if (r < 0)
>>>>     return r;
>>>>   if (got.empty())
>>>>     return -ENOENT;
>>>>   if (out) {
>>>>     bufferlist::iterator bp = got.begin()->second.begin();
>>>>     ::decode(*out, bp);
>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>>>     assert(!out->snaps.empty());            ########### HERE ###########
>>>>   } else {
>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>>>   }
>>>>   return 0;
>>>> }
>>>>
>>>> is it save to comment that assert?
>>>
>>> I think so.  All of these asserts are just about ensuring the snapmapper 
>>> state is consistent, and it's only purpose is to find snaps to trim.  
>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
>>> being so you don't get a surprise if trimming kicks in.  The next step is 
>>> probably going to be to do a scrub and see what it finds, repairs, or 
>>> can't repair.
>>
>> snap trimming works fine i already trimmed and removed some snaps.
>>
>> I was able to get the cluster into a state where all is backfilled. But
>> enabling / doing deep-scrubs results into this one:
>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>> SnapMapper::get_snaps(const h
>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
>> 2018-01-18 13:00:54.840396
>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
> 
> I think if you switch that assert to a warning it will repair...
> 
> sage
> 
>>
>>  ceph version 12.2.2-94-g92923ef
>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x102) [0x5561ce28d1f2]
>>  2: (SnapMapper::get_snaps(hobject_t const&,
>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
>> 697]
>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>> [0x5561cdcd7bc7]
>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>> const&)+0x57) [0x5561cdf4f957]
>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>> [0x5561ce292e7d]
>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>>  13: (()+0x8064) [0x7fbd70d86064]
>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>>
>> Greets,
>> Stefan
>>
>>> sage
>>>
>>>
>>>> Stefan
>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>
>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hi Sage,
>>>>>>>>
>>>>>>>> this gives me another crash while that pg is recovering:
>>>>>>>>
>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>>>> PrimaryLogPG::on_l
>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>>>> me 2018-01-17 19:25:09.322287
>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>>>
>>>>>>> Is this a cache tiering pool?
>>>>>>
>>>>>> no normal 3 replica but the pg is degraded:
>>>>>> ceph pg dump | grep 3.80e
>>>>>>
>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>>>        50       [39]             39  907737'69776430 2018-01-14
>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>>>
>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
>>>>> around the current assert:
>>>>>
>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>> index d42f3a401b..0f76134f74 100644
>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>      set<snapid_t> snaps;
>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>> +    } else {
>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>> +    }
>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>      snap_mapper.add_oid(
>>>>>        recovery_info.soid,
>>>>>
>>>>>
>>>>>>
>>>>>> Stefan
>>>>>>
>>>>>>>
>>>>>>> s
>>>>>>>>
>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>> [0x55addb30748f]
>>>>>>>>  5:
>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>> [0x55addb317531]
>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>> [0x55addb23cf10]
>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>> [0x55addb035bc7]
>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>> [0x55addb5f0e7d]
>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>> needed to interpret this.
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>>>
>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>>>>> list mapping.
>>>>>>>>>
>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>>>>> happens.
>>>>>>>>>
>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>
>>>>>>>>> sage
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  > 
>>>>>>>>>
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>> being down.
>>>>>>>>>>>>
>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>
>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>>>  7:
>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>
>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>
>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>
>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>> them myself.
>>>>>>>>>>> -Greg
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sage Weil Jan. 19, 2018, 8:19 p.m. UTC | #13
I'm guessing it is coming from

  object_snaps out;
  int r = get_snaps(oid, &out);
  if (r < 0)
    return r;

in SnapMapper::update_snaps().  I would add a debug statement to to 
confirm that, and possibly also add debug lines to the various return 
points in get_snaps() so you can see why get_snaps is returning an error.

I'm guessing the way to proceed is to warn but ignore the error, but it's 
hard to say without seeing what the inconsistency there is and whether we 
can tolerate it...

sage


On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:

> HI Sage,
> 
> any ideas for this one sawing while doing a deep scrub:
> 
>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
> 
>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
> 
>  ceph version 12.2.2-94-g92923ef
> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x102) [0x55c0100d0372]
>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
> std::allocator<pg_log_entry_t> > const&,
> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
>  3: (PG::append_log(std::vector<pg_log_entry_t,
> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> std::allocator<pg_log_entry_t> > const&,
> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> [0x55c00fdf6642]
>  6:
> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> [0x55c00fdfc684]
>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> [0x55c00fd22030]
>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> [0x55c00fb1abc7]
>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> const&)+0x57) [0x55c00fd92ad7]
>  11: (OSD::ShardedOpWQ::_process(unsigned int,
> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> [0x55c0100d5ffd]
>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
>  14: (()+0x8064) [0x7fbd284fb064]
>  15: (clone()+0x6d) [0x7fbd275ef62d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> 
> 
> Stefan
> 
> Am 18.01.2018 um 15:24 schrieb Sage Weil:
> > On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >> Hi Sage,
> >>
> >> Am 18.01.2018 um 14:16 schrieb Sage Weil:
> >>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>> Hi,
> >>>>
> >>>> it also crashes in (marked with HERE):
> >>>>
> >>>> int SnapMapper::get_snaps(
> >>>>   const hobject_t &oid,
> >>>>   object_snaps *out)
> >>>> {
> >>>>   assert(check(oid));
> >>>>   set<string> keys;
> >>>>   map<string, bufferlist> got;
> >>>>   keys.insert(to_object_key(oid));
> >>>>   int r = backend.get_keys(keys, &got);
> >>>>   if (r < 0)
> >>>>     return r;
> >>>>   if (got.empty())
> >>>>     return -ENOENT;
> >>>>   if (out) {
> >>>>     bufferlist::iterator bp = got.begin()->second.begin();
> >>>>     ::decode(*out, bp);
> >>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
> >>>>     assert(!out->snaps.empty());            ########### HERE ###########
> >>>>   } else {
> >>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
> >>>>   }
> >>>>   return 0;
> >>>> }
> >>>>
> >>>> is it save to comment that assert?
> >>>
> >>> I think so.  All of these asserts are just about ensuring the snapmapper 
> >>> state is consistent, and it's only purpose is to find snaps to trim.  
> >>> Since your snapmapper metadata is clearly not consistent, there isn't a 
> >>> whole lot of risk here.  You might want to set nosnaptrim for the time 
> >>> being so you don't get a surprise if trimming kicks in.  The next step is 
> >>> probably going to be to do a scrub and see what it finds, repairs, or 
> >>> can't repair.
> >>
> >> snap trimming works fine i already trimmed and removed some snaps.
> >>
> >> I was able to get the cluster into a state where all is backfilled. But
> >> enabling / doing deep-scrubs results into this one:
> >>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
> >> /build/ceph/src/osd/SnapMapper.cc: In function 'int
> >> SnapMapper::get_snaps(const h
> >> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
> >> 2018-01-18 13:00:54.840396
> >> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
> > 
> > I think if you switch that assert to a warning it will repair...
> > 
> > sage
> > 
> >>
> >>  ceph version 12.2.2-94-g92923ef
> >> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >> const*)+0x102) [0x5561ce28d1f2]
> >>  2: (SnapMapper::get_snaps(hobject_t const&,
> >> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
> >>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
> >> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
> >> 697]
> >>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
> >>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> >> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
> >>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
> >> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
> >>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
> >>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >> [0x5561cdcd7bc7]
> >>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >> const&)+0x57) [0x5561cdf4f957]
> >>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
> >>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >> [0x5561ce292e7d]
> >>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
> >>  13: (()+0x8064) [0x7fbd70d86064]
> >>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
> >>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >> needed to interpret this.
> >>
> >>
> >> Greets,
> >> Stefan
> >>
> >>> sage
> >>>
> >>>
> >>>> Stefan
> >>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
> >>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>
> >>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
> >>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>> Hi Sage,
> >>>>>>>>
> >>>>>>>> this gives me another crash while that pg is recovering:
> >>>>>>>>
> >>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
> >>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
> >>>>>>>> PrimaryLogPG::on_l
> >>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
> >>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
> >>>>>>>> me 2018-01-17 19:25:09.322287
> >>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
> >>>>>>>> recovery_info.ss.clone_snaps.end())
> >>>>>>>
> >>>>>>> Is this a cache tiering pool?
> >>>>>>
> >>>>>> no normal 3 replica but the pg is degraded:
> >>>>>> ceph pg dump | grep 3.80e
> >>>>>>
> >>>>>> 3.80e      1709                  0     1579         0       0 6183674368
> >>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
> >>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
> >>>>>>        50       [39]             39  907737'69776430 2018-01-14
> >>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
> >>>>>
> >>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
> >>>>> around the current assert:
> >>>>>
> >>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
> >>>>> index d42f3a401b..0f76134f74 100644
> >>>>> --- a/src/osd/PrimaryLogPG.cc
> >>>>> +++ b/src/osd/PrimaryLogPG.cc
> >>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
> >>>>>      set<snapid_t> snaps;
> >>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
> >>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
> >>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
> >>>>> -    snaps.insert(p->second.begin(), p->second.end());
> >>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
> >>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
> >>>>> +    } else {
> >>>>> +      snaps.insert(p->second.begin(), p->second.end());
> >>>>> +    }
> >>>>>      dout(20) << " snaps " << snaps << dendl;
> >>>>>      snap_mapper.add_oid(
> >>>>>        recovery_info.soid,
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Stefan
> >>>>>>
> >>>>>>>
> >>>>>>> s
> >>>>>>>>
> >>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>> const*)+0x102) [0x55addb5eb1f2]
> >>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
> >>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
> >>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
> >>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
> >>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
> >>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
> >>>>>>>> [0x55addb30748f]
> >>>>>>>>  5:
> >>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
> >>>>>>>> [0x55addb317531]
> >>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>> [0x55addb23cf10]
> >>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
> >>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>> [0x55addb035bc7]
> >>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>> const&)+0x57) [0x55addb2ad957]
> >>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
> >>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>> [0x55addb5f0e7d]
> >>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
> >>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
> >>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
> >>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>> needed to interpret this.
> >>>>>>>>
> >>>>>>>> Greets,
> >>>>>>>> Stefan
> >>>>>>>>
> >>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
> >>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
> >>>>>>>>>
> >>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
> >>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
> >>>>>>>>> list mapping.
> >>>>>>>>>
> >>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
> >>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
> >>>>>>>>> state that won't trim (although from a quick look at the code it won't 
> >>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
> >>>>>>>>> happens.
> >>>>>>>>>
> >>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
> >>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
> >>>>>>>>>
> >>>>>>>>> sage
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>  > 
> >>>>>>>>>
> >>>>>>>>>> Stefan
> >>>>>>>>>>
> >>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
> >>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
> >>>>>>>>>>> <s.priebe@profihost.ag> wrote:
> >>>>>>>>>>>> Hello,
> >>>>>>>>>>>>
> >>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
> >>>>>>>>>>>> being down.
> >>>>>>>>>>>>
> >>>>>>>>>>>> All of them fail with:
> >>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
> >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
> >>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
> >>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
> >>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
> >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
> >>>>>>>>>>>>
> >>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
> >>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
> >>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
> >>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
> >>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
> >>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
> >>>>>>>>>>>> [0x561f9fb76f3b]
> >>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
> >>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
> >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
> >>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
> >>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>>>>>>>>>> [0x561f9fc314b2]
> >>>>>>>>>>>>  7:
> >>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>>>>>>>>>> [0x561f9fc374f4]
> >>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>>>> [0x561f9fb5cf10]
> >>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
> >>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>> [0x561f9f955bc7]
> >>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
> >>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
> >>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>> [0x561f9ff10e6d]
> >>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
> >>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
> >>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
> >>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>
> >>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
> >>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
> >>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
> >>>>>>>>>>>
> >>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
> >>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
> >>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
> >>>>>>>>>>>
> >>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
> >>>>>>>>>>> any experience with this but you'd probably have to export the PG
> >>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
> >>>>>>>>>>> out of it. I see options to remove both an object and
> >>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
> >>>>>>>>>>> them myself.
> >>>>>>>>>>> -Greg
> >>>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>> --
> >>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>
> >>>>>>>>
> >>>>>> --
> >>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>
> >>>>>>
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>> the body of a message to majordomo@vger.kernel.org
> >>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>
> >>>>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Jan. 19, 2018, 8:45 p.m. UTC | #14
Hi Sage,

thanks for your reply. I'll do this. What i still don't understand is
the following and i hope you might have an idea:
1.) All the snaps mentioned here are fresh - i created them today
running luminous? How can they be missing in snapmapper again?

2.) How did this happen? The cluster was jewel and was updated to
luminous when all this happened.

Greets,
Stefan


Am 19.01.2018 um 21:19 schrieb Sage Weil:
> I'm guessing it is coming from
> 
>   object_snaps out;
>   int r = get_snaps(oid, &out);
>   if (r < 0)
>     return r;
> 
> in SnapMapper::update_snaps().  I would add a debug statement to to 
> confirm that, and possibly also add debug lines to the various return 
> points in get_snaps() so you can see why get_snaps is returning an error.
> 
> I'm guessing the way to proceed is to warn but ignore the error, but it's 
> hard to say without seeing what the inconsistency there is and whether we 
> can tolerate it...
> 
> sage
> 
> 
> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
> 
>> HI Sage,
>>
>> any ideas for this one sawing while doing a deep scrub:
>>
>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
>>
>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
>>
>>  ceph version 12.2.2-94-g92923ef
>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x102) [0x55c0100d0372]
>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
>> std::allocator<pg_log_entry_t> > const&,
>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
>>  3: (PG::append_log(std::vector<pg_log_entry_t,
>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>> std::allocator<pg_log_entry_t> > const&,
>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>> [0x55c00fdf6642]
>>  6:
>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>> [0x55c00fdfc684]
>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>> [0x55c00fd22030]
>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>> [0x55c00fb1abc7]
>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>> const&)+0x57) [0x55c00fd92ad7]
>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>> [0x55c0100d5ffd]
>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
>>  14: (()+0x8064) [0x7fbd284fb064]
>>  15: (clone()+0x6d) [0x7fbd275ef62d]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>>
>> Stefan
>>
>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi Sage,
>>>>
>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi,
>>>>>>
>>>>>> it also crashes in (marked with HERE):
>>>>>>
>>>>>> int SnapMapper::get_snaps(
>>>>>>   const hobject_t &oid,
>>>>>>   object_snaps *out)
>>>>>> {
>>>>>>   assert(check(oid));
>>>>>>   set<string> keys;
>>>>>>   map<string, bufferlist> got;
>>>>>>   keys.insert(to_object_key(oid));
>>>>>>   int r = backend.get_keys(keys, &got);
>>>>>>   if (r < 0)
>>>>>>     return r;
>>>>>>   if (got.empty())
>>>>>>     return -ENOENT;
>>>>>>   if (out) {
>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
>>>>>>     ::decode(*out, bp);
>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
>>>>>>   } else {
>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>>>>>   }
>>>>>>   return 0;
>>>>>> }
>>>>>>
>>>>>> is it save to comment that assert?
>>>>>
>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
>>>>> can't repair.
>>>>
>>>> snap trimming works fine i already trimmed and removed some snaps.
>>>>
>>>> I was able to get the cluster into a state where all is backfilled. But
>>>> enabling / doing deep-scrubs results into this one:
>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>> SnapMapper::get_snaps(const h
>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
>>>> 2018-01-18 13:00:54.840396
>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>
>>> I think if you switch that assert to a warning it will repair...
>>>
>>> sage
>>>
>>>>
>>>>  ceph version 12.2.2-94-g92923ef
>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> const*)+0x102) [0x5561ce28d1f2]
>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
>>>> 697]
>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>> [0x5561cdcd7bc7]
>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>> const&)+0x57) [0x5561cdf4f957]
>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>> [0x5561ce292e7d]
>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>>>>  13: (()+0x8064) [0x7fbd70d86064]
>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>> needed to interpret this.
>>>>
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>>> sage
>>>>>
>>>>>
>>>>>> Stefan
>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>
>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hi Sage,
>>>>>>>>>>
>>>>>>>>>> this gives me another crash while that pg is recovering:
>>>>>>>>>>
>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>>>>>> PrimaryLogPG::on_l
>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>>>>>> me 2018-01-17 19:25:09.322287
>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>>>>>
>>>>>>>>> Is this a cache tiering pool?
>>>>>>>>
>>>>>>>> no normal 3 replica but the pg is degraded:
>>>>>>>> ceph pg dump | grep 3.80e
>>>>>>>>
>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>>>>>
>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
>>>>>>> around the current assert:
>>>>>>>
>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>>>> index d42f3a401b..0f76134f74 100644
>>>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>>>      set<snapid_t> snaps;
>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>>>> +    } else {
>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>>>> +    }
>>>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>>>      snap_mapper.add_oid(
>>>>>>>        recovery_info.soid,
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>>>
>>>>>>>>> s
>>>>>>>>>>
>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>>>> [0x55addb30748f]
>>>>>>>>>>  5:
>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>>>> [0x55addb317531]
>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>> [0x55addb23cf10]
>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>> [0x55addb035bc7]
>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>> [0x55addb5f0e7d]
>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>> needed to interpret this.
>>>>>>>>>>
>>>>>>>>>> Greets,
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>>>>>
>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>>>>>>> list mapping.
>>>>>>>>>>>
>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>>>>>>> happens.
>>>>>>>>>>>
>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>>>
>>>>>>>>>>> sage
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  > 
>>>>>>>>>>>
>>>>>>>>>>>> Stefan
>>>>>>>>>>>>
>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>>>> being down.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>>>>>  7:
>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>>>
>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>>>> them myself.
>>>>>>>>>>>>> -Greg
>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sage Weil Jan. 21, 2018, 8:27 p.m. UTC | #15
On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
> Hi Sage,
> 
> thanks for your reply. I'll do this. What i still don't understand is
> the following and i hope you might have an idea:
> 1.) All the snaps mentioned here are fresh - i created them today
> running luminous? How can they be missing in snapmapper again?

I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
that when you hit the crash we have some context as to what is going on.

Did you track down which part of get_snaps() is returning the error?

> 2.) How did this happen? The cluster was jewel and was updated to
> luminous when all this happened.

Did this start right after the upgrade?

I started a PR that relaxes some of these assertions so that they clean up 
instead (unless the debug option is enabled, for our regression testing), 
but I'm still unclear about what the inconsistency is... any loggin you 
can provide would help!

Thanks-
sage



> 
> Greets,
> Stefan
> 
> 
> Am 19.01.2018 um 21:19 schrieb Sage Weil:
> > I'm guessing it is coming from
> > 
> >   object_snaps out;
> >   int r = get_snaps(oid, &out);
> >   if (r < 0)
> >     return r;
> > 
> > in SnapMapper::update_snaps().  I would add a debug statement to to 
> > confirm that, and possibly also add debug lines to the various return 
> > points in get_snaps() so you can see why get_snaps is returning an error.
> > 
> > I'm guessing the way to proceed is to warn but ignore the error, but it's 
> > hard to say without seeing what the inconsistency there is and whether we 
> > can tolerate it...
> > 
> > sage
> > 
> > 
> > On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
> > 
> >> HI Sage,
> >>
> >> any ideas for this one sawing while doing a deep scrub:
> >>
> >>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
> >> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
> >> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
> >> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
> >> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
> >> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
> >>
> >>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
> >> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
> >> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
> >> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
> >> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
> >>
> >>  ceph version 12.2.2-94-g92923ef
> >> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >> const*)+0x102) [0x55c0100d0372]
> >>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >> std::allocator<pg_log_entry_t> > const&,
> >> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
> >>  3: (PG::append_log(std::vector<pg_log_entry_t,
> >> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
> >>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >> std::allocator<pg_log_entry_t> > const&,
> >> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
> >>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >> [0x55c00fdf6642]
> >>  6:
> >> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >> [0x55c00fdfc684]
> >>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >> [0x55c00fd22030]
> >>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
> >>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >> [0x55c00fb1abc7]
> >>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >> const&)+0x57) [0x55c00fd92ad7]
> >>  11: (OSD::ShardedOpWQ::_process(unsigned int,
> >> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
> >>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >> [0x55c0100d5ffd]
> >>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
> >>  14: (()+0x8064) [0x7fbd284fb064]
> >>  15: (clone()+0x6d) [0x7fbd275ef62d]
> >>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >> needed to interpret this.
> >>
> >>
> >> Stefan
> >>
> >> Am 18.01.2018 um 15:24 schrieb Sage Weil:
> >>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>> Hi Sage,
> >>>>
> >>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
> >>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> it also crashes in (marked with HERE):
> >>>>>>
> >>>>>> int SnapMapper::get_snaps(
> >>>>>>   const hobject_t &oid,
> >>>>>>   object_snaps *out)
> >>>>>> {
> >>>>>>   assert(check(oid));
> >>>>>>   set<string> keys;
> >>>>>>   map<string, bufferlist> got;
> >>>>>>   keys.insert(to_object_key(oid));
> >>>>>>   int r = backend.get_keys(keys, &got);
> >>>>>>   if (r < 0)
> >>>>>>     return r;
> >>>>>>   if (got.empty())
> >>>>>>     return -ENOENT;
> >>>>>>   if (out) {
> >>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
> >>>>>>     ::decode(*out, bp);
> >>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
> >>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
> >>>>>>   } else {
> >>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
> >>>>>>   }
> >>>>>>   return 0;
> >>>>>> }
> >>>>>>
> >>>>>> is it save to comment that assert?
> >>>>>
> >>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
> >>>>> state is consistent, and it's only purpose is to find snaps to trim.  
> >>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
> >>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
> >>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
> >>>>> probably going to be to do a scrub and see what it finds, repairs, or 
> >>>>> can't repair.
> >>>>
> >>>> snap trimming works fine i already trimmed and removed some snaps.
> >>>>
> >>>> I was able to get the cluster into a state where all is backfilled. But
> >>>> enabling / doing deep-scrubs results into this one:
> >>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
> >>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
> >>>> SnapMapper::get_snaps(const h
> >>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
> >>>> 2018-01-18 13:00:54.840396
> >>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
> >>>
> >>> I think if you switch that assert to a warning it will repair...
> >>>
> >>> sage
> >>>
> >>>>
> >>>>  ceph version 12.2.2-94-g92923ef
> >>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>> const*)+0x102) [0x5561ce28d1f2]
> >>>>  2: (SnapMapper::get_snaps(hobject_t const&,
> >>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
> >>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
> >>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
> >>>> 697]
> >>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
> >>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> >>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
> >>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
> >>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
> >>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
> >>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>> [0x5561cdcd7bc7]
> >>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>> const&)+0x57) [0x5561cdf4f957]
> >>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
> >>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>> [0x5561ce292e7d]
> >>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
> >>>>  13: (()+0x8064) [0x7fbd70d86064]
> >>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
> >>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>> needed to interpret this.
> >>>>
> >>>>
> >>>> Greets,
> >>>> Stefan
> >>>>
> >>>>> sage
> >>>>>
> >>>>>
> >>>>>> Stefan
> >>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
> >>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>
> >>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
> >>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>> Hi Sage,
> >>>>>>>>>>
> >>>>>>>>>> this gives me another crash while that pg is recovering:
> >>>>>>>>>>
> >>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
> >>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
> >>>>>>>>>> PrimaryLogPG::on_l
> >>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
> >>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
> >>>>>>>>>> me 2018-01-17 19:25:09.322287
> >>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
> >>>>>>>>>> recovery_info.ss.clone_snaps.end())
> >>>>>>>>>
> >>>>>>>>> Is this a cache tiering pool?
> >>>>>>>>
> >>>>>>>> no normal 3 replica but the pg is degraded:
> >>>>>>>> ceph pg dump | grep 3.80e
> >>>>>>>>
> >>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
> >>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
> >>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
> >>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
> >>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
> >>>>>>>
> >>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
> >>>>>>> around the current assert:
> >>>>>>>
> >>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
> >>>>>>> index d42f3a401b..0f76134f74 100644
> >>>>>>> --- a/src/osd/PrimaryLogPG.cc
> >>>>>>> +++ b/src/osd/PrimaryLogPG.cc
> >>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
> >>>>>>>      set<snapid_t> snaps;
> >>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
> >>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
> >>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
> >>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
> >>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
> >>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
> >>>>>>> +    } else {
> >>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
> >>>>>>> +    }
> >>>>>>>      dout(20) << " snaps " << snaps << dendl;
> >>>>>>>      snap_mapper.add_oid(
> >>>>>>>        recovery_info.soid,
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Stefan
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> s
> >>>>>>>>>>
> >>>>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
> >>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
> >>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
> >>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
> >>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
> >>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
> >>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
> >>>>>>>>>> [0x55addb30748f]
> >>>>>>>>>>  5:
> >>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
> >>>>>>>>>> [0x55addb317531]
> >>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>> [0x55addb23cf10]
> >>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
> >>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>> [0x55addb035bc7]
> >>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>> const&)+0x57) [0x55addb2ad957]
> >>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
> >>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>> [0x55addb5f0e7d]
> >>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
> >>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
> >>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
> >>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>> needed to interpret this.
> >>>>>>>>>>
> >>>>>>>>>> Greets,
> >>>>>>>>>> Stefan
> >>>>>>>>>>
> >>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
> >>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
> >>>>>>>>>>>
> >>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
> >>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
> >>>>>>>>>>> list mapping.
> >>>>>>>>>>>
> >>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
> >>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
> >>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
> >>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
> >>>>>>>>>>> happens.
> >>>>>>>>>>>
> >>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
> >>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
> >>>>>>>>>>>
> >>>>>>>>>>> sage
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>  > 
> >>>>>>>>>>>
> >>>>>>>>>>>> Stefan
> >>>>>>>>>>>>
> >>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
> >>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
> >>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
> >>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
> >>>>>>>>>>>>>> being down.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> All of them fail with:
> >>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
> >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
> >>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
> >>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
> >>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
> >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
> >>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
> >>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
> >>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
> >>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
> >>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
> >>>>>>>>>>>>>> [0x561f9fb76f3b]
> >>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
> >>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
> >>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
> >>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>>>>>>>>>>>> [0x561f9fc314b2]
> >>>>>>>>>>>>>>  7:
> >>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>>>>>>>>>>>> [0x561f9fc374f4]
> >>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>>>>>> [0x561f9fb5cf10]
> >>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
> >>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>>>> [0x561f9f955bc7]
> >>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
> >>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
> >>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>>>> [0x561f9ff10e6d]
> >>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
> >>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
> >>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
> >>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
> >>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
> >>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
> >>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
> >>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
> >>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
> >>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
> >>>>>>>>>>>>> out of it. I see options to remove both an object and
> >>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
> >>>>>>>>>>>>> them myself.
> >>>>>>>>>>>>> -Greg
> >>>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>> --
> >>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>
> >>>>>>>>
> >>>>>> --
> >>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>
> >>>>>>
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>> the body of a message to majordomo@vger.kernel.org
> >>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>
> >>>>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Jan. 22, 2018, 1:22 p.m. UTC | #16
Hi Sage,

get_snaps returned the error as there are no snaps in the snapmapper. So
it's OK to skip the assert.

My biggest problem is now that i'm not able to run a scrub on all PGs to
fix all those errors because in around 1-3% the osd seem to deadlock and
needs to get killed to proceed.

The log output is always:
    -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
log [WRN] : slow request 141.951820 seconds old, received at
 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
00000157:head v 927921'526559757) currently commit_sent
    -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
log [WRN] : slow request 141.643853 seconds old, received at
 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
00000147d:head v 927921'29105383) currently commit_sent
    -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
log [WRN] : slow request 141.313220 seconds old, received at
 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
000000aff:head v 927921'164552063) currently commit_sent
     0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
(Aborted) **
 in thread 7f0fc23fe700 thread_name:tp_osd_tp

 ceph version 12.2.2-94-g92923ef
(92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
 1: (()+0xa4423c) [0x561d876ab23c]
 2: (()+0xf890) [0x7f100b974890]
 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
saction> >&, Context*)+0x254) [0x561d8746ac54]
 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
[0x561d87138bc7]
 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
const&)+0x57) [0x561d873b0db7]
 12: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
[0x561d876f42bd]
 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
 15: (()+0x8064) [0x7f100b96d064]
 16: (clone()+0x6d) [0x7f100aa6162d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 0 lockdep
   0/ 0 context
   0/ 0 crush
   0/ 0 mds
   0/ 0 mds_balancer
   0/ 0 mds_locker
   0/ 0 mds_log
   0/ 0 mds_log_expire
   0/ 0 mds_migrator
   0/ 0 buffer
   0/ 0 timer
   0/ 0 filer
   0/ 1 striper
   0/ 0 objecter
   0/ 0 rados
   0/ 0 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 0 journaler
   0/ 0 objectcacher
   0/ 0 client
   0/ 0 osd
   0/ 0 optracker
   0/ 0 objclass
   0/ 0 filestore
   0/ 0 journal
   0/ 0 ms
   0/ 0 mon
   0/ 0 monc
   0/ 0 paxos
   0/ 0 tp
   0/ 0 auth
   1/ 5 crypto
   0/ 0 finisher
   1/ 1 reserver
   0/ 0 heartbeatmap
   0/ 0 perfcounter
   0/ 0 rgw
   1/10 civetweb
   1/ 5 javaclient
   0/ 0 asok
   0/ 0 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.59.log
--- end dump of recent events ---


Greets,
Stefan

Am 21.01.2018 um 21:27 schrieb Sage Weil:
> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>> Hi Sage,
>>
>> thanks for your reply. I'll do this. What i still don't understand is
>> the following and i hope you might have an idea:
>> 1.) All the snaps mentioned here are fresh - i created them today
>> running luminous? How can they be missing in snapmapper again?
> 
> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
> that when you hit the crash we have some context as to what is going on.
> 
> Did you track down which part of get_snaps() is returning the error?
> 
>> 2.) How did this happen? The cluster was jewel and was updated to
>> luminous when all this happened.
> 
> Did this start right after the upgrade?
> 
> I started a PR that relaxes some of these assertions so that they clean up 
> instead (unless the debug option is enabled, for our regression testing), 
> but I'm still unclear about what the inconsistency is... any loggin you 
> can provide would help!
> 
> Thanks-
> sage
> 
> 
> 
>>
>> Greets,
>> Stefan
>>
>>
>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
>>> I'm guessing it is coming from
>>>
>>>   object_snaps out;
>>>   int r = get_snaps(oid, &out);
>>>   if (r < 0)
>>>     return r;
>>>
>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
>>> confirm that, and possibly also add debug lines to the various return 
>>> points in get_snaps() so you can see why get_snaps is returning an error.
>>>
>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
>>> hard to say without seeing what the inconsistency there is and whether we 
>>> can tolerate it...
>>>
>>> sage
>>>
>>>
>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>
>>>> HI Sage,
>>>>
>>>> any ideas for this one sawing while doing a deep scrub:
>>>>
>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
>>>>
>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
>>>>
>>>>  ceph version 12.2.2-94-g92923ef
>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> const*)+0x102) [0x55c0100d0372]
>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>> std::allocator<pg_log_entry_t> > const&,
>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>> std::allocator<pg_log_entry_t> > const&,
>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>> [0x55c00fdf6642]
>>>>  6:
>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>> [0x55c00fdfc684]
>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>> [0x55c00fd22030]
>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>> [0x55c00fb1abc7]
>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>> const&)+0x57) [0x55c00fd92ad7]
>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>> [0x55c0100d5ffd]
>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
>>>>  14: (()+0x8064) [0x7fbd284fb064]
>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>> needed to interpret this.
>>>>
>>>>
>>>> Stefan
>>>>
>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Sage,
>>>>>>
>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> it also crashes in (marked with HERE):
>>>>>>>>
>>>>>>>> int SnapMapper::get_snaps(
>>>>>>>>   const hobject_t &oid,
>>>>>>>>   object_snaps *out)
>>>>>>>> {
>>>>>>>>   assert(check(oid));
>>>>>>>>   set<string> keys;
>>>>>>>>   map<string, bufferlist> got;
>>>>>>>>   keys.insert(to_object_key(oid));
>>>>>>>>   int r = backend.get_keys(keys, &got);
>>>>>>>>   if (r < 0)
>>>>>>>>     return r;
>>>>>>>>   if (got.empty())
>>>>>>>>     return -ENOENT;
>>>>>>>>   if (out) {
>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
>>>>>>>>     ::decode(*out, bp);
>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
>>>>>>>>   } else {
>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>>>>>>>   }
>>>>>>>>   return 0;
>>>>>>>> }
>>>>>>>>
>>>>>>>> is it save to comment that assert?
>>>>>>>
>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
>>>>>>> can't repair.
>>>>>>
>>>>>> snap trimming works fine i already trimmed and removed some snaps.
>>>>>>
>>>>>> I was able to get the cluster into a state where all is backfilled. But
>>>>>> enabling / doing deep-scrubs results into this one:
>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>>>> SnapMapper::get_snaps(const h
>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
>>>>>> 2018-01-18 13:00:54.840396
>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>>
>>>>> I think if you switch that assert to a warning it will repair...
>>>>>
>>>>> sage
>>>>>
>>>>>>
>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>> const*)+0x102) [0x5561ce28d1f2]
>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
>>>>>> 697]
>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>> [0x5561cdcd7bc7]
>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>> const&)+0x57) [0x5561cdf4f957]
>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>> [0x5561ce292e7d]
>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>> needed to interpret this.
>>>>>>
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>>
>>>>>>> sage
>>>>>>>
>>>>>>>
>>>>>>>> Stefan
>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>
>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>
>>>>>>>>>>>> this gives me another crash while that pg is recovering:
>>>>>>>>>>>>
>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>>>>>>>> PrimaryLogPG::on_l
>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>>>>>>>
>>>>>>>>>>> Is this a cache tiering pool?
>>>>>>>>>>
>>>>>>>>>> no normal 3 replica but the pg is degraded:
>>>>>>>>>> ceph pg dump | grep 3.80e
>>>>>>>>>>
>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>>>>>>>
>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
>>>>>>>>> around the current assert:
>>>>>>>>>
>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>>>>>> index d42f3a401b..0f76134f74 100644
>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>>>>>      set<snapid_t> snaps;
>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>>>>>> +    } else {
>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>> +    }
>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>>>>>      snap_mapper.add_oid(
>>>>>>>>>        recovery_info.soid,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> s
>>>>>>>>>>>>
>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>>>>>> [0x55addb30748f]
>>>>>>>>>>>>  5:
>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>>>>>> [0x55addb317531]
>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>> [0x55addb23cf10]
>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>> [0x55addb035bc7]
>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>> [0x55addb5f0e7d]
>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>
>>>>>>>>>>>> Greets,
>>>>>>>>>>>> Stefan
>>>>>>>>>>>>
>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>>>>>>>>> list mapping.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>>>>>
>>>>>>>>>>>>> sage
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>  > 
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>>>>>> being down.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>>>>>>>  7:
>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>>>>>> them myself.
>>>>>>>>>>>>>>> -Greg
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sage Weil Jan. 22, 2018, 2:30 p.m. UTC | #17
On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
> Hi Sage,
> 
> get_snaps returned the error as there are no snaps in the snapmapper. So
> it's OK to skip the assert.
> 
> My biggest problem is now that i'm not able to run a scrub on all PGs to
> fix all those errors because in around 1-3% the osd seem to deadlock and
> needs to get killed to proceed.

I believe this will fix it:

https://github.com/liewegas/ceph/commit/wip-snapmapper

Cherry-pick that and let me know?

Thanks!
sage


> 
> The log output is always:
>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
> log [WRN] : slow request 141.951820 seconds old, received at
>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
> 00000157:head v 927921'526559757) currently commit_sent
>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
> log [WRN] : slow request 141.643853 seconds old, received at
>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
> 00000147d:head v 927921'29105383) currently commit_sent
>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
> log [WRN] : slow request 141.313220 seconds old, received at
>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
> 000000aff:head v 927921'164552063) currently commit_sent
>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
> (Aborted) **
>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
> 
>  ceph version 12.2.2-94-g92923ef
> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>  1: (()+0xa4423c) [0x561d876ab23c]
>  2: (()+0xf890) [0x7f100b974890]
>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
> saction> >&, Context*)+0x254) [0x561d8746ac54]
>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> [0x561d87138bc7]
>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> const&)+0x57) [0x561d873b0db7]
>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> [0x561d876f42bd]
>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
>  15: (()+0x8064) [0x7f100b96d064]
>  16: (clone()+0x6d) [0x7f100aa6162d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> 
> --- logging levels ---
>    0/ 5 none
>    0/ 0 lockdep
>    0/ 0 context
>    0/ 0 crush
>    0/ 0 mds
>    0/ 0 mds_balancer
>    0/ 0 mds_locker
>    0/ 0 mds_log
>    0/ 0 mds_log_expire
>    0/ 0 mds_migrator
>    0/ 0 buffer
>    0/ 0 timer
>    0/ 0 filer
>    0/ 1 striper
>    0/ 0 objecter
>    0/ 0 rados
>    0/ 0 rbd
>    0/ 5 rbd_mirror
>    0/ 5 rbd_replay
>    0/ 0 journaler
>    0/ 0 objectcacher
>    0/ 0 client
>    0/ 0 osd
>    0/ 0 optracker
>    0/ 0 objclass
>    0/ 0 filestore
>    0/ 0 journal
>    0/ 0 ms
>    0/ 0 mon
>    0/ 0 monc
>    0/ 0 paxos
>    0/ 0 tp
>    0/ 0 auth
>    1/ 5 crypto
>    0/ 0 finisher
>    1/ 1 reserver
>    0/ 0 heartbeatmap
>    0/ 0 perfcounter
>    0/ 0 rgw
>    1/10 civetweb
>    1/ 5 javaclient
>    0/ 0 asok
>    0/ 0 throttle
>    0/ 0 refs
>    1/ 5 xio
>    1/ 5 compressor
>    1/ 5 bluestore
>    1/ 5 bluefs
>    1/ 3 bdev
>    1/ 5 kstore
>    4/ 5 rocksdb
>    4/ 5 leveldb
>    4/ 5 memdb
>    1/ 5 kinetic
>    1/ 5 fuse
>    1/ 5 mgr
>    1/ 5 mgrc
>    1/ 5 dpdk
>    1/ 5 eventtrace
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-osd.59.log
> --- end dump of recent events ---
> 
> 
> Greets,
> Stefan
> 
> Am 21.01.2018 um 21:27 schrieb Sage Weil:
> > On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >> Hi Sage,
> >>
> >> thanks for your reply. I'll do this. What i still don't understand is
> >> the following and i hope you might have an idea:
> >> 1.) All the snaps mentioned here are fresh - i created them today
> >> running luminous? How can they be missing in snapmapper again?
> > 
> > I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
> > that when you hit the crash we have some context as to what is going on.
> > 
> > Did you track down which part of get_snaps() is returning the error?
> > 
> >> 2.) How did this happen? The cluster was jewel and was updated to
> >> luminous when all this happened.
> > 
> > Did this start right after the upgrade?
> > 
> > I started a PR that relaxes some of these assertions so that they clean up 
> > instead (unless the debug option is enabled, for our regression testing), 
> > but I'm still unclear about what the inconsistency is... any loggin you 
> > can provide would help!
> > 
> > Thanks-
> > sage
> > 
> > 
> > 
> >>
> >> Greets,
> >> Stefan
> >>
> >>
> >> Am 19.01.2018 um 21:19 schrieb Sage Weil:
> >>> I'm guessing it is coming from
> >>>
> >>>   object_snaps out;
> >>>   int r = get_snaps(oid, &out);
> >>>   if (r < 0)
> >>>     return r;
> >>>
> >>> in SnapMapper::update_snaps().  I would add a debug statement to to 
> >>> confirm that, and possibly also add debug lines to the various return 
> >>> points in get_snaps() so you can see why get_snaps is returning an error.
> >>>
> >>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
> >>> hard to say without seeing what the inconsistency there is and whether we 
> >>> can tolerate it...
> >>>
> >>> sage
> >>>
> >>>
> >>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>
> >>>> HI Sage,
> >>>>
> >>>> any ideas for this one sawing while doing a deep scrub:
> >>>>
> >>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
> >>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
> >>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
> >>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
> >>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
> >>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
> >>>>
> >>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
> >>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
> >>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
> >>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
> >>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
> >>>>
> >>>>  ceph version 12.2.2-94-g92923ef
> >>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>> const*)+0x102) [0x55c0100d0372]
> >>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>> std::allocator<pg_log_entry_t> > const&,
> >>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
> >>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
> >>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
> >>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>> std::allocator<pg_log_entry_t> > const&,
> >>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
> >>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>> [0x55c00fdf6642]
> >>>>  6:
> >>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>> [0x55c00fdfc684]
> >>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>> [0x55c00fd22030]
> >>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
> >>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>> [0x55c00fb1abc7]
> >>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>> const&)+0x57) [0x55c00fd92ad7]
> >>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
> >>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>> [0x55c0100d5ffd]
> >>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
> >>>>  14: (()+0x8064) [0x7fbd284fb064]
> >>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
> >>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>> needed to interpret this.
> >>>>
> >>>>
> >>>> Stefan
> >>>>
> >>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
> >>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>> Hi Sage,
> >>>>>>
> >>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
> >>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> it also crashes in (marked with HERE):
> >>>>>>>>
> >>>>>>>> int SnapMapper::get_snaps(
> >>>>>>>>   const hobject_t &oid,
> >>>>>>>>   object_snaps *out)
> >>>>>>>> {
> >>>>>>>>   assert(check(oid));
> >>>>>>>>   set<string> keys;
> >>>>>>>>   map<string, bufferlist> got;
> >>>>>>>>   keys.insert(to_object_key(oid));
> >>>>>>>>   int r = backend.get_keys(keys, &got);
> >>>>>>>>   if (r < 0)
> >>>>>>>>     return r;
> >>>>>>>>   if (got.empty())
> >>>>>>>>     return -ENOENT;
> >>>>>>>>   if (out) {
> >>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
> >>>>>>>>     ::decode(*out, bp);
> >>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
> >>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
> >>>>>>>>   } else {
> >>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
> >>>>>>>>   }
> >>>>>>>>   return 0;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> is it save to comment that assert?
> >>>>>>>
> >>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
> >>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
> >>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
> >>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
> >>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
> >>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
> >>>>>>> can't repair.
> >>>>>>
> >>>>>> snap trimming works fine i already trimmed and removed some snaps.
> >>>>>>
> >>>>>> I was able to get the cluster into a state where all is backfilled. But
> >>>>>> enabling / doing deep-scrubs results into this one:
> >>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
> >>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
> >>>>>> SnapMapper::get_snaps(const h
> >>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
> >>>>>> 2018-01-18 13:00:54.840396
> >>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
> >>>>>
> >>>>> I think if you switch that assert to a warning it will repair...
> >>>>>
> >>>>> sage
> >>>>>
> >>>>>>
> >>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>> const*)+0x102) [0x5561ce28d1f2]
> >>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
> >>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
> >>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
> >>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
> >>>>>> 697]
> >>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
> >>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> >>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
> >>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
> >>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
> >>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
> >>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>> [0x5561cdcd7bc7]
> >>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>> const&)+0x57) [0x5561cdf4f957]
> >>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
> >>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>> [0x5561ce292e7d]
> >>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
> >>>>>>  13: (()+0x8064) [0x7fbd70d86064]
> >>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
> >>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>> needed to interpret this.
> >>>>>>
> >>>>>>
> >>>>>> Greets,
> >>>>>> Stefan
> >>>>>>
> >>>>>>> sage
> >>>>>>>
> >>>>>>>
> >>>>>>>> Stefan
> >>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
> >>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>
> >>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
> >>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>> Hi Sage,
> >>>>>>>>>>>>
> >>>>>>>>>>>> this gives me another crash while that pg is recovering:
> >>>>>>>>>>>>
> >>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
> >>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
> >>>>>>>>>>>> PrimaryLogPG::on_l
> >>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
> >>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
> >>>>>>>>>>>> me 2018-01-17 19:25:09.322287
> >>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
> >>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
> >>>>>>>>>>>
> >>>>>>>>>>> Is this a cache tiering pool?
> >>>>>>>>>>
> >>>>>>>>>> no normal 3 replica but the pg is degraded:
> >>>>>>>>>> ceph pg dump | grep 3.80e
> >>>>>>>>>>
> >>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
> >>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
> >>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
> >>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
> >>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
> >>>>>>>>>
> >>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
> >>>>>>>>> around the current assert:
> >>>>>>>>>
> >>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
> >>>>>>>>> index d42f3a401b..0f76134f74 100644
> >>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
> >>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
> >>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
> >>>>>>>>>      set<snapid_t> snaps;
> >>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
> >>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
> >>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
> >>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
> >>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
> >>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
> >>>>>>>>> +    } else {
> >>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
> >>>>>>>>> +    }
> >>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
> >>>>>>>>>      snap_mapper.add_oid(
> >>>>>>>>>        recovery_info.soid,
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Stefan
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> s
> >>>>>>>>>>>>
> >>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
> >>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
> >>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
> >>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
> >>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
> >>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
> >>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
> >>>>>>>>>>>> [0x55addb30748f]
> >>>>>>>>>>>>  5:
> >>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
> >>>>>>>>>>>> [0x55addb317531]
> >>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>>>> [0x55addb23cf10]
> >>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
> >>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>> [0x55addb035bc7]
> >>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
> >>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
> >>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>> [0x55addb5f0e7d]
> >>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
> >>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
> >>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
> >>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Greets,
> >>>>>>>>>>>> Stefan
> >>>>>>>>>>>>
> >>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
> >>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
> >>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
> >>>>>>>>>>>>> list mapping.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
> >>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
> >>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
> >>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
> >>>>>>>>>>>>> happens.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
> >>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> sage
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>  > 
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
> >>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
> >>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
> >>>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
> >>>>>>>>>>>>>>>> being down.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> All of them fail with:
> >>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
> >>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
> >>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
> >>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
> >>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
> >>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
> >>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
> >>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
> >>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
> >>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
> >>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
> >>>>>>>>>>>>>>>> [0x561f9fb76f3b]
> >>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
> >>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
> >>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
> >>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>>>>>>>>>>>>>> [0x561f9fc314b2]
> >>>>>>>>>>>>>>>>  7:
> >>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>>>>>>>>>>>>>> [0x561f9fc374f4]
> >>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>>>>>>>> [0x561f9fb5cf10]
> >>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
> >>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>>>>>> [0x561f9f955bc7]
> >>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
> >>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
> >>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>>>>>> [0x561f9ff10e6d]
> >>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
> >>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
> >>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
> >>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
> >>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
> >>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
> >>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
> >>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
> >>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
> >>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
> >>>>>>>>>>>>>>> out of it. I see options to remove both an object and
> >>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
> >>>>>>>>>>>>>>> them myself.
> >>>>>>>>>>>>>>> -Greg
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>> --
> >>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>
> >>>>>>>>
> >>>>>> --
> >>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>
> >>>>>>
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>> the body of a message to majordomo@vger.kernel.org
> >>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>
> >>>>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >>
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Jan. 22, 2018, 6:49 p.m. UTC | #18
Hi Sage,

thanks! That one fixed it! Now i'm just waiting if the log ever stops
telling me:
2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
on pg 3.33e oid
3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
mapper: , oi: b0d7c...repaired

Currently it seems it does an endless repair and never ends.

Greets,
Stefan
Am 22.01.2018 um 15:30 schrieb Sage Weil:
> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>> Hi Sage,
>>
>> get_snaps returned the error as there are no snaps in the snapmapper. So
>> it's OK to skip the assert.
>>
>> My biggest problem is now that i'm not able to run a scrub on all PGs to
>> fix all those errors because in around 1-3% the osd seem to deadlock and
>> needs to get killed to proceed.
> 
> I believe this will fix it:
> 
> https://github.com/liewegas/ceph/commit/wip-snapmapper
> 
> Cherry-pick that and let me know?
> 
> Thanks!
> sage
> 
> 
>>
>> The log output is always:
>>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
>> log [WRN] : slow request 141.951820 seconds old, received at
>>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
>> 00000157:head v 927921'526559757) currently commit_sent
>>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
>> log [WRN] : slow request 141.643853 seconds old, received at
>>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
>> 00000147d:head v 927921'29105383) currently commit_sent
>>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
>> log [WRN] : slow request 141.313220 seconds old, received at
>>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
>> 000000aff:head v 927921'164552063) currently commit_sent
>>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
>> (Aborted) **
>>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
>>
>>  ceph version 12.2.2-94-g92923ef
>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>  1: (()+0xa4423c) [0x561d876ab23c]
>>  2: (()+0xf890) [0x7f100b974890]
>>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
>>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
>> saction> >&, Context*)+0x254) [0x561d8746ac54]
>>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
>>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
>>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
>>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>> [0x561d87138bc7]
>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>> const&)+0x57) [0x561d873b0db7]
>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>> [0x561d876f42bd]
>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
>>  15: (()+0x8064) [0x7f100b96d064]
>>  16: (clone()+0x6d) [0x7f100aa6162d]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>> --- logging levels ---
>>    0/ 5 none
>>    0/ 0 lockdep
>>    0/ 0 context
>>    0/ 0 crush
>>    0/ 0 mds
>>    0/ 0 mds_balancer
>>    0/ 0 mds_locker
>>    0/ 0 mds_log
>>    0/ 0 mds_log_expire
>>    0/ 0 mds_migrator
>>    0/ 0 buffer
>>    0/ 0 timer
>>    0/ 0 filer
>>    0/ 1 striper
>>    0/ 0 objecter
>>    0/ 0 rados
>>    0/ 0 rbd
>>    0/ 5 rbd_mirror
>>    0/ 5 rbd_replay
>>    0/ 0 journaler
>>    0/ 0 objectcacher
>>    0/ 0 client
>>    0/ 0 osd
>>    0/ 0 optracker
>>    0/ 0 objclass
>>    0/ 0 filestore
>>    0/ 0 journal
>>    0/ 0 ms
>>    0/ 0 mon
>>    0/ 0 monc
>>    0/ 0 paxos
>>    0/ 0 tp
>>    0/ 0 auth
>>    1/ 5 crypto
>>    0/ 0 finisher
>>    1/ 1 reserver
>>    0/ 0 heartbeatmap
>>    0/ 0 perfcounter
>>    0/ 0 rgw
>>    1/10 civetweb
>>    1/ 5 javaclient
>>    0/ 0 asok
>>    0/ 0 throttle
>>    0/ 0 refs
>>    1/ 5 xio
>>    1/ 5 compressor
>>    1/ 5 bluestore
>>    1/ 5 bluefs
>>    1/ 3 bdev
>>    1/ 5 kstore
>>    4/ 5 rocksdb
>>    4/ 5 leveldb
>>    4/ 5 memdb
>>    1/ 5 kinetic
>>    1/ 5 fuse
>>    1/ 5 mgr
>>    1/ 5 mgrc
>>    1/ 5 dpdk
>>    1/ 5 eventtrace
>>   -2/-2 (syslog threshold)
>>   -1/-1 (stderr threshold)
>>   max_recent     10000
>>   max_new         1000
>>   log_file /var/log/ceph/ceph-osd.59.log
>> --- end dump of recent events ---
>>
>>
>> Greets,
>> Stefan
>>
>> Am 21.01.2018 um 21:27 schrieb Sage Weil:
>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi Sage,
>>>>
>>>> thanks for your reply. I'll do this. What i still don't understand is
>>>> the following and i hope you might have an idea:
>>>> 1.) All the snaps mentioned here are fresh - i created them today
>>>> running luminous? How can they be missing in snapmapper again?
>>>
>>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
>>> that when you hit the crash we have some context as to what is going on.
>>>
>>> Did you track down which part of get_snaps() is returning the error?
>>>
>>>> 2.) How did this happen? The cluster was jewel and was updated to
>>>> luminous when all this happened.
>>>
>>> Did this start right after the upgrade?
>>>
>>> I started a PR that relaxes some of these assertions so that they clean up 
>>> instead (unless the debug option is enabled, for our regression testing), 
>>> but I'm still unclear about what the inconsistency is... any loggin you 
>>> can provide would help!
>>>
>>> Thanks-
>>> sage
>>>
>>>
>>>
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>>
>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
>>>>> I'm guessing it is coming from
>>>>>
>>>>>   object_snaps out;
>>>>>   int r = get_snaps(oid, &out);
>>>>>   if (r < 0)
>>>>>     return r;
>>>>>
>>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
>>>>> confirm that, and possibly also add debug lines to the various return 
>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
>>>>>
>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
>>>>> hard to say without seeing what the inconsistency there is and whether we 
>>>>> can tolerate it...
>>>>>
>>>>> sage
>>>>>
>>>>>
>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>
>>>>>> HI Sage,
>>>>>>
>>>>>> any ideas for this one sawing while doing a deep scrub:
>>>>>>
>>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
>>>>>>
>>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
>>>>>>
>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>> const*)+0x102) [0x55c0100d0372]
>>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
>>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
>>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
>>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>> [0x55c00fdf6642]
>>>>>>  6:
>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>> [0x55c00fdfc684]
>>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>> [0x55c00fd22030]
>>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
>>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>> [0x55c00fb1abc7]
>>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>> const&)+0x57) [0x55c00fd92ad7]
>>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
>>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>> [0x55c0100d5ffd]
>>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
>>>>>>  14: (()+0x8064) [0x7fbd284fb064]
>>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>> needed to interpret this.
>>>>>>
>>>>>>
>>>>>> Stefan
>>>>>>
>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hi Sage,
>>>>>>>>
>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> it also crashes in (marked with HERE):
>>>>>>>>>>
>>>>>>>>>> int SnapMapper::get_snaps(
>>>>>>>>>>   const hobject_t &oid,
>>>>>>>>>>   object_snaps *out)
>>>>>>>>>> {
>>>>>>>>>>   assert(check(oid));
>>>>>>>>>>   set<string> keys;
>>>>>>>>>>   map<string, bufferlist> got;
>>>>>>>>>>   keys.insert(to_object_key(oid));
>>>>>>>>>>   int r = backend.get_keys(keys, &got);
>>>>>>>>>>   if (r < 0)
>>>>>>>>>>     return r;
>>>>>>>>>>   if (got.empty())
>>>>>>>>>>     return -ENOENT;
>>>>>>>>>>   if (out) {
>>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
>>>>>>>>>>     ::decode(*out, bp);
>>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
>>>>>>>>>>   } else {
>>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>>>>>>>>>   }
>>>>>>>>>>   return 0;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> is it save to comment that assert?
>>>>>>>>>
>>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
>>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
>>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
>>>>>>>>> can't repair.
>>>>>>>>
>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
>>>>>>>>
>>>>>>>> I was able to get the cluster into a state where all is backfilled. But
>>>>>>>> enabling / doing deep-scrubs results into this one:
>>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>>>>>> SnapMapper::get_snaps(const h
>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
>>>>>>>> 2018-01-18 13:00:54.840396
>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>>>>
>>>>>>> I think if you switch that assert to a warning it will repair...
>>>>>>>
>>>>>>> sage
>>>>>>>
>>>>>>>>
>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>> const*)+0x102) [0x5561ce28d1f2]
>>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
>>>>>>>> 697]
>>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>> [0x5561cdcd7bc7]
>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>> const&)+0x57) [0x5561cdf4f957]
>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>> [0x5561ce292e7d]
>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
>>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>> needed to interpret this.
>>>>>>>>
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>>> sage
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Stefan
>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>>>>>>>>>> PrimaryLogPG::on_l
>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is this a cache tiering pool?
>>>>>>>>>>>>
>>>>>>>>>>>> no normal 3 replica but the pg is degraded:
>>>>>>>>>>>> ceph pg dump | grep 3.80e
>>>>>>>>>>>>
>>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
>>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>>>>>>>>>
>>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
>>>>>>>>>>> around the current assert:
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>>>>>>>      set<snapid_t> snaps;
>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>>>>>>>> +    } else {
>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>> +    }
>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>>>>>>>      snap_mapper.add_oid(
>>>>>>>>>>>        recovery_info.soid,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Stefan
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> s
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>>>>>>>> [0x55addb30748f]
>>>>>>>>>>>>>>  5:
>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>>>>>>>> [0x55addb317531]
>>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>> [0x55addb23cf10]
>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>> [0x55addb035bc7]
>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>> [0x55addb5f0e7d]
>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>>>>>>>>>>> list mapping.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  > 
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>>>>>>>> being down.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>>>>>>>>>  7:
>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>>>>>>>> them myself.
>>>>>>>>>>>>>>>>> -Greg
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sage Weil Jan. 22, 2018, 7:01 p.m. UTC | #19
On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
> Hi Sage,
> 
> thanks! That one fixed it! Now i'm just waiting if the log ever stops
> telling me:
> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
> on pg 3.33e oid
> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
> mapper: , oi: b0d7c...repaired
> 
> Currently it seems it does an endless repair and never ends.

On many objects or the same object?

sage

> 
> Greets,
> Stefan
> Am 22.01.2018 um 15:30 schrieb Sage Weil:
> > On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >> Hi Sage,
> >>
> >> get_snaps returned the error as there are no snaps in the snapmapper. So
> >> it's OK to skip the assert.
> >>
> >> My biggest problem is now that i'm not able to run a scrub on all PGs to
> >> fix all those errors because in around 1-3% the osd seem to deadlock and
> >> needs to get killed to proceed.
> > 
> > I believe this will fix it:
> > 
> > https://github.com/liewegas/ceph/commit/wip-snapmapper
> > 
> > Cherry-pick that and let me know?
> > 
> > Thanks!
> > sage
> > 
> > 
> >>
> >> The log output is always:
> >>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
> >> log [WRN] : slow request 141.951820 seconds old, received at
> >>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
> >> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
> >> 00000157:head v 927921'526559757) currently commit_sent
> >>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
> >> log [WRN] : slow request 141.643853 seconds old, received at
> >>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
> >> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
> >> 00000147d:head v 927921'29105383) currently commit_sent
> >>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
> >> log [WRN] : slow request 141.313220 seconds old, received at
> >>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
> >> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
> >> 000000aff:head v 927921'164552063) currently commit_sent
> >>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
> >> (Aborted) **
> >>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
> >>
> >>  ceph version 12.2.2-94-g92923ef
> >> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>  1: (()+0xa4423c) [0x561d876ab23c]
> >>  2: (()+0xf890) [0x7f100b974890]
> >>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
> >>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
> >> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
> >> saction> >&, Context*)+0x254) [0x561d8746ac54]
> >>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
> >> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
> >>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
> >>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> >> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
> >>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
> >> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
> >>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
> >>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >> [0x561d87138bc7]
> >>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >> const&)+0x57) [0x561d873b0db7]
> >>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> >> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
> >>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >> [0x561d876f42bd]
> >>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
> >>  15: (()+0x8064) [0x7f100b96d064]
> >>  16: (clone()+0x6d) [0x7f100aa6162d]
> >>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >> needed to interpret this.
> >>
> >> --- logging levels ---
> >>    0/ 5 none
> >>    0/ 0 lockdep
> >>    0/ 0 context
> >>    0/ 0 crush
> >>    0/ 0 mds
> >>    0/ 0 mds_balancer
> >>    0/ 0 mds_locker
> >>    0/ 0 mds_log
> >>    0/ 0 mds_log_expire
> >>    0/ 0 mds_migrator
> >>    0/ 0 buffer
> >>    0/ 0 timer
> >>    0/ 0 filer
> >>    0/ 1 striper
> >>    0/ 0 objecter
> >>    0/ 0 rados
> >>    0/ 0 rbd
> >>    0/ 5 rbd_mirror
> >>    0/ 5 rbd_replay
> >>    0/ 0 journaler
> >>    0/ 0 objectcacher
> >>    0/ 0 client
> >>    0/ 0 osd
> >>    0/ 0 optracker
> >>    0/ 0 objclass
> >>    0/ 0 filestore
> >>    0/ 0 journal
> >>    0/ 0 ms
> >>    0/ 0 mon
> >>    0/ 0 monc
> >>    0/ 0 paxos
> >>    0/ 0 tp
> >>    0/ 0 auth
> >>    1/ 5 crypto
> >>    0/ 0 finisher
> >>    1/ 1 reserver
> >>    0/ 0 heartbeatmap
> >>    0/ 0 perfcounter
> >>    0/ 0 rgw
> >>    1/10 civetweb
> >>    1/ 5 javaclient
> >>    0/ 0 asok
> >>    0/ 0 throttle
> >>    0/ 0 refs
> >>    1/ 5 xio
> >>    1/ 5 compressor
> >>    1/ 5 bluestore
> >>    1/ 5 bluefs
> >>    1/ 3 bdev
> >>    1/ 5 kstore
> >>    4/ 5 rocksdb
> >>    4/ 5 leveldb
> >>    4/ 5 memdb
> >>    1/ 5 kinetic
> >>    1/ 5 fuse
> >>    1/ 5 mgr
> >>    1/ 5 mgrc
> >>    1/ 5 dpdk
> >>    1/ 5 eventtrace
> >>   -2/-2 (syslog threshold)
> >>   -1/-1 (stderr threshold)
> >>   max_recent     10000
> >>   max_new         1000
> >>   log_file /var/log/ceph/ceph-osd.59.log
> >> --- end dump of recent events ---
> >>
> >>
> >> Greets,
> >> Stefan
> >>
> >> Am 21.01.2018 um 21:27 schrieb Sage Weil:
> >>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>> Hi Sage,
> >>>>
> >>>> thanks for your reply. I'll do this. What i still don't understand is
> >>>> the following and i hope you might have an idea:
> >>>> 1.) All the snaps mentioned here are fresh - i created them today
> >>>> running luminous? How can they be missing in snapmapper again?
> >>>
> >>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
> >>> that when you hit the crash we have some context as to what is going on.
> >>>
> >>> Did you track down which part of get_snaps() is returning the error?
> >>>
> >>>> 2.) How did this happen? The cluster was jewel and was updated to
> >>>> luminous when all this happened.
> >>>
> >>> Did this start right after the upgrade?
> >>>
> >>> I started a PR that relaxes some of these assertions so that they clean up 
> >>> instead (unless the debug option is enabled, for our regression testing), 
> >>> but I'm still unclear about what the inconsistency is... any loggin you 
> >>> can provide would help!
> >>>
> >>> Thanks-
> >>> sage
> >>>
> >>>
> >>>
> >>>>
> >>>> Greets,
> >>>> Stefan
> >>>>
> >>>>
> >>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
> >>>>> I'm guessing it is coming from
> >>>>>
> >>>>>   object_snaps out;
> >>>>>   int r = get_snaps(oid, &out);
> >>>>>   if (r < 0)
> >>>>>     return r;
> >>>>>
> >>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
> >>>>> confirm that, and possibly also add debug lines to the various return 
> >>>>> points in get_snaps() so you can see why get_snaps is returning an error.
> >>>>>
> >>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
> >>>>> hard to say without seeing what the inconsistency there is and whether we 
> >>>>> can tolerate it...
> >>>>>
> >>>>> sage
> >>>>>
> >>>>>
> >>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>
> >>>>>> HI Sage,
> >>>>>>
> >>>>>> any ideas for this one sawing while doing a deep scrub:
> >>>>>>
> >>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
> >>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
> >>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
> >>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
> >>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
> >>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
> >>>>>>
> >>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
> >>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
> >>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
> >>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
> >>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
> >>>>>>
> >>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>> const*)+0x102) [0x55c0100d0372]
> >>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
> >>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
> >>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
> >>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
> >>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>>>> [0x55c00fdf6642]
> >>>>>>  6:
> >>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>>>> [0x55c00fdfc684]
> >>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>> [0x55c00fd22030]
> >>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
> >>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>> [0x55c00fb1abc7]
> >>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>> const&)+0x57) [0x55c00fd92ad7]
> >>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
> >>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>> [0x55c0100d5ffd]
> >>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
> >>>>>>  14: (()+0x8064) [0x7fbd284fb064]
> >>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
> >>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>> needed to interpret this.
> >>>>>>
> >>>>>>
> >>>>>> Stefan
> >>>>>>
> >>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
> >>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>> Hi Sage,
> >>>>>>>>
> >>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
> >>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> it also crashes in (marked with HERE):
> >>>>>>>>>>
> >>>>>>>>>> int SnapMapper::get_snaps(
> >>>>>>>>>>   const hobject_t &oid,
> >>>>>>>>>>   object_snaps *out)
> >>>>>>>>>> {
> >>>>>>>>>>   assert(check(oid));
> >>>>>>>>>>   set<string> keys;
> >>>>>>>>>>   map<string, bufferlist> got;
> >>>>>>>>>>   keys.insert(to_object_key(oid));
> >>>>>>>>>>   int r = backend.get_keys(keys, &got);
> >>>>>>>>>>   if (r < 0)
> >>>>>>>>>>     return r;
> >>>>>>>>>>   if (got.empty())
> >>>>>>>>>>     return -ENOENT;
> >>>>>>>>>>   if (out) {
> >>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
> >>>>>>>>>>     ::decode(*out, bp);
> >>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
> >>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
> >>>>>>>>>>   } else {
> >>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
> >>>>>>>>>>   }
> >>>>>>>>>>   return 0;
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> is it save to comment that assert?
> >>>>>>>>>
> >>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
> >>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
> >>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
> >>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
> >>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
> >>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
> >>>>>>>>> can't repair.
> >>>>>>>>
> >>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
> >>>>>>>>
> >>>>>>>> I was able to get the cluster into a state where all is backfilled. But
> >>>>>>>> enabling / doing deep-scrubs results into this one:
> >>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
> >>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
> >>>>>>>> SnapMapper::get_snaps(const h
> >>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
> >>>>>>>> 2018-01-18 13:00:54.840396
> >>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
> >>>>>>>
> >>>>>>> I think if you switch that assert to a warning it will repair...
> >>>>>>>
> >>>>>>> sage
> >>>>>>>
> >>>>>>>>
> >>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>> const*)+0x102) [0x5561ce28d1f2]
> >>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
> >>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
> >>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
> >>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
> >>>>>>>> 697]
> >>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
> >>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> >>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
> >>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
> >>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
> >>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
> >>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>> [0x5561cdcd7bc7]
> >>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>> const&)+0x57) [0x5561cdf4f957]
> >>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
> >>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>> [0x5561ce292e7d]
> >>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
> >>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
> >>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
> >>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>> needed to interpret this.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Greets,
> >>>>>>>> Stefan
> >>>>>>>>
> >>>>>>>>> sage
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> Stefan
> >>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
> >>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
> >>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>> Hi Sage,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
> >>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
> >>>>>>>>>>>>>> PrimaryLogPG::on_l
> >>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
> >>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
> >>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
> >>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
> >>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Is this a cache tiering pool?
> >>>>>>>>>>>>
> >>>>>>>>>>>> no normal 3 replica but the pg is degraded:
> >>>>>>>>>>>> ceph pg dump | grep 3.80e
> >>>>>>>>>>>>
> >>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
> >>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
> >>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
> >>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
> >>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
> >>>>>>>>>>>
> >>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
> >>>>>>>>>>> around the current assert:
> >>>>>>>>>>>
> >>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>> index d42f3a401b..0f76134f74 100644
> >>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
> >>>>>>>>>>>      set<snapid_t> snaps;
> >>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
> >>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
> >>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
> >>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
> >>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
> >>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
> >>>>>>>>>>> +    } else {
> >>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
> >>>>>>>>>>> +    }
> >>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
> >>>>>>>>>>>      snap_mapper.add_oid(
> >>>>>>>>>>>        recovery_info.soid,
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Stefan
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> s
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
> >>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
> >>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
> >>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
> >>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
> >>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
> >>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
> >>>>>>>>>>>>>> [0x55addb30748f]
> >>>>>>>>>>>>>>  5:
> >>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
> >>>>>>>>>>>>>> [0x55addb317531]
> >>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>>>>>> [0x55addb23cf10]
> >>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
> >>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>>>> [0x55addb035bc7]
> >>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
> >>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
> >>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>>>> [0x55addb5f0e7d]
> >>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
> >>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
> >>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
> >>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Greets,
> >>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
> >>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
> >>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
> >>>>>>>>>>>>>>> list mapping.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
> >>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
> >>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
> >>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
> >>>>>>>>>>>>>>> happens.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
> >>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> sage
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>  > 
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
> >>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
> >>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
> >>>>>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
> >>>>>>>>>>>>>>>>>> being down.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> All of them fail with:
> >>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
> >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
> >>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
> >>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
> >>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
> >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
> >>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
> >>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
> >>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
> >>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
> >>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
> >>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
> >>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
> >>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
> >>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
> >>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
> >>>>>>>>>>>>>>>>>>  7:
> >>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
> >>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
> >>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
> >>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
> >>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
> >>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
> >>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
> >>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
> >>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
> >>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
> >>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
> >>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
> >>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
> >>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
> >>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
> >>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
> >>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
> >>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
> >>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
> >>>>>>>>>>>>>>>>> them myself.
> >>>>>>>>>>>>>>>>> -Greg
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>> --
> >>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>
> >>>>>>>>
> >>>>>> --
> >>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>
> >>>>>>
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>> the body of a message to majordomo@vger.kernel.org
> >>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>
> >>>>
> >>
> >>
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Jan. 22, 2018, 7:15 p.m. UTC | #20
Am 22.01.2018 um 20:01 schrieb Sage Weil:
> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>> Hi Sage,
>>
>> thanks! That one fixed it! Now i'm just waiting if the log ever stops
>> telling me:
>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
>> on pg 3.33e oid
>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
>> mapper: , oi: b0d7c...repaired
>>
>> Currently it seems it does an endless repair and never ends.
> 
> On many objects or the same object?

I did a manual scrub on all pgs but now it has started the automatic
scrub and find those errors again.

I haven't checked the object names yet. I'll wait 24 hours and see
what's happening.

Thanks again!

Stefan

> 
> sage
> 
>>
>> Greets,
>> Stefan
>> Am 22.01.2018 um 15:30 schrieb Sage Weil:
>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi Sage,
>>>>
>>>> get_snaps returned the error as there are no snaps in the snapmapper. So
>>>> it's OK to skip the assert.
>>>>
>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to
>>>> fix all those errors because in around 1-3% the osd seem to deadlock and
>>>> needs to get killed to proceed.
>>>
>>> I believe this will fix it:
>>>
>>> https://github.com/liewegas/ceph/commit/wip-snapmapper
>>>
>>> Cherry-pick that and let me know?
>>>
>>> Thanks!
>>> sage
>>>
>>>
>>>>
>>>> The log output is always:
>>>>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
>>>> log [WRN] : slow request 141.951820 seconds old, received at
>>>>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
>>>> 00000157:head v 927921'526559757) currently commit_sent
>>>>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
>>>> log [WRN] : slow request 141.643853 seconds old, received at
>>>>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
>>>> 00000147d:head v 927921'29105383) currently commit_sent
>>>>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
>>>> log [WRN] : slow request 141.313220 seconds old, received at
>>>>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
>>>> 000000aff:head v 927921'164552063) currently commit_sent
>>>>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
>>>> (Aborted) **
>>>>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
>>>>
>>>>  ceph version 12.2.2-94-g92923ef
>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>  1: (()+0xa4423c) [0x561d876ab23c]
>>>>  2: (()+0xf890) [0x7f100b974890]
>>>>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
>>>>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
>>>> saction> >&, Context*)+0x254) [0x561d8746ac54]
>>>>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
>>>>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
>>>>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
>>>>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>> [0x561d87138bc7]
>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>> const&)+0x57) [0x561d873b0db7]
>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>> [0x561d876f42bd]
>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
>>>>  15: (()+0x8064) [0x7f100b96d064]
>>>>  16: (clone()+0x6d) [0x7f100aa6162d]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>> needed to interpret this.
>>>>
>>>> --- logging levels ---
>>>>    0/ 5 none
>>>>    0/ 0 lockdep
>>>>    0/ 0 context
>>>>    0/ 0 crush
>>>>    0/ 0 mds
>>>>    0/ 0 mds_balancer
>>>>    0/ 0 mds_locker
>>>>    0/ 0 mds_log
>>>>    0/ 0 mds_log_expire
>>>>    0/ 0 mds_migrator
>>>>    0/ 0 buffer
>>>>    0/ 0 timer
>>>>    0/ 0 filer
>>>>    0/ 1 striper
>>>>    0/ 0 objecter
>>>>    0/ 0 rados
>>>>    0/ 0 rbd
>>>>    0/ 5 rbd_mirror
>>>>    0/ 5 rbd_replay
>>>>    0/ 0 journaler
>>>>    0/ 0 objectcacher
>>>>    0/ 0 client
>>>>    0/ 0 osd
>>>>    0/ 0 optracker
>>>>    0/ 0 objclass
>>>>    0/ 0 filestore
>>>>    0/ 0 journal
>>>>    0/ 0 ms
>>>>    0/ 0 mon
>>>>    0/ 0 monc
>>>>    0/ 0 paxos
>>>>    0/ 0 tp
>>>>    0/ 0 auth
>>>>    1/ 5 crypto
>>>>    0/ 0 finisher
>>>>    1/ 1 reserver
>>>>    0/ 0 heartbeatmap
>>>>    0/ 0 perfcounter
>>>>    0/ 0 rgw
>>>>    1/10 civetweb
>>>>    1/ 5 javaclient
>>>>    0/ 0 asok
>>>>    0/ 0 throttle
>>>>    0/ 0 refs
>>>>    1/ 5 xio
>>>>    1/ 5 compressor
>>>>    1/ 5 bluestore
>>>>    1/ 5 bluefs
>>>>    1/ 3 bdev
>>>>    1/ 5 kstore
>>>>    4/ 5 rocksdb
>>>>    4/ 5 leveldb
>>>>    4/ 5 memdb
>>>>    1/ 5 kinetic
>>>>    1/ 5 fuse
>>>>    1/ 5 mgr
>>>>    1/ 5 mgrc
>>>>    1/ 5 dpdk
>>>>    1/ 5 eventtrace
>>>>   -2/-2 (syslog threshold)
>>>>   -1/-1 (stderr threshold)
>>>>   max_recent     10000
>>>>   max_new         1000
>>>>   log_file /var/log/ceph/ceph-osd.59.log
>>>> --- end dump of recent events ---
>>>>
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil:
>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Sage,
>>>>>>
>>>>>> thanks for your reply. I'll do this. What i still don't understand is
>>>>>> the following and i hope you might have an idea:
>>>>>> 1.) All the snaps mentioned here are fresh - i created them today
>>>>>> running luminous? How can they be missing in snapmapper again?
>>>>>
>>>>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
>>>>> that when you hit the crash we have some context as to what is going on.
>>>>>
>>>>> Did you track down which part of get_snaps() is returning the error?
>>>>>
>>>>>> 2.) How did this happen? The cluster was jewel and was updated to
>>>>>> luminous when all this happened.
>>>>>
>>>>> Did this start right after the upgrade?
>>>>>
>>>>> I started a PR that relaxes some of these assertions so that they clean up 
>>>>> instead (unless the debug option is enabled, for our regression testing), 
>>>>> but I'm still unclear about what the inconsistency is... any loggin you 
>>>>> can provide would help!
>>>>>
>>>>> Thanks-
>>>>> sage
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>>
>>>>>>
>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
>>>>>>> I'm guessing it is coming from
>>>>>>>
>>>>>>>   object_snaps out;
>>>>>>>   int r = get_snaps(oid, &out);
>>>>>>>   if (r < 0)
>>>>>>>     return r;
>>>>>>>
>>>>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
>>>>>>> confirm that, and possibly also add debug lines to the various return 
>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
>>>>>>>
>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
>>>>>>> hard to say without seeing what the inconsistency there is and whether we 
>>>>>>> can tolerate it...
>>>>>>>
>>>>>>> sage
>>>>>>>
>>>>>>>
>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>
>>>>>>>> HI Sage,
>>>>>>>>
>>>>>>>> any ideas for this one sawing while doing a deep scrub:
>>>>>>>>
>>>>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
>>>>>>>>
>>>>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
>>>>>>>>
>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>> const*)+0x102) [0x55c0100d0372]
>>>>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
>>>>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
>>>>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
>>>>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>> [0x55c00fdf6642]
>>>>>>>>  6:
>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>> [0x55c00fdfc684]
>>>>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>> [0x55c00fd22030]
>>>>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
>>>>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>> [0x55c00fb1abc7]
>>>>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>> const&)+0x57) [0x55c00fd92ad7]
>>>>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
>>>>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>> [0x55c0100d5ffd]
>>>>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
>>>>>>>>  14: (()+0x8064) [0x7fbd284fb064]
>>>>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>> needed to interpret this.
>>>>>>>>
>>>>>>>>
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hi Sage,
>>>>>>>>>>
>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> it also crashes in (marked with HERE):
>>>>>>>>>>>>
>>>>>>>>>>>> int SnapMapper::get_snaps(
>>>>>>>>>>>>   const hobject_t &oid,
>>>>>>>>>>>>   object_snaps *out)
>>>>>>>>>>>> {
>>>>>>>>>>>>   assert(check(oid));
>>>>>>>>>>>>   set<string> keys;
>>>>>>>>>>>>   map<string, bufferlist> got;
>>>>>>>>>>>>   keys.insert(to_object_key(oid));
>>>>>>>>>>>>   int r = backend.get_keys(keys, &got);
>>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>>     return r;
>>>>>>>>>>>>   if (got.empty())
>>>>>>>>>>>>     return -ENOENT;
>>>>>>>>>>>>   if (out) {
>>>>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
>>>>>>>>>>>>     ::decode(*out, bp);
>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>>>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
>>>>>>>>>>>>   } else {
>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>>>>>>>>>>>   }
>>>>>>>>>>>>   return 0;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> is it save to comment that assert?
>>>>>>>>>>>
>>>>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
>>>>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
>>>>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
>>>>>>>>>>> can't repair.
>>>>>>>>>>
>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
>>>>>>>>>>
>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But
>>>>>>>>>> enabling / doing deep-scrubs results into this one:
>>>>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>>>>>>>> SnapMapper::get_snaps(const h
>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
>>>>>>>>>> 2018-01-18 13:00:54.840396
>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>>>>>>
>>>>>>>>> I think if you switch that assert to a warning it will repair...
>>>>>>>>>
>>>>>>>>> sage
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2]
>>>>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>>>>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
>>>>>>>>>> 697]
>>>>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>>>>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>>>>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>> [0x5561cdcd7bc7]
>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>> const&)+0x57) [0x5561cdf4f957]
>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>> [0x5561ce292e7d]
>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>>>>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
>>>>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>> needed to interpret this.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Greets,
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>>> sage
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Stefan
>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>>>>>>>>>>>> PrimaryLogPG::on_l
>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is this a cache tiering pool?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded:
>>>>>>>>>>>>>> ceph pg dump | grep 3.80e
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
>>>>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
>>>>>>>>>>>>> around the current assert:
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>>>>>>>>>      set<snapid_t> snaps;
>>>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>>>>>>>>>      snap_mapper.add_oid(
>>>>>>>>>>>>>        recovery_info.soid,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> s
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>>>>>>>>>> [0x55addb30748f]
>>>>>>>>>>>>>>>>  5:
>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>>>>>>>>>> [0x55addb317531]
>>>>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>> [0x55addb23cf10]
>>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>> [0x55addb035bc7]
>>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>> [0x55addb5f0e7d]
>>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>>>>>>>>>>>>> list mapping.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  > 
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>>>>>>>>>> being down.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>>>>>>>>>>>  7:
>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>>>>>>>>>> them myself.
>>>>>>>>>>>>>>>>>>> -Greg
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Jan. 23, 2018, 8:48 p.m. UTC | #21
Am 22.01.2018 um 20:01 schrieb Sage Weil:
> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>> Hi Sage,
>>
>> thanks! That one fixed it! Now i'm just waiting if the log ever stops
>> telling me:
>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
>> on pg 3.33e oid
>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
>> mapper: , oi: b0d7c...repaired
>>
>> Currently it seems it does an endless repair and never ends.
> 
> On many objects or the same object?

OK no it still does endless repair but the reason is it also repairs
"newly" created snapshots.

Example:
2018-01-23 21:46:47.609712 osd.32 [ERR] osd.32 found snap mapper error
on pg 3.c70 oid
3:0e3ff478:::rbd_data.52ab946b8b4567.00000000000025b2:b0d7c snaps
missing in mapper, should be: b0d7c...repaired

rbd image 'vm-251-disk-1':
        size 170 GB in 43520 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.52ab946b8b4567
        format: 2
        features: layering
        flags:
SNAPID NAME                                   SIZE TIMESTAMP
726900 PVESNAP_cloud2-1394_vm_diff_20180123 170 GB Tue Jan 23 02:14:35 2018

So this is a fresh snapshot which needs a repair?

Greets,
Stefan

> 
> sage
> 
>>
>> Greets,
>> Stefan
>> Am 22.01.2018 um 15:30 schrieb Sage Weil:
>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi Sage,
>>>>
>>>> get_snaps returned the error as there are no snaps in the snapmapper. So
>>>> it's OK to skip the assert.
>>>>
>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to
>>>> fix all those errors because in around 1-3% the osd seem to deadlock and
>>>> needs to get killed to proceed.
>>>
>>> I believe this will fix it:
>>>
>>> https://github.com/liewegas/ceph/commit/wip-snapmapper
>>>
>>> Cherry-pick that and let me know?
>>>
>>> Thanks!
>>> sage
>>>
>>>
>>>>
>>>> The log output is always:
>>>>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
>>>> log [WRN] : slow request 141.951820 seconds old, received at
>>>>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
>>>> 00000157:head v 927921'526559757) currently commit_sent
>>>>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
>>>> log [WRN] : slow request 141.643853 seconds old, received at
>>>>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
>>>> 00000147d:head v 927921'29105383) currently commit_sent
>>>>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
>>>> log [WRN] : slow request 141.313220 seconds old, received at
>>>>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
>>>> 000000aff:head v 927921'164552063) currently commit_sent
>>>>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
>>>> (Aborted) **
>>>>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
>>>>
>>>>  ceph version 12.2.2-94-g92923ef
>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>  1: (()+0xa4423c) [0x561d876ab23c]
>>>>  2: (()+0xf890) [0x7f100b974890]
>>>>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
>>>>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
>>>> saction> >&, Context*)+0x254) [0x561d8746ac54]
>>>>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
>>>>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
>>>>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
>>>>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>> [0x561d87138bc7]
>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>> const&)+0x57) [0x561d873b0db7]
>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>> [0x561d876f42bd]
>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
>>>>  15: (()+0x8064) [0x7f100b96d064]
>>>>  16: (clone()+0x6d) [0x7f100aa6162d]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>> needed to interpret this.
>>>>
>>>> --- logging levels ---
>>>>    0/ 5 none
>>>>    0/ 0 lockdep
>>>>    0/ 0 context
>>>>    0/ 0 crush
>>>>    0/ 0 mds
>>>>    0/ 0 mds_balancer
>>>>    0/ 0 mds_locker
>>>>    0/ 0 mds_log
>>>>    0/ 0 mds_log_expire
>>>>    0/ 0 mds_migrator
>>>>    0/ 0 buffer
>>>>    0/ 0 timer
>>>>    0/ 0 filer
>>>>    0/ 1 striper
>>>>    0/ 0 objecter
>>>>    0/ 0 rados
>>>>    0/ 0 rbd
>>>>    0/ 5 rbd_mirror
>>>>    0/ 5 rbd_replay
>>>>    0/ 0 journaler
>>>>    0/ 0 objectcacher
>>>>    0/ 0 client
>>>>    0/ 0 osd
>>>>    0/ 0 optracker
>>>>    0/ 0 objclass
>>>>    0/ 0 filestore
>>>>    0/ 0 journal
>>>>    0/ 0 ms
>>>>    0/ 0 mon
>>>>    0/ 0 monc
>>>>    0/ 0 paxos
>>>>    0/ 0 tp
>>>>    0/ 0 auth
>>>>    1/ 5 crypto
>>>>    0/ 0 finisher
>>>>    1/ 1 reserver
>>>>    0/ 0 heartbeatmap
>>>>    0/ 0 perfcounter
>>>>    0/ 0 rgw
>>>>    1/10 civetweb
>>>>    1/ 5 javaclient
>>>>    0/ 0 asok
>>>>    0/ 0 throttle
>>>>    0/ 0 refs
>>>>    1/ 5 xio
>>>>    1/ 5 compressor
>>>>    1/ 5 bluestore
>>>>    1/ 5 bluefs
>>>>    1/ 3 bdev
>>>>    1/ 5 kstore
>>>>    4/ 5 rocksdb
>>>>    4/ 5 leveldb
>>>>    4/ 5 memdb
>>>>    1/ 5 kinetic
>>>>    1/ 5 fuse
>>>>    1/ 5 mgr
>>>>    1/ 5 mgrc
>>>>    1/ 5 dpdk
>>>>    1/ 5 eventtrace
>>>>   -2/-2 (syslog threshold)
>>>>   -1/-1 (stderr threshold)
>>>>   max_recent     10000
>>>>   max_new         1000
>>>>   log_file /var/log/ceph/ceph-osd.59.log
>>>> --- end dump of recent events ---
>>>>
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil:
>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Sage,
>>>>>>
>>>>>> thanks for your reply. I'll do this. What i still don't understand is
>>>>>> the following and i hope you might have an idea:
>>>>>> 1.) All the snaps mentioned here are fresh - i created them today
>>>>>> running luminous? How can they be missing in snapmapper again?
>>>>>
>>>>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
>>>>> that when you hit the crash we have some context as to what is going on.
>>>>>
>>>>> Did you track down which part of get_snaps() is returning the error?
>>>>>
>>>>>> 2.) How did this happen? The cluster was jewel and was updated to
>>>>>> luminous when all this happened.
>>>>>
>>>>> Did this start right after the upgrade?
>>>>>
>>>>> I started a PR that relaxes some of these assertions so that they clean up 
>>>>> instead (unless the debug option is enabled, for our regression testing), 
>>>>> but I'm still unclear about what the inconsistency is... any loggin you 
>>>>> can provide would help!
>>>>>
>>>>> Thanks-
>>>>> sage
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>>
>>>>>>
>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
>>>>>>> I'm guessing it is coming from
>>>>>>>
>>>>>>>   object_snaps out;
>>>>>>>   int r = get_snaps(oid, &out);
>>>>>>>   if (r < 0)
>>>>>>>     return r;
>>>>>>>
>>>>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
>>>>>>> confirm that, and possibly also add debug lines to the various return 
>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
>>>>>>>
>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
>>>>>>> hard to say without seeing what the inconsistency there is and whether we 
>>>>>>> can tolerate it...
>>>>>>>
>>>>>>> sage
>>>>>>>
>>>>>>>
>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>
>>>>>>>> HI Sage,
>>>>>>>>
>>>>>>>> any ideas for this one sawing while doing a deep scrub:
>>>>>>>>
>>>>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
>>>>>>>>
>>>>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
>>>>>>>>
>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>> const*)+0x102) [0x55c0100d0372]
>>>>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
>>>>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
>>>>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
>>>>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>> [0x55c00fdf6642]
>>>>>>>>  6:
>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>> [0x55c00fdfc684]
>>>>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>> [0x55c00fd22030]
>>>>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
>>>>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>> [0x55c00fb1abc7]
>>>>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>> const&)+0x57) [0x55c00fd92ad7]
>>>>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
>>>>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>> [0x55c0100d5ffd]
>>>>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
>>>>>>>>  14: (()+0x8064) [0x7fbd284fb064]
>>>>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>> needed to interpret this.
>>>>>>>>
>>>>>>>>
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hi Sage,
>>>>>>>>>>
>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> it also crashes in (marked with HERE):
>>>>>>>>>>>>
>>>>>>>>>>>> int SnapMapper::get_snaps(
>>>>>>>>>>>>   const hobject_t &oid,
>>>>>>>>>>>>   object_snaps *out)
>>>>>>>>>>>> {
>>>>>>>>>>>>   assert(check(oid));
>>>>>>>>>>>>   set<string> keys;
>>>>>>>>>>>>   map<string, bufferlist> got;
>>>>>>>>>>>>   keys.insert(to_object_key(oid));
>>>>>>>>>>>>   int r = backend.get_keys(keys, &got);
>>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>>     return r;
>>>>>>>>>>>>   if (got.empty())
>>>>>>>>>>>>     return -ENOENT;
>>>>>>>>>>>>   if (out) {
>>>>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
>>>>>>>>>>>>     ::decode(*out, bp);
>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>>>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
>>>>>>>>>>>>   } else {
>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>>>>>>>>>>>   }
>>>>>>>>>>>>   return 0;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> is it save to comment that assert?
>>>>>>>>>>>
>>>>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
>>>>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
>>>>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
>>>>>>>>>>> can't repair.
>>>>>>>>>>
>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
>>>>>>>>>>
>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But
>>>>>>>>>> enabling / doing deep-scrubs results into this one:
>>>>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>>>>>>>> SnapMapper::get_snaps(const h
>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
>>>>>>>>>> 2018-01-18 13:00:54.840396
>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>>>>>>
>>>>>>>>> I think if you switch that assert to a warning it will repair...
>>>>>>>>>
>>>>>>>>> sage
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2]
>>>>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>>>>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
>>>>>>>>>> 697]
>>>>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>>>>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>>>>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>> [0x5561cdcd7bc7]
>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>> const&)+0x57) [0x5561cdf4f957]
>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>> [0x5561ce292e7d]
>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>>>>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
>>>>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>> needed to interpret this.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Greets,
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>>> sage
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Stefan
>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>>>>>>>>>>>> PrimaryLogPG::on_l
>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is this a cache tiering pool?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded:
>>>>>>>>>>>>>> ceph pg dump | grep 3.80e
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
>>>>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
>>>>>>>>>>>>> around the current assert:
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>>>>>>>>>      set<snapid_t> snaps;
>>>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>>>>>>>>>      snap_mapper.add_oid(
>>>>>>>>>>>>>        recovery_info.soid,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> s
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>>>>>>>>>> [0x55addb30748f]
>>>>>>>>>>>>>>>>  5:
>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>>>>>>>>>> [0x55addb317531]
>>>>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>> [0x55addb23cf10]
>>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>> [0x55addb035bc7]
>>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>> [0x55addb5f0e7d]
>>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>>>>>>>>>>>>> list mapping.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  > 
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>>>>>>>>>> being down.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>>>>>>>>>>>  7:
>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>>>>>>>>>> them myself.
>>>>>>>>>>>>>>>>>>> -Greg
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sage Weil Jan. 24, 2018, 12:07 a.m. UTC | #22
On Tue, 23 Jan 2018, Stefan Priebe - Profihost AG wrote:
> 
> Am 22.01.2018 um 20:01 schrieb Sage Weil:
> > On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >> Hi Sage,
> >>
> >> thanks! That one fixed it! Now i'm just waiting if the log ever stops
> >> telling me:
> >> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
> >> on pg 3.33e oid
> >> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
> >> mapper: , oi: b0d7c...repaired
> >>
> >> Currently it seems it does an endless repair and never ends.
> > 
> > On many objects or the same object?
> 
> OK no it still does endless repair but the reason is it also repairs
> "newly" created snapshots.
> 
> Example:
> 2018-01-23 21:46:47.609712 osd.32 [ERR] osd.32 found snap mapper error
> on pg 3.c70 oid
> 3:0e3ff478:::rbd_data.52ab946b8b4567.00000000000025b2:b0d7c snaps
> missing in mapper, should be: b0d7c...repaired
> 
> rbd image 'vm-251-disk-1':
>         size 170 GB in 43520 objects
>         order 22 (4096 kB objects)
>         block_name_prefix: rbd_data.52ab946b8b4567
>         format: 2
>         features: layering
>         flags:
> SNAPID NAME                                   SIZE TIMESTAMP
> 726900 PVESNAP_cloud2-1394_vm_diff_20180123 170 GB Tue Jan 23 02:14:35 2018
> 
> So this is a fresh snapshot which needs a repair?

Aha!  That's a helpful clue.  Is there any chance you can crank up teh 
osd debug log on one OSD can catch one of these cases?  (We want to 
capture the clone being created, and also the scrub on the pg that follows 
and sees the mismatch).

Thanks!
sage


> Greets,
> Stefan
> 
> > 
> > sage
> > 
> >>
> >> Greets,
> >> Stefan
> >> Am 22.01.2018 um 15:30 schrieb Sage Weil:
> >>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>> Hi Sage,
> >>>>
> >>>> get_snaps returned the error as there are no snaps in the snapmapper. So
> >>>> it's OK to skip the assert.
> >>>>
> >>>> My biggest problem is now that i'm not able to run a scrub on all PGs to
> >>>> fix all those errors because in around 1-3% the osd seem to deadlock and
> >>>> needs to get killed to proceed.
> >>>
> >>> I believe this will fix it:
> >>>
> >>> https://github.com/liewegas/ceph/commit/wip-snapmapper
> >>>
> >>> Cherry-pick that and let me know?
> >>>
> >>> Thanks!
> >>> sage
> >>>
> >>>
> >>>>
> >>>> The log output is always:
> >>>>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
> >>>> log [WRN] : slow request 141.951820 seconds old, received at
> >>>>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
> >>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
> >>>> 00000157:head v 927921'526559757) currently commit_sent
> >>>>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
> >>>> log [WRN] : slow request 141.643853 seconds old, received at
> >>>>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
> >>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
> >>>> 00000147d:head v 927921'29105383) currently commit_sent
> >>>>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
> >>>> log [WRN] : slow request 141.313220 seconds old, received at
> >>>>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
> >>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
> >>>> 000000aff:head v 927921'164552063) currently commit_sent
> >>>>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
> >>>> (Aborted) **
> >>>>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
> >>>>
> >>>>  ceph version 12.2.2-94-g92923ef
> >>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>  1: (()+0xa4423c) [0x561d876ab23c]
> >>>>  2: (()+0xf890) [0x7f100b974890]
> >>>>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
> >>>>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
> >>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
> >>>> saction> >&, Context*)+0x254) [0x561d8746ac54]
> >>>>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
> >>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
> >>>>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
> >>>>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> >>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
> >>>>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
> >>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
> >>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
> >>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>> [0x561d87138bc7]
> >>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>> const&)+0x57) [0x561d873b0db7]
> >>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
> >>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>> [0x561d876f42bd]
> >>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
> >>>>  15: (()+0x8064) [0x7f100b96d064]
> >>>>  16: (clone()+0x6d) [0x7f100aa6162d]
> >>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>> needed to interpret this.
> >>>>
> >>>> --- logging levels ---
> >>>>    0/ 5 none
> >>>>    0/ 0 lockdep
> >>>>    0/ 0 context
> >>>>    0/ 0 crush
> >>>>    0/ 0 mds
> >>>>    0/ 0 mds_balancer
> >>>>    0/ 0 mds_locker
> >>>>    0/ 0 mds_log
> >>>>    0/ 0 mds_log_expire
> >>>>    0/ 0 mds_migrator
> >>>>    0/ 0 buffer
> >>>>    0/ 0 timer
> >>>>    0/ 0 filer
> >>>>    0/ 1 striper
> >>>>    0/ 0 objecter
> >>>>    0/ 0 rados
> >>>>    0/ 0 rbd
> >>>>    0/ 5 rbd_mirror
> >>>>    0/ 5 rbd_replay
> >>>>    0/ 0 journaler
> >>>>    0/ 0 objectcacher
> >>>>    0/ 0 client
> >>>>    0/ 0 osd
> >>>>    0/ 0 optracker
> >>>>    0/ 0 objclass
> >>>>    0/ 0 filestore
> >>>>    0/ 0 journal
> >>>>    0/ 0 ms
> >>>>    0/ 0 mon
> >>>>    0/ 0 monc
> >>>>    0/ 0 paxos
> >>>>    0/ 0 tp
> >>>>    0/ 0 auth
> >>>>    1/ 5 crypto
> >>>>    0/ 0 finisher
> >>>>    1/ 1 reserver
> >>>>    0/ 0 heartbeatmap
> >>>>    0/ 0 perfcounter
> >>>>    0/ 0 rgw
> >>>>    1/10 civetweb
> >>>>    1/ 5 javaclient
> >>>>    0/ 0 asok
> >>>>    0/ 0 throttle
> >>>>    0/ 0 refs
> >>>>    1/ 5 xio
> >>>>    1/ 5 compressor
> >>>>    1/ 5 bluestore
> >>>>    1/ 5 bluefs
> >>>>    1/ 3 bdev
> >>>>    1/ 5 kstore
> >>>>    4/ 5 rocksdb
> >>>>    4/ 5 leveldb
> >>>>    4/ 5 memdb
> >>>>    1/ 5 kinetic
> >>>>    1/ 5 fuse
> >>>>    1/ 5 mgr
> >>>>    1/ 5 mgrc
> >>>>    1/ 5 dpdk
> >>>>    1/ 5 eventtrace
> >>>>   -2/-2 (syslog threshold)
> >>>>   -1/-1 (stderr threshold)
> >>>>   max_recent     10000
> >>>>   max_new         1000
> >>>>   log_file /var/log/ceph/ceph-osd.59.log
> >>>> --- end dump of recent events ---
> >>>>
> >>>>
> >>>> Greets,
> >>>> Stefan
> >>>>
> >>>> Am 21.01.2018 um 21:27 schrieb Sage Weil:
> >>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>> Hi Sage,
> >>>>>>
> >>>>>> thanks for your reply. I'll do this. What i still don't understand is
> >>>>>> the following and i hope you might have an idea:
> >>>>>> 1.) All the snaps mentioned here are fresh - i created them today
> >>>>>> running luminous? How can they be missing in snapmapper again?
> >>>>>
> >>>>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
> >>>>> that when you hit the crash we have some context as to what is going on.
> >>>>>
> >>>>> Did you track down which part of get_snaps() is returning the error?
> >>>>>
> >>>>>> 2.) How did this happen? The cluster was jewel and was updated to
> >>>>>> luminous when all this happened.
> >>>>>
> >>>>> Did this start right after the upgrade?
> >>>>>
> >>>>> I started a PR that relaxes some of these assertions so that they clean up 
> >>>>> instead (unless the debug option is enabled, for our regression testing), 
> >>>>> but I'm still unclear about what the inconsistency is... any loggin you 
> >>>>> can provide would help!
> >>>>>
> >>>>> Thanks-
> >>>>> sage
> >>>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Greets,
> >>>>>> Stefan
> >>>>>>
> >>>>>>
> >>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
> >>>>>>> I'm guessing it is coming from
> >>>>>>>
> >>>>>>>   object_snaps out;
> >>>>>>>   int r = get_snaps(oid, &out);
> >>>>>>>   if (r < 0)
> >>>>>>>     return r;
> >>>>>>>
> >>>>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
> >>>>>>> confirm that, and possibly also add debug lines to the various return 
> >>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
> >>>>>>>
> >>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
> >>>>>>> hard to say without seeing what the inconsistency there is and whether we 
> >>>>>>> can tolerate it...
> >>>>>>>
> >>>>>>> sage
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>
> >>>>>>>> HI Sage,
> >>>>>>>>
> >>>>>>>> any ideas for this one sawing while doing a deep scrub:
> >>>>>>>>
> >>>>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
> >>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
> >>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
> >>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
> >>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
> >>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
> >>>>>>>>
> >>>>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
> >>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
> >>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
> >>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
> >>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
> >>>>>>>>
> >>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>> const*)+0x102) [0x55c0100d0372]
> >>>>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
> >>>>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
> >>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
> >>>>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
> >>>>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>>>>>> [0x55c00fdf6642]
> >>>>>>>>  6:
> >>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>>>>>> [0x55c00fdfc684]
> >>>>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>> [0x55c00fd22030]
> >>>>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
> >>>>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>> [0x55c00fb1abc7]
> >>>>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>> const&)+0x57) [0x55c00fd92ad7]
> >>>>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
> >>>>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>> [0x55c0100d5ffd]
> >>>>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
> >>>>>>>>  14: (()+0x8064) [0x7fbd284fb064]
> >>>>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
> >>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>> needed to interpret this.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Stefan
> >>>>>>>>
> >>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
> >>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>> Hi Sage,
> >>>>>>>>>>
> >>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
> >>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> it also crashes in (marked with HERE):
> >>>>>>>>>>>>
> >>>>>>>>>>>> int SnapMapper::get_snaps(
> >>>>>>>>>>>>   const hobject_t &oid,
> >>>>>>>>>>>>   object_snaps *out)
> >>>>>>>>>>>> {
> >>>>>>>>>>>>   assert(check(oid));
> >>>>>>>>>>>>   set<string> keys;
> >>>>>>>>>>>>   map<string, bufferlist> got;
> >>>>>>>>>>>>   keys.insert(to_object_key(oid));
> >>>>>>>>>>>>   int r = backend.get_keys(keys, &got);
> >>>>>>>>>>>>   if (r < 0)
> >>>>>>>>>>>>     return r;
> >>>>>>>>>>>>   if (got.empty())
> >>>>>>>>>>>>     return -ENOENT;
> >>>>>>>>>>>>   if (out) {
> >>>>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
> >>>>>>>>>>>>     ::decode(*out, bp);
> >>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
> >>>>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
> >>>>>>>>>>>>   } else {
> >>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
> >>>>>>>>>>>>   }
> >>>>>>>>>>>>   return 0;
> >>>>>>>>>>>> }
> >>>>>>>>>>>>
> >>>>>>>>>>>> is it save to comment that assert?
> >>>>>>>>>>>
> >>>>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
> >>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
> >>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
> >>>>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
> >>>>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
> >>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
> >>>>>>>>>>> can't repair.
> >>>>>>>>>>
> >>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
> >>>>>>>>>>
> >>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But
> >>>>>>>>>> enabling / doing deep-scrubs results into this one:
> >>>>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
> >>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
> >>>>>>>>>> SnapMapper::get_snaps(const h
> >>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
> >>>>>>>>>> 2018-01-18 13:00:54.840396
> >>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
> >>>>>>>>>
> >>>>>>>>> I think if you switch that assert to a warning it will repair...
> >>>>>>>>>
> >>>>>>>>> sage
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>> const*)+0x102) [0x5561ce28d1f2]
> >>>>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
> >>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
> >>>>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
> >>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
> >>>>>>>>>> 697]
> >>>>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
> >>>>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> >>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
> >>>>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
> >>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
> >>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
> >>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>> [0x5561cdcd7bc7]
> >>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>> const&)+0x57) [0x5561cdf4f957]
> >>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
> >>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>> [0x5561ce292e7d]
> >>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
> >>>>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
> >>>>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
> >>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>> needed to interpret this.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Greets,
> >>>>>>>>>> Stefan
> >>>>>>>>>>
> >>>>>>>>>>> sage
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> Stefan
> >>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
> >>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
> >>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>>> Hi Sage,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
> >>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
> >>>>>>>>>>>>>>>> PrimaryLogPG::on_l
> >>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
> >>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
> >>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
> >>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
> >>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Is this a cache tiering pool?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> no normal 3 replica but the pg is degraded:
> >>>>>>>>>>>>>> ceph pg dump | grep 3.80e
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
> >>>>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
> >>>>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
> >>>>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
> >>>>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
> >>>>>>>>>>>>> around the current assert:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
> >>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
> >>>>>>>>>>>>>      set<snapid_t> snaps;
> >>>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
> >>>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
> >>>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
> >>>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
> >>>>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
> >>>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
> >>>>>>>>>>>>> +    } else {
> >>>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
> >>>>>>>>>>>>> +    }
> >>>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
> >>>>>>>>>>>>>      snap_mapper.add_oid(
> >>>>>>>>>>>>>        recovery_info.soid,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> s
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
> >>>>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
> >>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
> >>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
> >>>>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
> >>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
> >>>>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
> >>>>>>>>>>>>>>>> [0x55addb30748f]
> >>>>>>>>>>>>>>>>  5:
> >>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
> >>>>>>>>>>>>>>>> [0x55addb317531]
> >>>>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>>>>>>>> [0x55addb23cf10]
> >>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
> >>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>>>>>> [0x55addb035bc7]
> >>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
> >>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
> >>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>>>>>> [0x55addb5f0e7d]
> >>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
> >>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
> >>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
> >>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Greets,
> >>>>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
> >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
> >>>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
> >>>>>>>>>>>>>>>>> list mapping.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
> >>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
> >>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
> >>>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
> >>>>>>>>>>>>>>>>> happens.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
> >>>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> sage
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>  > 
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
> >>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
> >>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
> >>>>>>>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
> >>>>>>>>>>>>>>>>>>>> being down.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> All of them fail with:
> >>>>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
> >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
> >>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
> >>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
> >>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
> >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
> >>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
> >>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
> >>>>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
> >>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
> >>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
> >>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
> >>>>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
> >>>>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
> >>>>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
> >>>>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
> >>>>>>>>>>>>>>>>>>>>  7:
> >>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
> >>>>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
> >>>>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
> >>>>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
> >>>>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
> >>>>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
> >>>>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
> >>>>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
> >>>>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
> >>>>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
> >>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
> >>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
> >>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
> >>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
> >>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
> >>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
> >>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
> >>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
> >>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
> >>>>>>>>>>>>>>>>>>> them myself.
> >>>>>>>>>>>>>>>>>>> -Greg
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>> --
> >>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>
> >>>>>>>>
> >>>>>> --
> >>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Jan. 24, 2018, 7:17 a.m. UTC | #23
Hi Sage,

Am 24.01.2018 um 01:07 schrieb Sage Weil:
> On Tue, 23 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>
>> Am 22.01.2018 um 20:01 schrieb Sage Weil:
>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi Sage,
>>>>
>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops
>>>> telling me:
>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
>>>> on pg 3.33e oid
>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
>>>> mapper: , oi: b0d7c...repaired
>>>>
>>>> Currently it seems it does an endless repair and never ends.
>>>
>>> On many objects or the same object?
>>
>> OK no it still does endless repair but the reason is it also repairs
>> "newly" created snapshots.
>>
>> Example:
>> 2018-01-23 21:46:47.609712 osd.32 [ERR] osd.32 found snap mapper error
>> on pg 3.c70 oid
>> 3:0e3ff478:::rbd_data.52ab946b8b4567.00000000000025b2:b0d7c snaps
>> missing in mapper, should be: b0d7c...repaired
>>
>> rbd image 'vm-251-disk-1':
>>         size 170 GB in 43520 objects
>>         order 22 (4096 kB objects)
>>         block_name_prefix: rbd_data.52ab946b8b4567
>>         format: 2
>>         features: layering
>>         flags:
>> SNAPID NAME                                   SIZE TIMESTAMP
>> 726900 PVESNAP_cloud2-1394_vm_diff_20180123 170 GB Tue Jan 23 02:14:35 2018
>>
>> So this is a fresh snapshot which needs a repair?
> 
> Aha!  That's a helpful clue.  Is there any chance you can crank up teh 
> osd debug log on one OSD can catch one of these cases?  (We want to 
> capture the clone being created, and also the scrub on the pg that follows 
> and sees the mismatch).

You mean debug_osd? How high must it be? And do we really need all log
messages or might it be enough to change some of them to 0 in the source
code - so they always log?

Greets,
Stefan

> Thanks!
> sage
> 
> 
>> Greets,
>> Stefan
>>
>>>
>>> sage
>>>
>>>>
>>>> Greets,
>>>> Stefan
>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil:
>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Sage,
>>>>>>
>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So
>>>>>> it's OK to skip the assert.
>>>>>>
>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to
>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and
>>>>>> needs to get killed to proceed.
>>>>>
>>>>> I believe this will fix it:
>>>>>
>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper
>>>>>
>>>>> Cherry-pick that and let me know?
>>>>>
>>>>> Thanks!
>>>>> sage
>>>>>
>>>>>
>>>>>>
>>>>>> The log output is always:
>>>>>>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
>>>>>> log [WRN] : slow request 141.951820 seconds old, received at
>>>>>>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
>>>>>> 00000157:head v 927921'526559757) currently commit_sent
>>>>>>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
>>>>>> log [WRN] : slow request 141.643853 seconds old, received at
>>>>>>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
>>>>>> 00000147d:head v 927921'29105383) currently commit_sent
>>>>>>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
>>>>>> log [WRN] : slow request 141.313220 seconds old, received at
>>>>>>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
>>>>>> 000000aff:head v 927921'164552063) currently commit_sent
>>>>>>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
>>>>>> (Aborted) **
>>>>>>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
>>>>>>
>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>  1: (()+0xa4423c) [0x561d876ab23c]
>>>>>>  2: (()+0xf890) [0x7f100b974890]
>>>>>>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
>>>>>>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54]
>>>>>>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
>>>>>>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
>>>>>>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
>>>>>>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>> [0x561d87138bc7]
>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>> const&)+0x57) [0x561d873b0db7]
>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>> [0x561d876f42bd]
>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
>>>>>>  15: (()+0x8064) [0x7f100b96d064]
>>>>>>  16: (clone()+0x6d) [0x7f100aa6162d]
>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>> needed to interpret this.
>>>>>>
>>>>>> --- logging levels ---
>>>>>>    0/ 5 none
>>>>>>    0/ 0 lockdep
>>>>>>    0/ 0 context
>>>>>>    0/ 0 crush
>>>>>>    0/ 0 mds
>>>>>>    0/ 0 mds_balancer
>>>>>>    0/ 0 mds_locker
>>>>>>    0/ 0 mds_log
>>>>>>    0/ 0 mds_log_expire
>>>>>>    0/ 0 mds_migrator
>>>>>>    0/ 0 buffer
>>>>>>    0/ 0 timer
>>>>>>    0/ 0 filer
>>>>>>    0/ 1 striper
>>>>>>    0/ 0 objecter
>>>>>>    0/ 0 rados
>>>>>>    0/ 0 rbd
>>>>>>    0/ 5 rbd_mirror
>>>>>>    0/ 5 rbd_replay
>>>>>>    0/ 0 journaler
>>>>>>    0/ 0 objectcacher
>>>>>>    0/ 0 client
>>>>>>    0/ 0 osd
>>>>>>    0/ 0 optracker
>>>>>>    0/ 0 objclass
>>>>>>    0/ 0 filestore
>>>>>>    0/ 0 journal
>>>>>>    0/ 0 ms
>>>>>>    0/ 0 mon
>>>>>>    0/ 0 monc
>>>>>>    0/ 0 paxos
>>>>>>    0/ 0 tp
>>>>>>    0/ 0 auth
>>>>>>    1/ 5 crypto
>>>>>>    0/ 0 finisher
>>>>>>    1/ 1 reserver
>>>>>>    0/ 0 heartbeatmap
>>>>>>    0/ 0 perfcounter
>>>>>>    0/ 0 rgw
>>>>>>    1/10 civetweb
>>>>>>    1/ 5 javaclient
>>>>>>    0/ 0 asok
>>>>>>    0/ 0 throttle
>>>>>>    0/ 0 refs
>>>>>>    1/ 5 xio
>>>>>>    1/ 5 compressor
>>>>>>    1/ 5 bluestore
>>>>>>    1/ 5 bluefs
>>>>>>    1/ 3 bdev
>>>>>>    1/ 5 kstore
>>>>>>    4/ 5 rocksdb
>>>>>>    4/ 5 leveldb
>>>>>>    4/ 5 memdb
>>>>>>    1/ 5 kinetic
>>>>>>    1/ 5 fuse
>>>>>>    1/ 5 mgr
>>>>>>    1/ 5 mgrc
>>>>>>    1/ 5 dpdk
>>>>>>    1/ 5 eventtrace
>>>>>>   -2/-2 (syslog threshold)
>>>>>>   -1/-1 (stderr threshold)
>>>>>>   max_recent     10000
>>>>>>   max_new         1000
>>>>>>   log_file /var/log/ceph/ceph-osd.59.log
>>>>>> --- end dump of recent events ---
>>>>>>
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>>
>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil:
>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hi Sage,
>>>>>>>>
>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is
>>>>>>>> the following and i hope you might have an idea:
>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today
>>>>>>>> running luminous? How can they be missing in snapmapper again?
>>>>>>>
>>>>>>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
>>>>>>> that when you hit the crash we have some context as to what is going on.
>>>>>>>
>>>>>>> Did you track down which part of get_snaps() is returning the error?
>>>>>>>
>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to
>>>>>>>> luminous when all this happened.
>>>>>>>
>>>>>>> Did this start right after the upgrade?
>>>>>>>
>>>>>>> I started a PR that relaxes some of these assertions so that they clean up 
>>>>>>> instead (unless the debug option is enabled, for our regression testing), 
>>>>>>> but I'm still unclear about what the inconsistency is... any loggin you 
>>>>>>> can provide would help!
>>>>>>>
>>>>>>> Thanks-
>>>>>>> sage
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>>
>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
>>>>>>>>> I'm guessing it is coming from
>>>>>>>>>
>>>>>>>>>   object_snaps out;
>>>>>>>>>   int r = get_snaps(oid, &out);
>>>>>>>>>   if (r < 0)
>>>>>>>>>     return r;
>>>>>>>>>
>>>>>>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
>>>>>>>>> confirm that, and possibly also add debug lines to the various return 
>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
>>>>>>>>>
>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we 
>>>>>>>>> can tolerate it...
>>>>>>>>>
>>>>>>>>> sage
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>
>>>>>>>>>> HI Sage,
>>>>>>>>>>
>>>>>>>>>> any ideas for this one sawing while doing a deep scrub:
>>>>>>>>>>
>>>>>>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
>>>>>>>>>>
>>>>>>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
>>>>>>>>>>
>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>> const*)+0x102) [0x55c0100d0372]
>>>>>>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
>>>>>>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
>>>>>>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
>>>>>>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>> [0x55c00fdf6642]
>>>>>>>>>>  6:
>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>> [0x55c00fdfc684]
>>>>>>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>> [0x55c00fd22030]
>>>>>>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
>>>>>>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>> [0x55c00fb1abc7]
>>>>>>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7]
>>>>>>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
>>>>>>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>> [0x55c0100d5ffd]
>>>>>>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
>>>>>>>>>>  14: (()+0x8064) [0x7fbd284fb064]
>>>>>>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>> needed to interpret this.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>
>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> it also crashes in (marked with HERE):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> int SnapMapper::get_snaps(
>>>>>>>>>>>>>>   const hobject_t &oid,
>>>>>>>>>>>>>>   object_snaps *out)
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>   assert(check(oid));
>>>>>>>>>>>>>>   set<string> keys;
>>>>>>>>>>>>>>   map<string, bufferlist> got;
>>>>>>>>>>>>>>   keys.insert(to_object_key(oid));
>>>>>>>>>>>>>>   int r = backend.get_keys(keys, &got);
>>>>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>>>>     return r;
>>>>>>>>>>>>>>   if (got.empty())
>>>>>>>>>>>>>>     return -ENOENT;
>>>>>>>>>>>>>>   if (out) {
>>>>>>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
>>>>>>>>>>>>>>     ::decode(*out, bp);
>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>>>>>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
>>>>>>>>>>>>>>   } else {
>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>   return 0;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is it save to comment that assert?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
>>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
>>>>>>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
>>>>>>>>>>>>> can't repair.
>>>>>>>>>>>>
>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
>>>>>>>>>>>>
>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But
>>>>>>>>>>>> enabling / doing deep-scrubs results into this one:
>>>>>>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>>>>>>>>>> SnapMapper::get_snaps(const h
>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
>>>>>>>>>>>> 2018-01-18 13:00:54.840396
>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>>>>>>>>
>>>>>>>>>>> I think if you switch that assert to a warning it will repair...
>>>>>>>>>>>
>>>>>>>>>>> sage
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2]
>>>>>>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>>>>>>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
>>>>>>>>>>>> 697]
>>>>>>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>>>>>>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>>>>>>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>> [0x5561cdcd7bc7]
>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957]
>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>> [0x5561ce292e7d]
>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>>>>>>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
>>>>>>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Greets,
>>>>>>>>>>>> Stefan
>>>>>>>>>>>>
>>>>>>>>>>>>> sage
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l
>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is this a cache tiering pool?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded:
>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>>>>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>>>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
>>>>>>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
>>>>>>>>>>>>>>> around the current assert:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>>>>>>>>>>>      set<snapid_t> snaps;
>>>>>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>>>>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>>>>>>>>>>>      snap_mapper.add_oid(
>>>>>>>>>>>>>>>        recovery_info.soid,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> s
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>>>>>>>>>>>> [0x55addb30748f]
>>>>>>>>>>>>>>>>>>  5:
>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>>>>>>>>>>>> [0x55addb317531]
>>>>>>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>> [0x55addb23cf10]
>>>>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>> [0x55addb035bc7]
>>>>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d]
>>>>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>>>>>>>>>>>>>>> list mapping.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  > 
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>>>>>>>>>>>> being down.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>>>>>>>>>>>>>  7:
>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>>>>>>>>>>>> them myself.
>>>>>>>>>>>>>>>>>>>>> -Greg
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sage Weil Jan. 24, 2018, 10:16 a.m. UTC | #24
On Wed, 24 Jan 2018, Stefan Priebe - Profihost AG wrote:
> Hi Sage,
> 
> Am 24.01.2018 um 01:07 schrieb Sage Weil:
> > On Tue, 23 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>
> >> Am 22.01.2018 um 20:01 schrieb Sage Weil:
> >>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>> Hi Sage,
> >>>>
> >>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops
> >>>> telling me:
> >>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
> >>>> on pg 3.33e oid
> >>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
> >>>> mapper: , oi: b0d7c...repaired
> >>>>
> >>>> Currently it seems it does an endless repair and never ends.
> >>>
> >>> On many objects or the same object?
> >>
> >> OK no it still does endless repair but the reason is it also repairs
> >> "newly" created snapshots.
> >>
> >> Example:
> >> 2018-01-23 21:46:47.609712 osd.32 [ERR] osd.32 found snap mapper error
> >> on pg 3.c70 oid
> >> 3:0e3ff478:::rbd_data.52ab946b8b4567.00000000000025b2:b0d7c snaps
> >> missing in mapper, should be: b0d7c...repaired
> >>
> >> rbd image 'vm-251-disk-1':
> >>         size 170 GB in 43520 objects
> >>         order 22 (4096 kB objects)
> >>         block_name_prefix: rbd_data.52ab946b8b4567
> >>         format: 2
> >>         features: layering
> >>         flags:
> >> SNAPID NAME                                   SIZE TIMESTAMP
> >> 726900 PVESNAP_cloud2-1394_vm_diff_20180123 170 GB Tue Jan 23 02:14:35 2018
> >>
> >> So this is a fresh snapshot which needs a repair?
> > 
> > Aha!  That's a helpful clue.  Is there any chance you can crank up teh 
> > osd debug log on one OSD can catch one of these cases?  (We want to 
> > capture the clone being created, and also the scrub on the pg that follows 
> > and sees the mismatch).
> 
> You mean debug_osd? How high must it be? And do we really need all log
> messages or might it be enough to change some of them to 0 in the source
> code - so they always log?

It's just a lot of work to figure out which ones that would be.  If debug 
osd  = 20 isn't possible (even on a single OSD), then just the logging 
statements (ideally all of them) in SnapMapper.cc would be a good start.

sage


> 
> Greets,
> Stefan
> 
> > Thanks!
> > sage
> > 
> > 
> >> Greets,
> >> Stefan
> >>
> >>>
> >>> sage
> >>>
> >>>>
> >>>> Greets,
> >>>> Stefan
> >>>> Am 22.01.2018 um 15:30 schrieb Sage Weil:
> >>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>> Hi Sage,
> >>>>>>
> >>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So
> >>>>>> it's OK to skip the assert.
> >>>>>>
> >>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to
> >>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and
> >>>>>> needs to get killed to proceed.
> >>>>>
> >>>>> I believe this will fix it:
> >>>>>
> >>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper
> >>>>>
> >>>>> Cherry-pick that and let me know?
> >>>>>
> >>>>> Thanks!
> >>>>> sage
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> The log output is always:
> >>>>>>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
> >>>>>> log [WRN] : slow request 141.951820 seconds old, received at
> >>>>>>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
> >>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
> >>>>>> 00000157:head v 927921'526559757) currently commit_sent
> >>>>>>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
> >>>>>> log [WRN] : slow request 141.643853 seconds old, received at
> >>>>>>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
> >>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
> >>>>>> 00000147d:head v 927921'29105383) currently commit_sent
> >>>>>>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
> >>>>>> log [WRN] : slow request 141.313220 seconds old, received at
> >>>>>>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
> >>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
> >>>>>> 000000aff:head v 927921'164552063) currently commit_sent
> >>>>>>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
> >>>>>> (Aborted) **
> >>>>>>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
> >>>>>>
> >>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>  1: (()+0xa4423c) [0x561d876ab23c]
> >>>>>>  2: (()+0xf890) [0x7f100b974890]
> >>>>>>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
> >>>>>>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
> >>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
> >>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54]
> >>>>>>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
> >>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
> >>>>>>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
> >>>>>>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> >>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
> >>>>>>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
> >>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
> >>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
> >>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>> [0x561d87138bc7]
> >>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>> const&)+0x57) [0x561d873b0db7]
> >>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
> >>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>> [0x561d876f42bd]
> >>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
> >>>>>>  15: (()+0x8064) [0x7f100b96d064]
> >>>>>>  16: (clone()+0x6d) [0x7f100aa6162d]
> >>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>> needed to interpret this.
> >>>>>>
> >>>>>> --- logging levels ---
> >>>>>>    0/ 5 none
> >>>>>>    0/ 0 lockdep
> >>>>>>    0/ 0 context
> >>>>>>    0/ 0 crush
> >>>>>>    0/ 0 mds
> >>>>>>    0/ 0 mds_balancer
> >>>>>>    0/ 0 mds_locker
> >>>>>>    0/ 0 mds_log
> >>>>>>    0/ 0 mds_log_expire
> >>>>>>    0/ 0 mds_migrator
> >>>>>>    0/ 0 buffer
> >>>>>>    0/ 0 timer
> >>>>>>    0/ 0 filer
> >>>>>>    0/ 1 striper
> >>>>>>    0/ 0 objecter
> >>>>>>    0/ 0 rados
> >>>>>>    0/ 0 rbd
> >>>>>>    0/ 5 rbd_mirror
> >>>>>>    0/ 5 rbd_replay
> >>>>>>    0/ 0 journaler
> >>>>>>    0/ 0 objectcacher
> >>>>>>    0/ 0 client
> >>>>>>    0/ 0 osd
> >>>>>>    0/ 0 optracker
> >>>>>>    0/ 0 objclass
> >>>>>>    0/ 0 filestore
> >>>>>>    0/ 0 journal
> >>>>>>    0/ 0 ms
> >>>>>>    0/ 0 mon
> >>>>>>    0/ 0 monc
> >>>>>>    0/ 0 paxos
> >>>>>>    0/ 0 tp
> >>>>>>    0/ 0 auth
> >>>>>>    1/ 5 crypto
> >>>>>>    0/ 0 finisher
> >>>>>>    1/ 1 reserver
> >>>>>>    0/ 0 heartbeatmap
> >>>>>>    0/ 0 perfcounter
> >>>>>>    0/ 0 rgw
> >>>>>>    1/10 civetweb
> >>>>>>    1/ 5 javaclient
> >>>>>>    0/ 0 asok
> >>>>>>    0/ 0 throttle
> >>>>>>    0/ 0 refs
> >>>>>>    1/ 5 xio
> >>>>>>    1/ 5 compressor
> >>>>>>    1/ 5 bluestore
> >>>>>>    1/ 5 bluefs
> >>>>>>    1/ 3 bdev
> >>>>>>    1/ 5 kstore
> >>>>>>    4/ 5 rocksdb
> >>>>>>    4/ 5 leveldb
> >>>>>>    4/ 5 memdb
> >>>>>>    1/ 5 kinetic
> >>>>>>    1/ 5 fuse
> >>>>>>    1/ 5 mgr
> >>>>>>    1/ 5 mgrc
> >>>>>>    1/ 5 dpdk
> >>>>>>    1/ 5 eventtrace
> >>>>>>   -2/-2 (syslog threshold)
> >>>>>>   -1/-1 (stderr threshold)
> >>>>>>   max_recent     10000
> >>>>>>   max_new         1000
> >>>>>>   log_file /var/log/ceph/ceph-osd.59.log
> >>>>>> --- end dump of recent events ---
> >>>>>>
> >>>>>>
> >>>>>> Greets,
> >>>>>> Stefan
> >>>>>>
> >>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil:
> >>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>> Hi Sage,
> >>>>>>>>
> >>>>>>>> thanks for your reply. I'll do this. What i still don't understand is
> >>>>>>>> the following and i hope you might have an idea:
> >>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today
> >>>>>>>> running luminous? How can they be missing in snapmapper again?
> >>>>>>>
> >>>>>>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
> >>>>>>> that when you hit the crash we have some context as to what is going on.
> >>>>>>>
> >>>>>>> Did you track down which part of get_snaps() is returning the error?
> >>>>>>>
> >>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to
> >>>>>>>> luminous when all this happened.
> >>>>>>>
> >>>>>>> Did this start right after the upgrade?
> >>>>>>>
> >>>>>>> I started a PR that relaxes some of these assertions so that they clean up 
> >>>>>>> instead (unless the debug option is enabled, for our regression testing), 
> >>>>>>> but I'm still unclear about what the inconsistency is... any loggin you 
> >>>>>>> can provide would help!
> >>>>>>>
> >>>>>>> Thanks-
> >>>>>>> sage
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Greets,
> >>>>>>>> Stefan
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
> >>>>>>>>> I'm guessing it is coming from
> >>>>>>>>>
> >>>>>>>>>   object_snaps out;
> >>>>>>>>>   int r = get_snaps(oid, &out);
> >>>>>>>>>   if (r < 0)
> >>>>>>>>>     return r;
> >>>>>>>>>
> >>>>>>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
> >>>>>>>>> confirm that, and possibly also add debug lines to the various return 
> >>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
> >>>>>>>>>
> >>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
> >>>>>>>>> hard to say without seeing what the inconsistency there is and whether we 
> >>>>>>>>> can tolerate it...
> >>>>>>>>>
> >>>>>>>>> sage
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>
> >>>>>>>>>> HI Sage,
> >>>>>>>>>>
> >>>>>>>>>> any ideas for this one sawing while doing a deep scrub:
> >>>>>>>>>>
> >>>>>>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
> >>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
> >>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
> >>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
> >>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
> >>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
> >>>>>>>>>>
> >>>>>>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
> >>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
> >>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
> >>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
> >>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
> >>>>>>>>>>
> >>>>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>> const*)+0x102) [0x55c0100d0372]
> >>>>>>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
> >>>>>>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
> >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
> >>>>>>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
> >>>>>>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>>>>>>>> [0x55c00fdf6642]
> >>>>>>>>>>  6:
> >>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>>>>>>>> [0x55c00fdfc684]
> >>>>>>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>> [0x55c00fd22030]
> >>>>>>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
> >>>>>>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>> [0x55c00fb1abc7]
> >>>>>>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>> const&)+0x57) [0x55c00fd92ad7]
> >>>>>>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
> >>>>>>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>> [0x55c0100d5ffd]
> >>>>>>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
> >>>>>>>>>>  14: (()+0x8064) [0x7fbd284fb064]
> >>>>>>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
> >>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>> needed to interpret this.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Stefan
> >>>>>>>>>>
> >>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
> >>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>> Hi Sage,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
> >>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> it also crashes in (marked with HERE):
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> int SnapMapper::get_snaps(
> >>>>>>>>>>>>>>   const hobject_t &oid,
> >>>>>>>>>>>>>>   object_snaps *out)
> >>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>   assert(check(oid));
> >>>>>>>>>>>>>>   set<string> keys;
> >>>>>>>>>>>>>>   map<string, bufferlist> got;
> >>>>>>>>>>>>>>   keys.insert(to_object_key(oid));
> >>>>>>>>>>>>>>   int r = backend.get_keys(keys, &got);
> >>>>>>>>>>>>>>   if (r < 0)
> >>>>>>>>>>>>>>     return r;
> >>>>>>>>>>>>>>   if (got.empty())
> >>>>>>>>>>>>>>     return -ENOENT;
> >>>>>>>>>>>>>>   if (out) {
> >>>>>>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
> >>>>>>>>>>>>>>     ::decode(*out, bp);
> >>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
> >>>>>>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
> >>>>>>>>>>>>>>   } else {
> >>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
> >>>>>>>>>>>>>>   }
> >>>>>>>>>>>>>>   return 0;
> >>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> is it save to comment that assert?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
> >>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
> >>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
> >>>>>>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
> >>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
> >>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
> >>>>>>>>>>>>> can't repair.
> >>>>>>>>>>>>
> >>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But
> >>>>>>>>>>>> enabling / doing deep-scrubs results into this one:
> >>>>>>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
> >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
> >>>>>>>>>>>> SnapMapper::get_snaps(const h
> >>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
> >>>>>>>>>>>> 2018-01-18 13:00:54.840396
> >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
> >>>>>>>>>>>
> >>>>>>>>>>> I think if you switch that assert to a warning it will repair...
> >>>>>>>>>>>
> >>>>>>>>>>> sage
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2]
> >>>>>>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
> >>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
> >>>>>>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
> >>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
> >>>>>>>>>>>> 697]
> >>>>>>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
> >>>>>>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> >>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
> >>>>>>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
> >>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
> >>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
> >>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>> [0x5561cdcd7bc7]
> >>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957]
> >>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
> >>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>> [0x5561ce292e7d]
> >>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
> >>>>>>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
> >>>>>>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
> >>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Greets,
> >>>>>>>>>>>> Stefan
> >>>>>>>>>>>>
> >>>>>>>>>>>>> sage
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
> >>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
> >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>>>>> Hi Sage,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
> >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
> >>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l
> >>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
> >>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
> >>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
> >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
> >>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Is this a cache tiering pool?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded:
> >>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
> >>>>>>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
> >>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
> >>>>>>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
> >>>>>>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
> >>>>>>>>>>>>>>> around the current assert:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
> >>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
> >>>>>>>>>>>>>>>      set<snapid_t> snaps;
> >>>>>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
> >>>>>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
> >>>>>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
> >>>>>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
> >>>>>>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
> >>>>>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
> >>>>>>>>>>>>>>> +    } else {
> >>>>>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
> >>>>>>>>>>>>>>> +    }
> >>>>>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
> >>>>>>>>>>>>>>>      snap_mapper.add_oid(
> >>>>>>>>>>>>>>>        recovery_info.soid,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> s
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
> >>>>>>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
> >>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
> >>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
> >>>>>>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
> >>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
> >>>>>>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
> >>>>>>>>>>>>>>>>>> [0x55addb30748f]
> >>>>>>>>>>>>>>>>>>  5:
> >>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
> >>>>>>>>>>>>>>>>>> [0x55addb317531]
> >>>>>>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>>>>>>>>>> [0x55addb23cf10]
> >>>>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
> >>>>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>>>>>>>> [0x55addb035bc7]
> >>>>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
> >>>>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
> >>>>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>>>>>>>> [0x55addb5f0e7d]
> >>>>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
> >>>>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
> >>>>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
> >>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Greets,
> >>>>>>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
> >>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
> >>>>>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
> >>>>>>>>>>>>>>>>>>> list mapping.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
> >>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
> >>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
> >>>>>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
> >>>>>>>>>>>>>>>>>>> happens.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
> >>>>>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> sage
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>  > 
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
> >>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
> >>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
> >>>>>>>>>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
> >>>>>>>>>>>>>>>>>>>>>> being down.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> All of them fail with:
> >>>>>>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
> >>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
> >>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
> >>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
> >>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
> >>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
> >>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
> >>>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
> >>>>>>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
> >>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
> >>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
> >>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
> >>>>>>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
> >>>>>>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
> >>>>>>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
> >>>>>>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
> >>>>>>>>>>>>>>>>>>>>>>  7:
> >>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
> >>>>>>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
> >>>>>>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
> >>>>>>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
> >>>>>>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
> >>>>>>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
> >>>>>>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
> >>>>>>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
> >>>>>>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
> >>>>>>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
> >>>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
> >>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
> >>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
> >>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
> >>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
> >>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
> >>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
> >>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
> >>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
> >>>>>>>>>>>>>>>>>>>>> them myself.
> >>>>>>>>>>>>>>>>>>>>> -Greg
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>> --
> >>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Jan. 29, 2018, 3:33 p.m. UTC | #25
Hi Sage,

Am 24.01.2018 um 11:16 schrieb Sage Weil:
> On Wed, 24 Jan 2018, Stefan Priebe - Profihost AG wrote:
>> Hi Sage,
>>
>> Am 24.01.2018 um 01:07 schrieb Sage Weil:
>>> On Tue, 23 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> Am 22.01.2018 um 20:01 schrieb Sage Weil:
>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Sage,
>>>>>>
>>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops
>>>>>> telling me:
>>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
>>>>>> on pg 3.33e oid
>>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
>>>>>> mapper: , oi: b0d7c...repaired
>>>>>>
>>>>>> Currently it seems it does an endless repair and never ends.
>>>>>
>>>>> On many objects or the same object?
>>>>
>>>> OK no it still does endless repair but the reason is it also repairs
>>>> "newly" created snapshots.
>>>>
>>>> Example:
>>>> 2018-01-23 21:46:47.609712 osd.32 [ERR] osd.32 found snap mapper error
>>>> on pg 3.c70 oid
>>>> 3:0e3ff478:::rbd_data.52ab946b8b4567.00000000000025b2:b0d7c snaps
>>>> missing in mapper, should be: b0d7c...repaired
>>>>
>>>> rbd image 'vm-251-disk-1':
>>>>         size 170 GB in 43520 objects
>>>>         order 22 (4096 kB objects)
>>>>         block_name_prefix: rbd_data.52ab946b8b4567
>>>>         format: 2
>>>>         features: layering
>>>>         flags:
>>>> SNAPID NAME                                   SIZE TIMESTAMP
>>>> 726900 PVESNAP_cloud2-1394_vm_diff_20180123 170 GB Tue Jan 23 02:14:35 2018
>>>>
>>>> So this is a fresh snapshot which needs a repair?
>>>
>>> Aha!  That's a helpful clue.  Is there any chance you can crank up teh 
>>> osd debug log on one OSD can catch one of these cases?  (We want to 
>>> capture the clone being created, and also the scrub on the pg that follows 
>>> and sees the mismatch).
>>
>> You mean debug_osd? How high must it be? And do we really need all log
>> messages or might it be enough to change some of them to 0 in the source
>> code - so they always log?
> 
> It's just a lot of work to figure out which ones that would be.  If debug 
> osd  = 20 isn't possible (even on a single OSD), then just the logging 
> statements (ideally all of them) in SnapMapper.cc would be a good start.

sorry for the slow response. I will try to grab a log tonight.

Greets,
Stefan


> sage
> 
> 
>>
>> Greets,
>> Stefan
>>
>>> Thanks!
>>> sage
>>>
>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>>>
>>>>> sage
>>>>>
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil:
>>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hi Sage,
>>>>>>>>
>>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So
>>>>>>>> it's OK to skip the assert.
>>>>>>>>
>>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to
>>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and
>>>>>>>> needs to get killed to proceed.
>>>>>>>
>>>>>>> I believe this will fix it:
>>>>>>>
>>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper
>>>>>>>
>>>>>>> Cherry-pick that and let me know?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> sage
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> The log output is always:
>>>>>>>>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
>>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at
>>>>>>>>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
>>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
>>>>>>>> 00000157:head v 927921'526559757) currently commit_sent
>>>>>>>>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
>>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at
>>>>>>>>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
>>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
>>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent
>>>>>>>>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
>>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at
>>>>>>>>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
>>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
>>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent
>>>>>>>>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
>>>>>>>> (Aborted) **
>>>>>>>>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
>>>>>>>>
>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>  1: (()+0xa4423c) [0x561d876ab23c]
>>>>>>>>  2: (()+0xf890) [0x7f100b974890]
>>>>>>>>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
>>>>>>>>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
>>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
>>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54]
>>>>>>>>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
>>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
>>>>>>>>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
>>>>>>>>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
>>>>>>>>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>> [0x561d87138bc7]
>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>> const&)+0x57) [0x561d873b0db7]
>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>> [0x561d876f42bd]
>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
>>>>>>>>  15: (()+0x8064) [0x7f100b96d064]
>>>>>>>>  16: (clone()+0x6d) [0x7f100aa6162d]
>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>> needed to interpret this.
>>>>>>>>
>>>>>>>> --- logging levels ---
>>>>>>>>    0/ 5 none
>>>>>>>>    0/ 0 lockdep
>>>>>>>>    0/ 0 context
>>>>>>>>    0/ 0 crush
>>>>>>>>    0/ 0 mds
>>>>>>>>    0/ 0 mds_balancer
>>>>>>>>    0/ 0 mds_locker
>>>>>>>>    0/ 0 mds_log
>>>>>>>>    0/ 0 mds_log_expire
>>>>>>>>    0/ 0 mds_migrator
>>>>>>>>    0/ 0 buffer
>>>>>>>>    0/ 0 timer
>>>>>>>>    0/ 0 filer
>>>>>>>>    0/ 1 striper
>>>>>>>>    0/ 0 objecter
>>>>>>>>    0/ 0 rados
>>>>>>>>    0/ 0 rbd
>>>>>>>>    0/ 5 rbd_mirror
>>>>>>>>    0/ 5 rbd_replay
>>>>>>>>    0/ 0 journaler
>>>>>>>>    0/ 0 objectcacher
>>>>>>>>    0/ 0 client
>>>>>>>>    0/ 0 osd
>>>>>>>>    0/ 0 optracker
>>>>>>>>    0/ 0 objclass
>>>>>>>>    0/ 0 filestore
>>>>>>>>    0/ 0 journal
>>>>>>>>    0/ 0 ms
>>>>>>>>    0/ 0 mon
>>>>>>>>    0/ 0 monc
>>>>>>>>    0/ 0 paxos
>>>>>>>>    0/ 0 tp
>>>>>>>>    0/ 0 auth
>>>>>>>>    1/ 5 crypto
>>>>>>>>    0/ 0 finisher
>>>>>>>>    1/ 1 reserver
>>>>>>>>    0/ 0 heartbeatmap
>>>>>>>>    0/ 0 perfcounter
>>>>>>>>    0/ 0 rgw
>>>>>>>>    1/10 civetweb
>>>>>>>>    1/ 5 javaclient
>>>>>>>>    0/ 0 asok
>>>>>>>>    0/ 0 throttle
>>>>>>>>    0/ 0 refs
>>>>>>>>    1/ 5 xio
>>>>>>>>    1/ 5 compressor
>>>>>>>>    1/ 5 bluestore
>>>>>>>>    1/ 5 bluefs
>>>>>>>>    1/ 3 bdev
>>>>>>>>    1/ 5 kstore
>>>>>>>>    4/ 5 rocksdb
>>>>>>>>    4/ 5 leveldb
>>>>>>>>    4/ 5 memdb
>>>>>>>>    1/ 5 kinetic
>>>>>>>>    1/ 5 fuse
>>>>>>>>    1/ 5 mgr
>>>>>>>>    1/ 5 mgrc
>>>>>>>>    1/ 5 dpdk
>>>>>>>>    1/ 5 eventtrace
>>>>>>>>   -2/-2 (syslog threshold)
>>>>>>>>   -1/-1 (stderr threshold)
>>>>>>>>   max_recent     10000
>>>>>>>>   max_new         1000
>>>>>>>>   log_file /var/log/ceph/ceph-osd.59.log
>>>>>>>> --- end dump of recent events ---
>>>>>>>>
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil:
>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hi Sage,
>>>>>>>>>>
>>>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is
>>>>>>>>>> the following and i hope you might have an idea:
>>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today
>>>>>>>>>> running luminous? How can they be missing in snapmapper again?
>>>>>>>>>
>>>>>>>>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
>>>>>>>>> that when you hit the crash we have some context as to what is going on.
>>>>>>>>>
>>>>>>>>> Did you track down which part of get_snaps() is returning the error?
>>>>>>>>>
>>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to
>>>>>>>>>> luminous when all this happened.
>>>>>>>>>
>>>>>>>>> Did this start right after the upgrade?
>>>>>>>>>
>>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up 
>>>>>>>>> instead (unless the debug option is enabled, for our regression testing), 
>>>>>>>>> but I'm still unclear about what the inconsistency is... any loggin you 
>>>>>>>>> can provide would help!
>>>>>>>>>
>>>>>>>>> Thanks-
>>>>>>>>> sage
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Greets,
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
>>>>>>>>>>> I'm guessing it is coming from
>>>>>>>>>>>
>>>>>>>>>>>   object_snaps out;
>>>>>>>>>>>   int r = get_snaps(oid, &out);
>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>     return r;
>>>>>>>>>>>
>>>>>>>>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
>>>>>>>>>>> confirm that, and possibly also add debug lines to the various return 
>>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
>>>>>>>>>>>
>>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
>>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we 
>>>>>>>>>>> can tolerate it...
>>>>>>>>>>>
>>>>>>>>>>> sage
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>
>>>>>>>>>>>> HI Sage,
>>>>>>>>>>>>
>>>>>>>>>>>> any ideas for this one sawing while doing a deep scrub:
>>>>>>>>>>>>
>>>>>>>>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
>>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
>>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
>>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
>>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
>>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
>>>>>>>>>>>>
>>>>>>>>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
>>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
>>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
>>>>>>>>>>>>
>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>> const*)+0x102) [0x55c0100d0372]
>>>>>>>>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
>>>>>>>>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
>>>>>>>>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
>>>>>>>>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>> [0x55c00fdf6642]
>>>>>>>>>>>>  6:
>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>> [0x55c00fdfc684]
>>>>>>>>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>> [0x55c00fd22030]
>>>>>>>>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
>>>>>>>>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>> [0x55c00fb1abc7]
>>>>>>>>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7]
>>>>>>>>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
>>>>>>>>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>> [0x55c0100d5ffd]
>>>>>>>>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
>>>>>>>>>>>>  14: (()+0x8064) [0x7fbd284fb064]
>>>>>>>>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Stefan
>>>>>>>>>>>>
>>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
>>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> it also crashes in (marked with HERE):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> int SnapMapper::get_snaps(
>>>>>>>>>>>>>>>>   const hobject_t &oid,
>>>>>>>>>>>>>>>>   object_snaps *out)
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>   assert(check(oid));
>>>>>>>>>>>>>>>>   set<string> keys;
>>>>>>>>>>>>>>>>   map<string, bufferlist> got;
>>>>>>>>>>>>>>>>   keys.insert(to_object_key(oid));
>>>>>>>>>>>>>>>>   int r = backend.get_keys(keys, &got);
>>>>>>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>>>>>>     return r;
>>>>>>>>>>>>>>>>   if (got.empty())
>>>>>>>>>>>>>>>>     return -ENOENT;
>>>>>>>>>>>>>>>>   if (out) {
>>>>>>>>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
>>>>>>>>>>>>>>>>     ::decode(*out, bp);
>>>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>>>>>>>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
>>>>>>>>>>>>>>>>   } else {
>>>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>   return 0;
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> is it save to comment that assert?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
>>>>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
>>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
>>>>>>>>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
>>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
>>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
>>>>>>>>>>>>>>> can't repair.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But
>>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one:
>>>>>>>>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>>>>>>>>>>>> SnapMapper::get_snaps(const h
>>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
>>>>>>>>>>>>>> 2018-01-18 13:00:54.840396
>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think if you switch that assert to a warning it will repair...
>>>>>>>>>>>>>
>>>>>>>>>>>>> sage
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2]
>>>>>>>>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
>>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>>>>>>>>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
>>>>>>>>>>>>>> 697]
>>>>>>>>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>>>>>>>>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>>>>>>>>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>> [0x5561cdcd7bc7]
>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957]
>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>> [0x5561ce292e7d]
>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>>>>>>>>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l
>>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Is this a cache tiering pool?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded:
>>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>>>>>>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>>>>>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
>>>>>>>>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
>>>>>>>>>>>>>>>>> around the current assert:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
>>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>>>>>>>>>>>>>      set<snapid_t> snaps;
>>>>>>>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>>>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>>>>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>>>>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>>>>>>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>>>>>>>>>>>>>      snap_mapper.add_oid(
>>>>>>>>>>>>>>>>>        recovery_info.soid,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> s
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>>>>>>>>>>>>>> [0x55addb30748f]
>>>>>>>>>>>>>>>>>>>>  5:
>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>>>>>>>>>>>>>> [0x55addb317531]
>>>>>>>>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>> [0x55addb23cf10]
>>>>>>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>> [0x55addb035bc7]
>>>>>>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d]
>>>>>>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>>>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>>>>>>>>>>>>>>>>> list mapping.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>>>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>>>>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>>>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>  > 
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>>>>>>>>>>>>>> being down.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>>>>>>>>>>>>>>>  7:
>>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>>>>>>>>>>>>>> them myself.
>>>>>>>>>>>>>>>>>>>>>>> -Greg
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Jan. 30, 2018, 7:25 p.m. UTC | #26
Hi Sage,

currently i'm still seeing those repair messages but only one every few
hours so i can't reproduce this right now.

Running an osd with debug osd = 20 for a whole day is not an option so
i'll check whether those message get less and are may be solved by
itself after another week.

Greets,
Stefan
Am 24.01.2018 um 11:16 schrieb Sage Weil:
> On Wed, 24 Jan 2018, Stefan Priebe - Profihost AG wrote:
>> Hi Sage,
>>
>> Am 24.01.2018 um 01:07 schrieb Sage Weil:
>>> On Tue, 23 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> Am 22.01.2018 um 20:01 schrieb Sage Weil:
>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Sage,
>>>>>>
>>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops
>>>>>> telling me:
>>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
>>>>>> on pg 3.33e oid
>>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
>>>>>> mapper: , oi: b0d7c...repaired
>>>>>>
>>>>>> Currently it seems it does an endless repair and never ends.
>>>>>
>>>>> On many objects or the same object?
>>>>
>>>> OK no it still does endless repair but the reason is it also repairs
>>>> "newly" created snapshots.
>>>>
>>>> Example:
>>>> 2018-01-23 21:46:47.609712 osd.32 [ERR] osd.32 found snap mapper error
>>>> on pg 3.c70 oid
>>>> 3:0e3ff478:::rbd_data.52ab946b8b4567.00000000000025b2:b0d7c snaps
>>>> missing in mapper, should be: b0d7c...repaired
>>>>
>>>> rbd image 'vm-251-disk-1':
>>>>         size 170 GB in 43520 objects
>>>>         order 22 (4096 kB objects)
>>>>         block_name_prefix: rbd_data.52ab946b8b4567
>>>>         format: 2
>>>>         features: layering
>>>>         flags:
>>>> SNAPID NAME                                   SIZE TIMESTAMP
>>>> 726900 PVESNAP_cloud2-1394_vm_diff_20180123 170 GB Tue Jan 23 02:14:35 2018
>>>>
>>>> So this is a fresh snapshot which needs a repair?
>>>
>>> Aha!  That's a helpful clue.  Is there any chance you can crank up teh 
>>> osd debug log on one OSD can catch one of these cases?  (We want to 
>>> capture the clone being created, and also the scrub on the pg that follows 
>>> and sees the mismatch).
>>
>> You mean debug_osd? How high must it be? And do we really need all log
>> messages or might it be enough to change some of them to 0 in the source
>> code - so they always log?
> 
> It's just a lot of work to figure out which ones that would be.  If debug 
> osd  = 20 isn't possible (even on a single OSD), then just the logging 
> statements (ideally all of them) in SnapMapper.cc would be a good start.
> 
> sage
> 
> 
>>
>> Greets,
>> Stefan
>>
>>> Thanks!
>>> sage
>>>
>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>>>
>>>>> sage
>>>>>
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil:
>>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hi Sage,
>>>>>>>>
>>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So
>>>>>>>> it's OK to skip the assert.
>>>>>>>>
>>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to
>>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and
>>>>>>>> needs to get killed to proceed.
>>>>>>>
>>>>>>> I believe this will fix it:
>>>>>>>
>>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper
>>>>>>>
>>>>>>> Cherry-pick that and let me know?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> sage
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> The log output is always:
>>>>>>>>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
>>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at
>>>>>>>>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
>>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
>>>>>>>> 00000157:head v 927921'526559757) currently commit_sent
>>>>>>>>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
>>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at
>>>>>>>>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
>>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
>>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent
>>>>>>>>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
>>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at
>>>>>>>>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
>>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
>>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent
>>>>>>>>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
>>>>>>>> (Aborted) **
>>>>>>>>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
>>>>>>>>
>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>  1: (()+0xa4423c) [0x561d876ab23c]
>>>>>>>>  2: (()+0xf890) [0x7f100b974890]
>>>>>>>>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
>>>>>>>>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
>>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
>>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54]
>>>>>>>>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
>>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
>>>>>>>>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
>>>>>>>>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
>>>>>>>>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>> [0x561d87138bc7]
>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>> const&)+0x57) [0x561d873b0db7]
>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>> [0x561d876f42bd]
>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
>>>>>>>>  15: (()+0x8064) [0x7f100b96d064]
>>>>>>>>  16: (clone()+0x6d) [0x7f100aa6162d]
>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>> needed to interpret this.
>>>>>>>>
>>>>>>>> --- logging levels ---
>>>>>>>>    0/ 5 none
>>>>>>>>    0/ 0 lockdep
>>>>>>>>    0/ 0 context
>>>>>>>>    0/ 0 crush
>>>>>>>>    0/ 0 mds
>>>>>>>>    0/ 0 mds_balancer
>>>>>>>>    0/ 0 mds_locker
>>>>>>>>    0/ 0 mds_log
>>>>>>>>    0/ 0 mds_log_expire
>>>>>>>>    0/ 0 mds_migrator
>>>>>>>>    0/ 0 buffer
>>>>>>>>    0/ 0 timer
>>>>>>>>    0/ 0 filer
>>>>>>>>    0/ 1 striper
>>>>>>>>    0/ 0 objecter
>>>>>>>>    0/ 0 rados
>>>>>>>>    0/ 0 rbd
>>>>>>>>    0/ 5 rbd_mirror
>>>>>>>>    0/ 5 rbd_replay
>>>>>>>>    0/ 0 journaler
>>>>>>>>    0/ 0 objectcacher
>>>>>>>>    0/ 0 client
>>>>>>>>    0/ 0 osd
>>>>>>>>    0/ 0 optracker
>>>>>>>>    0/ 0 objclass
>>>>>>>>    0/ 0 filestore
>>>>>>>>    0/ 0 journal
>>>>>>>>    0/ 0 ms
>>>>>>>>    0/ 0 mon
>>>>>>>>    0/ 0 monc
>>>>>>>>    0/ 0 paxos
>>>>>>>>    0/ 0 tp
>>>>>>>>    0/ 0 auth
>>>>>>>>    1/ 5 crypto
>>>>>>>>    0/ 0 finisher
>>>>>>>>    1/ 1 reserver
>>>>>>>>    0/ 0 heartbeatmap
>>>>>>>>    0/ 0 perfcounter
>>>>>>>>    0/ 0 rgw
>>>>>>>>    1/10 civetweb
>>>>>>>>    1/ 5 javaclient
>>>>>>>>    0/ 0 asok
>>>>>>>>    0/ 0 throttle
>>>>>>>>    0/ 0 refs
>>>>>>>>    1/ 5 xio
>>>>>>>>    1/ 5 compressor
>>>>>>>>    1/ 5 bluestore
>>>>>>>>    1/ 5 bluefs
>>>>>>>>    1/ 3 bdev
>>>>>>>>    1/ 5 kstore
>>>>>>>>    4/ 5 rocksdb
>>>>>>>>    4/ 5 leveldb
>>>>>>>>    4/ 5 memdb
>>>>>>>>    1/ 5 kinetic
>>>>>>>>    1/ 5 fuse
>>>>>>>>    1/ 5 mgr
>>>>>>>>    1/ 5 mgrc
>>>>>>>>    1/ 5 dpdk
>>>>>>>>    1/ 5 eventtrace
>>>>>>>>   -2/-2 (syslog threshold)
>>>>>>>>   -1/-1 (stderr threshold)
>>>>>>>>   max_recent     10000
>>>>>>>>   max_new         1000
>>>>>>>>   log_file /var/log/ceph/ceph-osd.59.log
>>>>>>>> --- end dump of recent events ---
>>>>>>>>
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil:
>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hi Sage,
>>>>>>>>>>
>>>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is
>>>>>>>>>> the following and i hope you might have an idea:
>>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today
>>>>>>>>>> running luminous? How can they be missing in snapmapper again?
>>>>>>>>>
>>>>>>>>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
>>>>>>>>> that when you hit the crash we have some context as to what is going on.
>>>>>>>>>
>>>>>>>>> Did you track down which part of get_snaps() is returning the error?
>>>>>>>>>
>>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to
>>>>>>>>>> luminous when all this happened.
>>>>>>>>>
>>>>>>>>> Did this start right after the upgrade?
>>>>>>>>>
>>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up 
>>>>>>>>> instead (unless the debug option is enabled, for our regression testing), 
>>>>>>>>> but I'm still unclear about what the inconsistency is... any loggin you 
>>>>>>>>> can provide would help!
>>>>>>>>>
>>>>>>>>> Thanks-
>>>>>>>>> sage
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Greets,
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
>>>>>>>>>>> I'm guessing it is coming from
>>>>>>>>>>>
>>>>>>>>>>>   object_snaps out;
>>>>>>>>>>>   int r = get_snaps(oid, &out);
>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>     return r;
>>>>>>>>>>>
>>>>>>>>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
>>>>>>>>>>> confirm that, and possibly also add debug lines to the various return 
>>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
>>>>>>>>>>>
>>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
>>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we 
>>>>>>>>>>> can tolerate it...
>>>>>>>>>>>
>>>>>>>>>>> sage
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>
>>>>>>>>>>>> HI Sage,
>>>>>>>>>>>>
>>>>>>>>>>>> any ideas for this one sawing while doing a deep scrub:
>>>>>>>>>>>>
>>>>>>>>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
>>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
>>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
>>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
>>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
>>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
>>>>>>>>>>>>
>>>>>>>>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
>>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
>>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
>>>>>>>>>>>>
>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>> const*)+0x102) [0x55c0100d0372]
>>>>>>>>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
>>>>>>>>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
>>>>>>>>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
>>>>>>>>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>> [0x55c00fdf6642]
>>>>>>>>>>>>  6:
>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>> [0x55c00fdfc684]
>>>>>>>>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>> [0x55c00fd22030]
>>>>>>>>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
>>>>>>>>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>> [0x55c00fb1abc7]
>>>>>>>>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7]
>>>>>>>>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
>>>>>>>>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>> [0x55c0100d5ffd]
>>>>>>>>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
>>>>>>>>>>>>  14: (()+0x8064) [0x7fbd284fb064]
>>>>>>>>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Stefan
>>>>>>>>>>>>
>>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
>>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> it also crashes in (marked with HERE):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> int SnapMapper::get_snaps(
>>>>>>>>>>>>>>>>   const hobject_t &oid,
>>>>>>>>>>>>>>>>   object_snaps *out)
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>   assert(check(oid));
>>>>>>>>>>>>>>>>   set<string> keys;
>>>>>>>>>>>>>>>>   map<string, bufferlist> got;
>>>>>>>>>>>>>>>>   keys.insert(to_object_key(oid));
>>>>>>>>>>>>>>>>   int r = backend.get_keys(keys, &got);
>>>>>>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>>>>>>     return r;
>>>>>>>>>>>>>>>>   if (got.empty())
>>>>>>>>>>>>>>>>     return -ENOENT;
>>>>>>>>>>>>>>>>   if (out) {
>>>>>>>>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
>>>>>>>>>>>>>>>>     ::decode(*out, bp);
>>>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>>>>>>>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
>>>>>>>>>>>>>>>>   } else {
>>>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>   return 0;
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> is it save to comment that assert?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
>>>>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
>>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
>>>>>>>>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
>>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
>>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
>>>>>>>>>>>>>>> can't repair.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But
>>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one:
>>>>>>>>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>>>>>>>>>>>> SnapMapper::get_snaps(const h
>>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
>>>>>>>>>>>>>> 2018-01-18 13:00:54.840396
>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think if you switch that assert to a warning it will repair...
>>>>>>>>>>>>>
>>>>>>>>>>>>> sage
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2]
>>>>>>>>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
>>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>>>>>>>>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
>>>>>>>>>>>>>> 697]
>>>>>>>>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>>>>>>>>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>>>>>>>>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>> [0x5561cdcd7bc7]
>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957]
>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>> [0x5561ce292e7d]
>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>>>>>>>>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l
>>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Is this a cache tiering pool?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded:
>>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>>>>>>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>>>>>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
>>>>>>>>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
>>>>>>>>>>>>>>>>> around the current assert:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
>>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>>>>>>>>>>>>>      set<snapid_t> snaps;
>>>>>>>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>>>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>>>>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>>>>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>>>>>>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>>>>>>>>>>>>>      snap_mapper.add_oid(
>>>>>>>>>>>>>>>>>        recovery_info.soid,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> s
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>>>>>>>>>>>>>> [0x55addb30748f]
>>>>>>>>>>>>>>>>>>>>  5:
>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>>>>>>>>>>>>>> [0x55addb317531]
>>>>>>>>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>> [0x55addb23cf10]
>>>>>>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>> [0x55addb035bc7]
>>>>>>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d]
>>>>>>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>>>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>>>>>>>>>>>>>>>>> list mapping.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>>>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>>>>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>>>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>  > 
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>>>>>>>>>>>>>> being down.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>>>>>>>>>>>>>>>  7:
>>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>>>>>>>>>>>>>> them myself.
>>>>>>>>>>>>>>>>>>>>>>> -Greg
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Feb. 2, 2018, 7:19 p.m. UTC | #27
Hi Sage,

some days ago i reverted to stock luminous with
https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a
+
https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa
on top. I scrubbed and deep-scrubbed all pgs before and no errors were
found.

this worked fine until the ceph balancer runned and started remapping
some pgs.

This again trigged this assert:

/build/ceph/src/osd/SnapMapper.cc: In function 'int
SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)'
thread 7f4289bff700 time 2018-02-02 17:46:33.249579
/build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())

Any idea how to fix it?

Greets,
Stefan

Am 22.01.2018 um 20:01 schrieb Sage Weil:
> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>> Hi Sage,
>>
>> thanks! That one fixed it! Now i'm just waiting if the log ever stops
>> telling me:
>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
>> on pg 3.33e oid
>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
>> mapper: , oi: b0d7c...repaired
>>
>> Currently it seems it does an endless repair and never ends.
> 
> On many objects or the same object?
> 
> sage
> 
>>
>> Greets,
>> Stefan
>> Am 22.01.2018 um 15:30 schrieb Sage Weil:
>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi Sage,
>>>>
>>>> get_snaps returned the error as there are no snaps in the snapmapper. So
>>>> it's OK to skip the assert.
>>>>
>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to
>>>> fix all those errors because in around 1-3% the osd seem to deadlock and
>>>> needs to get killed to proceed.
>>>
>>> I believe this will fix it:
>>>
>>> https://github.com/liewegas/ceph/commit/wip-snapmapper
>>>
>>> Cherry-pick that and let me know?
>>>
>>> Thanks!
>>> sage
>>>
>>>
>>>>
>>>> The log output is always:
>>>>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
>>>> log [WRN] : slow request 141.951820 seconds old, received at
>>>>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
>>>> 00000157:head v 927921'526559757) currently commit_sent
>>>>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
>>>> log [WRN] : slow request 141.643853 seconds old, received at
>>>>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
>>>> 00000147d:head v 927921'29105383) currently commit_sent
>>>>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
>>>> log [WRN] : slow request 141.313220 seconds old, received at
>>>>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
>>>> 000000aff:head v 927921'164552063) currently commit_sent
>>>>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
>>>> (Aborted) **
>>>>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
>>>>
>>>>  ceph version 12.2.2-94-g92923ef
>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>  1: (()+0xa4423c) [0x561d876ab23c]
>>>>  2: (()+0xf890) [0x7f100b974890]
>>>>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
>>>>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
>>>> saction> >&, Context*)+0x254) [0x561d8746ac54]
>>>>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
>>>>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
>>>>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
>>>>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>> [0x561d87138bc7]
>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>> const&)+0x57) [0x561d873b0db7]
>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>> [0x561d876f42bd]
>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
>>>>  15: (()+0x8064) [0x7f100b96d064]
>>>>  16: (clone()+0x6d) [0x7f100aa6162d]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>> needed to interpret this.
>>>>
>>>> --- logging levels ---
>>>>    0/ 5 none
>>>>    0/ 0 lockdep
>>>>    0/ 0 context
>>>>    0/ 0 crush
>>>>    0/ 0 mds
>>>>    0/ 0 mds_balancer
>>>>    0/ 0 mds_locker
>>>>    0/ 0 mds_log
>>>>    0/ 0 mds_log_expire
>>>>    0/ 0 mds_migrator
>>>>    0/ 0 buffer
>>>>    0/ 0 timer
>>>>    0/ 0 filer
>>>>    0/ 1 striper
>>>>    0/ 0 objecter
>>>>    0/ 0 rados
>>>>    0/ 0 rbd
>>>>    0/ 5 rbd_mirror
>>>>    0/ 5 rbd_replay
>>>>    0/ 0 journaler
>>>>    0/ 0 objectcacher
>>>>    0/ 0 client
>>>>    0/ 0 osd
>>>>    0/ 0 optracker
>>>>    0/ 0 objclass
>>>>    0/ 0 filestore
>>>>    0/ 0 journal
>>>>    0/ 0 ms
>>>>    0/ 0 mon
>>>>    0/ 0 monc
>>>>    0/ 0 paxos
>>>>    0/ 0 tp
>>>>    0/ 0 auth
>>>>    1/ 5 crypto
>>>>    0/ 0 finisher
>>>>    1/ 1 reserver
>>>>    0/ 0 heartbeatmap
>>>>    0/ 0 perfcounter
>>>>    0/ 0 rgw
>>>>    1/10 civetweb
>>>>    1/ 5 javaclient
>>>>    0/ 0 asok
>>>>    0/ 0 throttle
>>>>    0/ 0 refs
>>>>    1/ 5 xio
>>>>    1/ 5 compressor
>>>>    1/ 5 bluestore
>>>>    1/ 5 bluefs
>>>>    1/ 3 bdev
>>>>    1/ 5 kstore
>>>>    4/ 5 rocksdb
>>>>    4/ 5 leveldb
>>>>    4/ 5 memdb
>>>>    1/ 5 kinetic
>>>>    1/ 5 fuse
>>>>    1/ 5 mgr
>>>>    1/ 5 mgrc
>>>>    1/ 5 dpdk
>>>>    1/ 5 eventtrace
>>>>   -2/-2 (syslog threshold)
>>>>   -1/-1 (stderr threshold)
>>>>   max_recent     10000
>>>>   max_new         1000
>>>>   log_file /var/log/ceph/ceph-osd.59.log
>>>> --- end dump of recent events ---
>>>>
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil:
>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Sage,
>>>>>>
>>>>>> thanks for your reply. I'll do this. What i still don't understand is
>>>>>> the following and i hope you might have an idea:
>>>>>> 1.) All the snaps mentioned here are fresh - i created them today
>>>>>> running luminous? How can they be missing in snapmapper again?
>>>>>
>>>>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
>>>>> that when you hit the crash we have some context as to what is going on.
>>>>>
>>>>> Did you track down which part of get_snaps() is returning the error?
>>>>>
>>>>>> 2.) How did this happen? The cluster was jewel and was updated to
>>>>>> luminous when all this happened.
>>>>>
>>>>> Did this start right after the upgrade?
>>>>>
>>>>> I started a PR that relaxes some of these assertions so that they clean up 
>>>>> instead (unless the debug option is enabled, for our regression testing), 
>>>>> but I'm still unclear about what the inconsistency is... any loggin you 
>>>>> can provide would help!
>>>>>
>>>>> Thanks-
>>>>> sage
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>>
>>>>>>
>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
>>>>>>> I'm guessing it is coming from
>>>>>>>
>>>>>>>   object_snaps out;
>>>>>>>   int r = get_snaps(oid, &out);
>>>>>>>   if (r < 0)
>>>>>>>     return r;
>>>>>>>
>>>>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
>>>>>>> confirm that, and possibly also add debug lines to the various return 
>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
>>>>>>>
>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
>>>>>>> hard to say without seeing what the inconsistency there is and whether we 
>>>>>>> can tolerate it...
>>>>>>>
>>>>>>> sage
>>>>>>>
>>>>>>>
>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>
>>>>>>>> HI Sage,
>>>>>>>>
>>>>>>>> any ideas for this one sawing while doing a deep scrub:
>>>>>>>>
>>>>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
>>>>>>>>
>>>>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
>>>>>>>>
>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>> const*)+0x102) [0x55c0100d0372]
>>>>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
>>>>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
>>>>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
>>>>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>> [0x55c00fdf6642]
>>>>>>>>  6:
>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>> [0x55c00fdfc684]
>>>>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>> [0x55c00fd22030]
>>>>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
>>>>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>> [0x55c00fb1abc7]
>>>>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>> const&)+0x57) [0x55c00fd92ad7]
>>>>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
>>>>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>> [0x55c0100d5ffd]
>>>>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
>>>>>>>>  14: (()+0x8064) [0x7fbd284fb064]
>>>>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>> needed to interpret this.
>>>>>>>>
>>>>>>>>
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hi Sage,
>>>>>>>>>>
>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> it also crashes in (marked with HERE):
>>>>>>>>>>>>
>>>>>>>>>>>> int SnapMapper::get_snaps(
>>>>>>>>>>>>   const hobject_t &oid,
>>>>>>>>>>>>   object_snaps *out)
>>>>>>>>>>>> {
>>>>>>>>>>>>   assert(check(oid));
>>>>>>>>>>>>   set<string> keys;
>>>>>>>>>>>>   map<string, bufferlist> got;
>>>>>>>>>>>>   keys.insert(to_object_key(oid));
>>>>>>>>>>>>   int r = backend.get_keys(keys, &got);
>>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>>     return r;
>>>>>>>>>>>>   if (got.empty())
>>>>>>>>>>>>     return -ENOENT;
>>>>>>>>>>>>   if (out) {
>>>>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
>>>>>>>>>>>>     ::decode(*out, bp);
>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>>>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
>>>>>>>>>>>>   } else {
>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>>>>>>>>>>>   }
>>>>>>>>>>>>   return 0;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> is it save to comment that assert?
>>>>>>>>>>>
>>>>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
>>>>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
>>>>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
>>>>>>>>>>> can't repair.
>>>>>>>>>>
>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
>>>>>>>>>>
>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But
>>>>>>>>>> enabling / doing deep-scrubs results into this one:
>>>>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>>>>>>>> SnapMapper::get_snaps(const h
>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
>>>>>>>>>> 2018-01-18 13:00:54.840396
>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>>>>>>
>>>>>>>>> I think if you switch that assert to a warning it will repair...
>>>>>>>>>
>>>>>>>>> sage
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2]
>>>>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>>>>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
>>>>>>>>>> 697]
>>>>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>>>>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>>>>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>> [0x5561cdcd7bc7]
>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>> const&)+0x57) [0x5561cdf4f957]
>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>> [0x5561ce292e7d]
>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>>>>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
>>>>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>> needed to interpret this.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Greets,
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>>> sage
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Stefan
>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>>>>>>>>>>>> PrimaryLogPG::on_l
>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is this a cache tiering pool?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded:
>>>>>>>>>>>>>> ceph pg dump | grep 3.80e
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
>>>>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
>>>>>>>>>>>>> around the current assert:
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>>>>>>>>>      set<snapid_t> snaps;
>>>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>>>>>>>>>      snap_mapper.add_oid(
>>>>>>>>>>>>>        recovery_info.soid,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> s
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>>>>>>>>>> [0x55addb30748f]
>>>>>>>>>>>>>>>>  5:
>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>>>>>>>>>> [0x55addb317531]
>>>>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>> [0x55addb23cf10]
>>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>> [0x55addb035bc7]
>>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>> [0x55addb5f0e7d]
>>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>>>>>>>>>>>>> list mapping.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  > 
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>>>>>>>>>> being down.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>>>>>>>>>>>  7:
>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>>>>>>>>>> them myself.
>>>>>>>>>>>>>>>>>>> -Greg
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sage Weil Feb. 2, 2018, 7:28 p.m. UTC | #28
On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote:
> Hi Sage,
> 
> some days ago i reverted to stock luminous with
> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a
> +
> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa
> on top. I scrubbed and deep-scrubbed all pgs before and no errors were
> found.
> 
> this worked fine until the ceph balancer runned and started remapping
> some pgs.
> 
> This again trigged this assert:
> 
> /build/ceph/src/osd/SnapMapper.cc: In function 'int
> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)'
> thread 7f4289bff700 time 2018-02-02 17:46:33.249579
> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
> 
> Any idea how to fix it?

Cheap workaround is to comment out that assert.  I'm still tryign to sort 
out how this happened, though.  Not sure if you shared this already, but:

- bluestore or filestore?
- no cache tiering, right?  nowhere on the cluster?

It looks like the SnapMapper entry with an empty snaps vector came in via 
add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() 
doesn't have that check).  There are a few place it could come from, 
including the on_local_recover() call and the update_snap_map() call.  
That last one is interesting because it pulls the snaps for a new clone 
out of the pg log entry.. maybe those aren't getting populated properly.

Can you confirm that all of the OSDs on the cluster are running the 
latest?  (ceph versions).

Thanks!
sage


> 
> Greets,
> Stefan
> 
> Am 22.01.2018 um 20:01 schrieb Sage Weil:
> > On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >> Hi Sage,
> >>
> >> thanks! That one fixed it! Now i'm just waiting if the log ever stops
> >> telling me:
> >> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
> >> on pg 3.33e oid
> >> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
> >> mapper: , oi: b0d7c...repaired
> >>
> >> Currently it seems it does an endless repair and never ends.
> > 
> > On many objects or the same object?
> > 
> > sage
> > 
> >>
> >> Greets,
> >> Stefan
> >> Am 22.01.2018 um 15:30 schrieb Sage Weil:
> >>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>> Hi Sage,
> >>>>
> >>>> get_snaps returned the error as there are no snaps in the snapmapper. So
> >>>> it's OK to skip the assert.
> >>>>
> >>>> My biggest problem is now that i'm not able to run a scrub on all PGs to
> >>>> fix all those errors because in around 1-3% the osd seem to deadlock and
> >>>> needs to get killed to proceed.
> >>>
> >>> I believe this will fix it:
> >>>
> >>> https://github.com/liewegas/ceph/commit/wip-snapmapper
> >>>
> >>> Cherry-pick that and let me know?
> >>>
> >>> Thanks!
> >>> sage
> >>>
> >>>
> >>>>
> >>>> The log output is always:
> >>>>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
> >>>> log [WRN] : slow request 141.951820 seconds old, received at
> >>>>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
> >>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
> >>>> 00000157:head v 927921'526559757) currently commit_sent
> >>>>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
> >>>> log [WRN] : slow request 141.643853 seconds old, received at
> >>>>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
> >>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
> >>>> 00000147d:head v 927921'29105383) currently commit_sent
> >>>>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
> >>>> log [WRN] : slow request 141.313220 seconds old, received at
> >>>>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
> >>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
> >>>> 000000aff:head v 927921'164552063) currently commit_sent
> >>>>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
> >>>> (Aborted) **
> >>>>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
> >>>>
> >>>>  ceph version 12.2.2-94-g92923ef
> >>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>  1: (()+0xa4423c) [0x561d876ab23c]
> >>>>  2: (()+0xf890) [0x7f100b974890]
> >>>>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
> >>>>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
> >>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
> >>>> saction> >&, Context*)+0x254) [0x561d8746ac54]
> >>>>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
> >>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
> >>>>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
> >>>>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> >>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
> >>>>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
> >>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
> >>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
> >>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>> [0x561d87138bc7]
> >>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>> const&)+0x57) [0x561d873b0db7]
> >>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
> >>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>> [0x561d876f42bd]
> >>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
> >>>>  15: (()+0x8064) [0x7f100b96d064]
> >>>>  16: (clone()+0x6d) [0x7f100aa6162d]
> >>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>> needed to interpret this.
> >>>>
> >>>> --- logging levels ---
> >>>>    0/ 5 none
> >>>>    0/ 0 lockdep
> >>>>    0/ 0 context
> >>>>    0/ 0 crush
> >>>>    0/ 0 mds
> >>>>    0/ 0 mds_balancer
> >>>>    0/ 0 mds_locker
> >>>>    0/ 0 mds_log
> >>>>    0/ 0 mds_log_expire
> >>>>    0/ 0 mds_migrator
> >>>>    0/ 0 buffer
> >>>>    0/ 0 timer
> >>>>    0/ 0 filer
> >>>>    0/ 1 striper
> >>>>    0/ 0 objecter
> >>>>    0/ 0 rados
> >>>>    0/ 0 rbd
> >>>>    0/ 5 rbd_mirror
> >>>>    0/ 5 rbd_replay
> >>>>    0/ 0 journaler
> >>>>    0/ 0 objectcacher
> >>>>    0/ 0 client
> >>>>    0/ 0 osd
> >>>>    0/ 0 optracker
> >>>>    0/ 0 objclass
> >>>>    0/ 0 filestore
> >>>>    0/ 0 journal
> >>>>    0/ 0 ms
> >>>>    0/ 0 mon
> >>>>    0/ 0 monc
> >>>>    0/ 0 paxos
> >>>>    0/ 0 tp
> >>>>    0/ 0 auth
> >>>>    1/ 5 crypto
> >>>>    0/ 0 finisher
> >>>>    1/ 1 reserver
> >>>>    0/ 0 heartbeatmap
> >>>>    0/ 0 perfcounter
> >>>>    0/ 0 rgw
> >>>>    1/10 civetweb
> >>>>    1/ 5 javaclient
> >>>>    0/ 0 asok
> >>>>    0/ 0 throttle
> >>>>    0/ 0 refs
> >>>>    1/ 5 xio
> >>>>    1/ 5 compressor
> >>>>    1/ 5 bluestore
> >>>>    1/ 5 bluefs
> >>>>    1/ 3 bdev
> >>>>    1/ 5 kstore
> >>>>    4/ 5 rocksdb
> >>>>    4/ 5 leveldb
> >>>>    4/ 5 memdb
> >>>>    1/ 5 kinetic
> >>>>    1/ 5 fuse
> >>>>    1/ 5 mgr
> >>>>    1/ 5 mgrc
> >>>>    1/ 5 dpdk
> >>>>    1/ 5 eventtrace
> >>>>   -2/-2 (syslog threshold)
> >>>>   -1/-1 (stderr threshold)
> >>>>   max_recent     10000
> >>>>   max_new         1000
> >>>>   log_file /var/log/ceph/ceph-osd.59.log
> >>>> --- end dump of recent events ---
> >>>>
> >>>>
> >>>> Greets,
> >>>> Stefan
> >>>>
> >>>> Am 21.01.2018 um 21:27 schrieb Sage Weil:
> >>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>> Hi Sage,
> >>>>>>
> >>>>>> thanks for your reply. I'll do this. What i still don't understand is
> >>>>>> the following and i hope you might have an idea:
> >>>>>> 1.) All the snaps mentioned here are fresh - i created them today
> >>>>>> running luminous? How can they be missing in snapmapper again?
> >>>>>
> >>>>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
> >>>>> that when you hit the crash we have some context as to what is going on.
> >>>>>
> >>>>> Did you track down which part of get_snaps() is returning the error?
> >>>>>
> >>>>>> 2.) How did this happen? The cluster was jewel and was updated to
> >>>>>> luminous when all this happened.
> >>>>>
> >>>>> Did this start right after the upgrade?
> >>>>>
> >>>>> I started a PR that relaxes some of these assertions so that they clean up 
> >>>>> instead (unless the debug option is enabled, for our regression testing), 
> >>>>> but I'm still unclear about what the inconsistency is... any loggin you 
> >>>>> can provide would help!
> >>>>>
> >>>>> Thanks-
> >>>>> sage
> >>>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Greets,
> >>>>>> Stefan
> >>>>>>
> >>>>>>
> >>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
> >>>>>>> I'm guessing it is coming from
> >>>>>>>
> >>>>>>>   object_snaps out;
> >>>>>>>   int r = get_snaps(oid, &out);
> >>>>>>>   if (r < 0)
> >>>>>>>     return r;
> >>>>>>>
> >>>>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
> >>>>>>> confirm that, and possibly also add debug lines to the various return 
> >>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
> >>>>>>>
> >>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
> >>>>>>> hard to say without seeing what the inconsistency there is and whether we 
> >>>>>>> can tolerate it...
> >>>>>>>
> >>>>>>> sage
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>
> >>>>>>>> HI Sage,
> >>>>>>>>
> >>>>>>>> any ideas for this one sawing while doing a deep scrub:
> >>>>>>>>
> >>>>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
> >>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
> >>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
> >>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
> >>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
> >>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
> >>>>>>>>
> >>>>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
> >>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
> >>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
> >>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
> >>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
> >>>>>>>>
> >>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>> const*)+0x102) [0x55c0100d0372]
> >>>>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
> >>>>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
> >>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
> >>>>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
> >>>>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>>>>>> [0x55c00fdf6642]
> >>>>>>>>  6:
> >>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>>>>>> [0x55c00fdfc684]
> >>>>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>> [0x55c00fd22030]
> >>>>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
> >>>>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>> [0x55c00fb1abc7]
> >>>>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>> const&)+0x57) [0x55c00fd92ad7]
> >>>>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
> >>>>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>> [0x55c0100d5ffd]
> >>>>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
> >>>>>>>>  14: (()+0x8064) [0x7fbd284fb064]
> >>>>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
> >>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>> needed to interpret this.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Stefan
> >>>>>>>>
> >>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
> >>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>> Hi Sage,
> >>>>>>>>>>
> >>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
> >>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> it also crashes in (marked with HERE):
> >>>>>>>>>>>>
> >>>>>>>>>>>> int SnapMapper::get_snaps(
> >>>>>>>>>>>>   const hobject_t &oid,
> >>>>>>>>>>>>   object_snaps *out)
> >>>>>>>>>>>> {
> >>>>>>>>>>>>   assert(check(oid));
> >>>>>>>>>>>>   set<string> keys;
> >>>>>>>>>>>>   map<string, bufferlist> got;
> >>>>>>>>>>>>   keys.insert(to_object_key(oid));
> >>>>>>>>>>>>   int r = backend.get_keys(keys, &got);
> >>>>>>>>>>>>   if (r < 0)
> >>>>>>>>>>>>     return r;
> >>>>>>>>>>>>   if (got.empty())
> >>>>>>>>>>>>     return -ENOENT;
> >>>>>>>>>>>>   if (out) {
> >>>>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
> >>>>>>>>>>>>     ::decode(*out, bp);
> >>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
> >>>>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
> >>>>>>>>>>>>   } else {
> >>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
> >>>>>>>>>>>>   }
> >>>>>>>>>>>>   return 0;
> >>>>>>>>>>>> }
> >>>>>>>>>>>>
> >>>>>>>>>>>> is it save to comment that assert?
> >>>>>>>>>>>
> >>>>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
> >>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
> >>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
> >>>>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
> >>>>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
> >>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
> >>>>>>>>>>> can't repair.
> >>>>>>>>>>
> >>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
> >>>>>>>>>>
> >>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But
> >>>>>>>>>> enabling / doing deep-scrubs results into this one:
> >>>>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
> >>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
> >>>>>>>>>> SnapMapper::get_snaps(const h
> >>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
> >>>>>>>>>> 2018-01-18 13:00:54.840396
> >>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
> >>>>>>>>>
> >>>>>>>>> I think if you switch that assert to a warning it will repair...
> >>>>>>>>>
> >>>>>>>>> sage
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>> const*)+0x102) [0x5561ce28d1f2]
> >>>>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
> >>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
> >>>>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
> >>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
> >>>>>>>>>> 697]
> >>>>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
> >>>>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> >>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
> >>>>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
> >>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
> >>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
> >>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>> [0x5561cdcd7bc7]
> >>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>> const&)+0x57) [0x5561cdf4f957]
> >>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
> >>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>> [0x5561ce292e7d]
> >>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
> >>>>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
> >>>>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
> >>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>> needed to interpret this.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Greets,
> >>>>>>>>>> Stefan
> >>>>>>>>>>
> >>>>>>>>>>> sage
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> Stefan
> >>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
> >>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
> >>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>>> Hi Sage,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
> >>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
> >>>>>>>>>>>>>>>> PrimaryLogPG::on_l
> >>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
> >>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
> >>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
> >>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
> >>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Is this a cache tiering pool?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> no normal 3 replica but the pg is degraded:
> >>>>>>>>>>>>>> ceph pg dump | grep 3.80e
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
> >>>>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
> >>>>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
> >>>>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
> >>>>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
> >>>>>>>>>>>>> around the current assert:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
> >>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
> >>>>>>>>>>>>>      set<snapid_t> snaps;
> >>>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
> >>>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
> >>>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
> >>>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
> >>>>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
> >>>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
> >>>>>>>>>>>>> +    } else {
> >>>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
> >>>>>>>>>>>>> +    }
> >>>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
> >>>>>>>>>>>>>      snap_mapper.add_oid(
> >>>>>>>>>>>>>        recovery_info.soid,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> s
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
> >>>>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
> >>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
> >>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
> >>>>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
> >>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
> >>>>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
> >>>>>>>>>>>>>>>> [0x55addb30748f]
> >>>>>>>>>>>>>>>>  5:
> >>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
> >>>>>>>>>>>>>>>> [0x55addb317531]
> >>>>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>>>>>>>> [0x55addb23cf10]
> >>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
> >>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>>>>>> [0x55addb035bc7]
> >>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
> >>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
> >>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>>>>>> [0x55addb5f0e7d]
> >>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
> >>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
> >>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
> >>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Greets,
> >>>>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
> >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
> >>>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
> >>>>>>>>>>>>>>>>> list mapping.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
> >>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
> >>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
> >>>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
> >>>>>>>>>>>>>>>>> happens.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
> >>>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> sage
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>  > 
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
> >>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
> >>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
> >>>>>>>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
> >>>>>>>>>>>>>>>>>>>> being down.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> All of them fail with:
> >>>>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
> >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
> >>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
> >>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
> >>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
> >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
> >>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
> >>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
> >>>>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
> >>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
> >>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
> >>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
> >>>>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
> >>>>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
> >>>>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
> >>>>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
> >>>>>>>>>>>>>>>>>>>>  7:
> >>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
> >>>>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
> >>>>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
> >>>>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
> >>>>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
> >>>>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
> >>>>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
> >>>>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
> >>>>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
> >>>>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
> >>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
> >>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
> >>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
> >>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
> >>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
> >>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
> >>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
> >>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
> >>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
> >>>>>>>>>>>>>>>>>>> them myself.
> >>>>>>>>>>>>>>>>>>> -Greg
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>> --
> >>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>
> >>>>>>>>
> >>>>>> --
> >>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Feb. 2, 2018, 8:21 p.m. UTC | #29
Am 02.02.2018 um 20:28 schrieb Sage Weil:
> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote:
>> Hi Sage,
>>
>> some days ago i reverted to stock luminous with
>> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a
>> +
>> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa
>> on top. I scrubbed and deep-scrubbed all pgs before and no errors were
>> found.
>>
>> this worked fine until the ceph balancer runned and started remapping
>> some pgs.
>>
>> This again trigged this assert:
>>
>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)'
>> thread 7f4289bff700 time 2018-02-02 17:46:33.249579
>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>
>> Any idea how to fix it?
> 
> Cheap workaround is to comment out that assert.  I'm still tryign to sort 

Yes i did but i feel unconforable with it as the snapmapper value is
still wrong.

> out how this happened, though.  Not sure if you shared this already, but:
> - bluestore or filestore?

As i first reported it only filestore. Now also some bluestore OSDs.

> - no cache tiering, right?  nowhere on the cluster?
Yes and Yes. Never used cache tiering at all.

> It looks like the SnapMapper entry with an empty snaps vector came in via 
> add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() 
> doesn't have that check).  There are a few place it could come from, 
> including the on_local_recover() call and the update_snap_map() call.  
> That last one is interesting because it pulls the snaps for a new clone 
> out of the pg log entry.. maybe those aren't getting populated properly.

Is it also possible that they got corrupted in the past and now ceph is
unable to detect it and repair it with a scrub or deep scrub?

> Can you confirm that all of the OSDs on the cluster are running the
> latest?  (ceph versions).

Yes they do - i always check this after each update and i rechecked it
again.

Stefan


>>
>> Greets,
>> Stefan
>>
>> Am 22.01.2018 um 20:01 schrieb Sage Weil:
>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi Sage,
>>>>
>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops
>>>> telling me:
>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
>>>> on pg 3.33e oid
>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
>>>> mapper: , oi: b0d7c...repaired
>>>>
>>>> Currently it seems it does an endless repair and never ends.
>>>
>>> On many objects or the same object?
>>>
>>> sage
>>>
>>>>
>>>> Greets,
>>>> Stefan
>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil:
>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Sage,
>>>>>>
>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So
>>>>>> it's OK to skip the assert.
>>>>>>
>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to
>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and
>>>>>> needs to get killed to proceed.
>>>>>
>>>>> I believe this will fix it:
>>>>>
>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper
>>>>>
>>>>> Cherry-pick that and let me know?
>>>>>
>>>>> Thanks!
>>>>> sage
>>>>>
>>>>>
>>>>>>
>>>>>> The log output is always:
>>>>>>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
>>>>>> log [WRN] : slow request 141.951820 seconds old, received at
>>>>>>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
>>>>>> 00000157:head v 927921'526559757) currently commit_sent
>>>>>>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
>>>>>> log [WRN] : slow request 141.643853 seconds old, received at
>>>>>>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
>>>>>> 00000147d:head v 927921'29105383) currently commit_sent
>>>>>>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
>>>>>> log [WRN] : slow request 141.313220 seconds old, received at
>>>>>>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
>>>>>> 000000aff:head v 927921'164552063) currently commit_sent
>>>>>>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
>>>>>> (Aborted) **
>>>>>>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
>>>>>>
>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>  1: (()+0xa4423c) [0x561d876ab23c]
>>>>>>  2: (()+0xf890) [0x7f100b974890]
>>>>>>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
>>>>>>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54]
>>>>>>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
>>>>>>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
>>>>>>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
>>>>>>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>> [0x561d87138bc7]
>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>> const&)+0x57) [0x561d873b0db7]
>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>> [0x561d876f42bd]
>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
>>>>>>  15: (()+0x8064) [0x7f100b96d064]
>>>>>>  16: (clone()+0x6d) [0x7f100aa6162d]
>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>> needed to interpret this.
>>>>>>
>>>>>> --- logging levels ---
>>>>>>    0/ 5 none
>>>>>>    0/ 0 lockdep
>>>>>>    0/ 0 context
>>>>>>    0/ 0 crush
>>>>>>    0/ 0 mds
>>>>>>    0/ 0 mds_balancer
>>>>>>    0/ 0 mds_locker
>>>>>>    0/ 0 mds_log
>>>>>>    0/ 0 mds_log_expire
>>>>>>    0/ 0 mds_migrator
>>>>>>    0/ 0 buffer
>>>>>>    0/ 0 timer
>>>>>>    0/ 0 filer
>>>>>>    0/ 1 striper
>>>>>>    0/ 0 objecter
>>>>>>    0/ 0 rados
>>>>>>    0/ 0 rbd
>>>>>>    0/ 5 rbd_mirror
>>>>>>    0/ 5 rbd_replay
>>>>>>    0/ 0 journaler
>>>>>>    0/ 0 objectcacher
>>>>>>    0/ 0 client
>>>>>>    0/ 0 osd
>>>>>>    0/ 0 optracker
>>>>>>    0/ 0 objclass
>>>>>>    0/ 0 filestore
>>>>>>    0/ 0 journal
>>>>>>    0/ 0 ms
>>>>>>    0/ 0 mon
>>>>>>    0/ 0 monc
>>>>>>    0/ 0 paxos
>>>>>>    0/ 0 tp
>>>>>>    0/ 0 auth
>>>>>>    1/ 5 crypto
>>>>>>    0/ 0 finisher
>>>>>>    1/ 1 reserver
>>>>>>    0/ 0 heartbeatmap
>>>>>>    0/ 0 perfcounter
>>>>>>    0/ 0 rgw
>>>>>>    1/10 civetweb
>>>>>>    1/ 5 javaclient
>>>>>>    0/ 0 asok
>>>>>>    0/ 0 throttle
>>>>>>    0/ 0 refs
>>>>>>    1/ 5 xio
>>>>>>    1/ 5 compressor
>>>>>>    1/ 5 bluestore
>>>>>>    1/ 5 bluefs
>>>>>>    1/ 3 bdev
>>>>>>    1/ 5 kstore
>>>>>>    4/ 5 rocksdb
>>>>>>    4/ 5 leveldb
>>>>>>    4/ 5 memdb
>>>>>>    1/ 5 kinetic
>>>>>>    1/ 5 fuse
>>>>>>    1/ 5 mgr
>>>>>>    1/ 5 mgrc
>>>>>>    1/ 5 dpdk
>>>>>>    1/ 5 eventtrace
>>>>>>   -2/-2 (syslog threshold)
>>>>>>   -1/-1 (stderr threshold)
>>>>>>   max_recent     10000
>>>>>>   max_new         1000
>>>>>>   log_file /var/log/ceph/ceph-osd.59.log
>>>>>> --- end dump of recent events ---
>>>>>>
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>>
>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil:
>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hi Sage,
>>>>>>>>
>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is
>>>>>>>> the following and i hope you might have an idea:
>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today
>>>>>>>> running luminous? How can they be missing in snapmapper again?
>>>>>>>
>>>>>>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
>>>>>>> that when you hit the crash we have some context as to what is going on.
>>>>>>>
>>>>>>> Did you track down which part of get_snaps() is returning the error?
>>>>>>>
>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to
>>>>>>>> luminous when all this happened.
>>>>>>>
>>>>>>> Did this start right after the upgrade?
>>>>>>>
>>>>>>> I started a PR that relaxes some of these assertions so that they clean up 
>>>>>>> instead (unless the debug option is enabled, for our regression testing), 
>>>>>>> but I'm still unclear about what the inconsistency is... any loggin you 
>>>>>>> can provide would help!
>>>>>>>
>>>>>>> Thanks-
>>>>>>> sage
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>>
>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
>>>>>>>>> I'm guessing it is coming from
>>>>>>>>>
>>>>>>>>>   object_snaps out;
>>>>>>>>>   int r = get_snaps(oid, &out);
>>>>>>>>>   if (r < 0)
>>>>>>>>>     return r;
>>>>>>>>>
>>>>>>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
>>>>>>>>> confirm that, and possibly also add debug lines to the various return 
>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
>>>>>>>>>
>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we 
>>>>>>>>> can tolerate it...
>>>>>>>>>
>>>>>>>>> sage
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>
>>>>>>>>>> HI Sage,
>>>>>>>>>>
>>>>>>>>>> any ideas for this one sawing while doing a deep scrub:
>>>>>>>>>>
>>>>>>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
>>>>>>>>>>
>>>>>>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
>>>>>>>>>>
>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>> const*)+0x102) [0x55c0100d0372]
>>>>>>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
>>>>>>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
>>>>>>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
>>>>>>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>> [0x55c00fdf6642]
>>>>>>>>>>  6:
>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>> [0x55c00fdfc684]
>>>>>>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>> [0x55c00fd22030]
>>>>>>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
>>>>>>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>> [0x55c00fb1abc7]
>>>>>>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7]
>>>>>>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
>>>>>>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>> [0x55c0100d5ffd]
>>>>>>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
>>>>>>>>>>  14: (()+0x8064) [0x7fbd284fb064]
>>>>>>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>> needed to interpret this.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>
>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> it also crashes in (marked with HERE):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> int SnapMapper::get_snaps(
>>>>>>>>>>>>>>   const hobject_t &oid,
>>>>>>>>>>>>>>   object_snaps *out)
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>   assert(check(oid));
>>>>>>>>>>>>>>   set<string> keys;
>>>>>>>>>>>>>>   map<string, bufferlist> got;
>>>>>>>>>>>>>>   keys.insert(to_object_key(oid));
>>>>>>>>>>>>>>   int r = backend.get_keys(keys, &got);
>>>>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>>>>     return r;
>>>>>>>>>>>>>>   if (got.empty())
>>>>>>>>>>>>>>     return -ENOENT;
>>>>>>>>>>>>>>   if (out) {
>>>>>>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
>>>>>>>>>>>>>>     ::decode(*out, bp);
>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>>>>>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
>>>>>>>>>>>>>>   } else {
>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>   return 0;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is it save to comment that assert?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
>>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
>>>>>>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
>>>>>>>>>>>>> can't repair.
>>>>>>>>>>>>
>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
>>>>>>>>>>>>
>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But
>>>>>>>>>>>> enabling / doing deep-scrubs results into this one:
>>>>>>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>>>>>>>>>> SnapMapper::get_snaps(const h
>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
>>>>>>>>>>>> 2018-01-18 13:00:54.840396
>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>>>>>>>>
>>>>>>>>>>> I think if you switch that assert to a warning it will repair...
>>>>>>>>>>>
>>>>>>>>>>> sage
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2]
>>>>>>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>>>>>>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
>>>>>>>>>>>> 697]
>>>>>>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>>>>>>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>>>>>>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>> [0x5561cdcd7bc7]
>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957]
>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>> [0x5561ce292e7d]
>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>>>>>>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
>>>>>>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Greets,
>>>>>>>>>>>> Stefan
>>>>>>>>>>>>
>>>>>>>>>>>>> sage
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l
>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is this a cache tiering pool?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded:
>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>>>>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>>>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
>>>>>>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
>>>>>>>>>>>>>>> around the current assert:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>>>>>>>>>>>      set<snapid_t> snaps;
>>>>>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>>>>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>>>>>>>>>>>      snap_mapper.add_oid(
>>>>>>>>>>>>>>>        recovery_info.soid,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> s
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>>>>>>>>>>>> [0x55addb30748f]
>>>>>>>>>>>>>>>>>>  5:
>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>>>>>>>>>>>> [0x55addb317531]
>>>>>>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>> [0x55addb23cf10]
>>>>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>> [0x55addb035bc7]
>>>>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d]
>>>>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>>>>>>>>>>>>>>> list mapping.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  > 
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>>>>>>>>>>>> being down.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>>>>>>>>>>>>>  7:
>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>>>>>>>>>>>> them myself.
>>>>>>>>>>>>>>>>>>>>> -Greg
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sage Weil Feb. 2, 2018, 9:05 p.m. UTC | #30
On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote:
> Am 02.02.2018 um 20:28 schrieb Sage Weil:
> > On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote:
> >> Hi Sage,
> >>
> >> some days ago i reverted to stock luminous with
> >> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a
> >> +
> >> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa
> >> on top. I scrubbed and deep-scrubbed all pgs before and no errors were
> >> found.
> >>
> >> this worked fine until the ceph balancer runned and started remapping
> >> some pgs.
> >>
> >> This again trigged this assert:
> >>
> >> /build/ceph/src/osd/SnapMapper.cc: In function 'int
> >> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)'
> >> thread 7f4289bff700 time 2018-02-02 17:46:33.249579
> >> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
> >>
> >> Any idea how to fix it?
> > 
> > Cheap workaround is to comment out that assert.  I'm still tryign to sort 
> 
> Yes i did but i feel unconforable with it as the snapmapper value is
> still wrong.
> 
> > out how this happened, though.  Not sure if you shared this already, but:
> > - bluestore or filestore?
> 
> As i first reported it only filestore. Now also some bluestore OSDs.
> 
> > - no cache tiering, right?  nowhere on the cluster?
> Yes and Yes. Never used cache tiering at all.
> 
> > It looks like the SnapMapper entry with an empty snaps vector came in via 
> > add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() 
> > doesn't have that check).  There are a few place it could come from, 
> > including the on_local_recover() call and the update_snap_map() call.  
> > That last one is interesting because it pulls the snaps for a new clone 
> > out of the pg log entry.. maybe those aren't getting populated properly.
> 
> Is it also possible that they got corrupted in the past and now ceph is
> unable to detect it and repair it with a scrub or deep scrub?
> 
> > Can you confirm that all of the OSDs on the cluster are running the
> > latest?  (ceph versions).
> 
> Yes they do - i always check this after each update and i rechecked it
> again.

Ok thanks!

I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that 
has those commits above plus another one that adds debug messages in the 
various places that the empty snap vectors might be coming from.

This won't help it get past the rebalance issue you hit above, but 
hopefully it will trigger when you create a new clone so we can see how 
this is happening.

If we need to, I think we can ignore the assert (by commenting it out), 
but one downside there is that we'll see my warnings pop up when 
recovering those objects so we won't be able to tell where they really 
came from. Since you were seeing this on new snaps/clones, though, I'm 
hoping we won't end up in that situation!

Thanks!
sage



 > 
> Stefan
> 
> 
> >>
> >> Greets,
> >> Stefan
> >>
> >> Am 22.01.2018 um 20:01 schrieb Sage Weil:
> >>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>> Hi Sage,
> >>>>
> >>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops
> >>>> telling me:
> >>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
> >>>> on pg 3.33e oid
> >>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
> >>>> mapper: , oi: b0d7c...repaired
> >>>>
> >>>> Currently it seems it does an endless repair and never ends.
> >>>
> >>> On many objects or the same object?
> >>>
> >>> sage
> >>>
> >>>>
> >>>> Greets,
> >>>> Stefan
> >>>> Am 22.01.2018 um 15:30 schrieb Sage Weil:
> >>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>> Hi Sage,
> >>>>>>
> >>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So
> >>>>>> it's OK to skip the assert.
> >>>>>>
> >>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to
> >>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and
> >>>>>> needs to get killed to proceed.
> >>>>>
> >>>>> I believe this will fix it:
> >>>>>
> >>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper
> >>>>>
> >>>>> Cherry-pick that and let me know?
> >>>>>
> >>>>> Thanks!
> >>>>> sage
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> The log output is always:
> >>>>>>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
> >>>>>> log [WRN] : slow request 141.951820 seconds old, received at
> >>>>>>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
> >>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
> >>>>>> 00000157:head v 927921'526559757) currently commit_sent
> >>>>>>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
> >>>>>> log [WRN] : slow request 141.643853 seconds old, received at
> >>>>>>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
> >>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
> >>>>>> 00000147d:head v 927921'29105383) currently commit_sent
> >>>>>>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
> >>>>>> log [WRN] : slow request 141.313220 seconds old, received at
> >>>>>>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
> >>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
> >>>>>> 000000aff:head v 927921'164552063) currently commit_sent
> >>>>>>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
> >>>>>> (Aborted) **
> >>>>>>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
> >>>>>>
> >>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>  1: (()+0xa4423c) [0x561d876ab23c]
> >>>>>>  2: (()+0xf890) [0x7f100b974890]
> >>>>>>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
> >>>>>>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
> >>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
> >>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54]
> >>>>>>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
> >>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
> >>>>>>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
> >>>>>>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> >>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
> >>>>>>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
> >>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
> >>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
> >>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>> [0x561d87138bc7]
> >>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>> const&)+0x57) [0x561d873b0db7]
> >>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
> >>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>> [0x561d876f42bd]
> >>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
> >>>>>>  15: (()+0x8064) [0x7f100b96d064]
> >>>>>>  16: (clone()+0x6d) [0x7f100aa6162d]
> >>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>> needed to interpret this.
> >>>>>>
> >>>>>> --- logging levels ---
> >>>>>>    0/ 5 none
> >>>>>>    0/ 0 lockdep
> >>>>>>    0/ 0 context
> >>>>>>    0/ 0 crush
> >>>>>>    0/ 0 mds
> >>>>>>    0/ 0 mds_balancer
> >>>>>>    0/ 0 mds_locker
> >>>>>>    0/ 0 mds_log
> >>>>>>    0/ 0 mds_log_expire
> >>>>>>    0/ 0 mds_migrator
> >>>>>>    0/ 0 buffer
> >>>>>>    0/ 0 timer
> >>>>>>    0/ 0 filer
> >>>>>>    0/ 1 striper
> >>>>>>    0/ 0 objecter
> >>>>>>    0/ 0 rados
> >>>>>>    0/ 0 rbd
> >>>>>>    0/ 5 rbd_mirror
> >>>>>>    0/ 5 rbd_replay
> >>>>>>    0/ 0 journaler
> >>>>>>    0/ 0 objectcacher
> >>>>>>    0/ 0 client
> >>>>>>    0/ 0 osd
> >>>>>>    0/ 0 optracker
> >>>>>>    0/ 0 objclass
> >>>>>>    0/ 0 filestore
> >>>>>>    0/ 0 journal
> >>>>>>    0/ 0 ms
> >>>>>>    0/ 0 mon
> >>>>>>    0/ 0 monc
> >>>>>>    0/ 0 paxos
> >>>>>>    0/ 0 tp
> >>>>>>    0/ 0 auth
> >>>>>>    1/ 5 crypto
> >>>>>>    0/ 0 finisher
> >>>>>>    1/ 1 reserver
> >>>>>>    0/ 0 heartbeatmap
> >>>>>>    0/ 0 perfcounter
> >>>>>>    0/ 0 rgw
> >>>>>>    1/10 civetweb
> >>>>>>    1/ 5 javaclient
> >>>>>>    0/ 0 asok
> >>>>>>    0/ 0 throttle
> >>>>>>    0/ 0 refs
> >>>>>>    1/ 5 xio
> >>>>>>    1/ 5 compressor
> >>>>>>    1/ 5 bluestore
> >>>>>>    1/ 5 bluefs
> >>>>>>    1/ 3 bdev
> >>>>>>    1/ 5 kstore
> >>>>>>    4/ 5 rocksdb
> >>>>>>    4/ 5 leveldb
> >>>>>>    4/ 5 memdb
> >>>>>>    1/ 5 kinetic
> >>>>>>    1/ 5 fuse
> >>>>>>    1/ 5 mgr
> >>>>>>    1/ 5 mgrc
> >>>>>>    1/ 5 dpdk
> >>>>>>    1/ 5 eventtrace
> >>>>>>   -2/-2 (syslog threshold)
> >>>>>>   -1/-1 (stderr threshold)
> >>>>>>   max_recent     10000
> >>>>>>   max_new         1000
> >>>>>>   log_file /var/log/ceph/ceph-osd.59.log
> >>>>>> --- end dump of recent events ---
> >>>>>>
> >>>>>>
> >>>>>> Greets,
> >>>>>> Stefan
> >>>>>>
> >>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil:
> >>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>> Hi Sage,
> >>>>>>>>
> >>>>>>>> thanks for your reply. I'll do this. What i still don't understand is
> >>>>>>>> the following and i hope you might have an idea:
> >>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today
> >>>>>>>> running luminous? How can they be missing in snapmapper again?
> >>>>>>>
> >>>>>>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
> >>>>>>> that when you hit the crash we have some context as to what is going on.
> >>>>>>>
> >>>>>>> Did you track down which part of get_snaps() is returning the error?
> >>>>>>>
> >>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to
> >>>>>>>> luminous when all this happened.
> >>>>>>>
> >>>>>>> Did this start right after the upgrade?
> >>>>>>>
> >>>>>>> I started a PR that relaxes some of these assertions so that they clean up 
> >>>>>>> instead (unless the debug option is enabled, for our regression testing), 
> >>>>>>> but I'm still unclear about what the inconsistency is... any loggin you 
> >>>>>>> can provide would help!
> >>>>>>>
> >>>>>>> Thanks-
> >>>>>>> sage
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Greets,
> >>>>>>>> Stefan
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
> >>>>>>>>> I'm guessing it is coming from
> >>>>>>>>>
> >>>>>>>>>   object_snaps out;
> >>>>>>>>>   int r = get_snaps(oid, &out);
> >>>>>>>>>   if (r < 0)
> >>>>>>>>>     return r;
> >>>>>>>>>
> >>>>>>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
> >>>>>>>>> confirm that, and possibly also add debug lines to the various return 
> >>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
> >>>>>>>>>
> >>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
> >>>>>>>>> hard to say without seeing what the inconsistency there is and whether we 
> >>>>>>>>> can tolerate it...
> >>>>>>>>>
> >>>>>>>>> sage
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>
> >>>>>>>>>> HI Sage,
> >>>>>>>>>>
> >>>>>>>>>> any ideas for this one sawing while doing a deep scrub:
> >>>>>>>>>>
> >>>>>>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
> >>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
> >>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
> >>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
> >>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
> >>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
> >>>>>>>>>>
> >>>>>>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
> >>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
> >>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
> >>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
> >>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
> >>>>>>>>>>
> >>>>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>> const*)+0x102) [0x55c0100d0372]
> >>>>>>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
> >>>>>>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
> >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
> >>>>>>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
> >>>>>>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>>>>>>>> [0x55c00fdf6642]
> >>>>>>>>>>  6:
> >>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>>>>>>>> [0x55c00fdfc684]
> >>>>>>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>> [0x55c00fd22030]
> >>>>>>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
> >>>>>>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>> [0x55c00fb1abc7]
> >>>>>>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>> const&)+0x57) [0x55c00fd92ad7]
> >>>>>>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
> >>>>>>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>> [0x55c0100d5ffd]
> >>>>>>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
> >>>>>>>>>>  14: (()+0x8064) [0x7fbd284fb064]
> >>>>>>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
> >>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>> needed to interpret this.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Stefan
> >>>>>>>>>>
> >>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
> >>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>> Hi Sage,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
> >>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> it also crashes in (marked with HERE):
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> int SnapMapper::get_snaps(
> >>>>>>>>>>>>>>   const hobject_t &oid,
> >>>>>>>>>>>>>>   object_snaps *out)
> >>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>   assert(check(oid));
> >>>>>>>>>>>>>>   set<string> keys;
> >>>>>>>>>>>>>>   map<string, bufferlist> got;
> >>>>>>>>>>>>>>   keys.insert(to_object_key(oid));
> >>>>>>>>>>>>>>   int r = backend.get_keys(keys, &got);
> >>>>>>>>>>>>>>   if (r < 0)
> >>>>>>>>>>>>>>     return r;
> >>>>>>>>>>>>>>   if (got.empty())
> >>>>>>>>>>>>>>     return -ENOENT;
> >>>>>>>>>>>>>>   if (out) {
> >>>>>>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
> >>>>>>>>>>>>>>     ::decode(*out, bp);
> >>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
> >>>>>>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
> >>>>>>>>>>>>>>   } else {
> >>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
> >>>>>>>>>>>>>>   }
> >>>>>>>>>>>>>>   return 0;
> >>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> is it save to comment that assert?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
> >>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
> >>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
> >>>>>>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
> >>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
> >>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
> >>>>>>>>>>>>> can't repair.
> >>>>>>>>>>>>
> >>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But
> >>>>>>>>>>>> enabling / doing deep-scrubs results into this one:
> >>>>>>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
> >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
> >>>>>>>>>>>> SnapMapper::get_snaps(const h
> >>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
> >>>>>>>>>>>> 2018-01-18 13:00:54.840396
> >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
> >>>>>>>>>>>
> >>>>>>>>>>> I think if you switch that assert to a warning it will repair...
> >>>>>>>>>>>
> >>>>>>>>>>> sage
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2]
> >>>>>>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
> >>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
> >>>>>>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
> >>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
> >>>>>>>>>>>> 697]
> >>>>>>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
> >>>>>>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> >>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
> >>>>>>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
> >>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
> >>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
> >>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>> [0x5561cdcd7bc7]
> >>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957]
> >>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
> >>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>> [0x5561ce292e7d]
> >>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
> >>>>>>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
> >>>>>>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
> >>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Greets,
> >>>>>>>>>>>> Stefan
> >>>>>>>>>>>>
> >>>>>>>>>>>>> sage
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
> >>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
> >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>>>>> Hi Sage,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
> >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
> >>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l
> >>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
> >>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
> >>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
> >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
> >>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Is this a cache tiering pool?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded:
> >>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
> >>>>>>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
> >>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
> >>>>>>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
> >>>>>>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
> >>>>>>>>>>>>>>> around the current assert:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
> >>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
> >>>>>>>>>>>>>>>      set<snapid_t> snaps;
> >>>>>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
> >>>>>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
> >>>>>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
> >>>>>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
> >>>>>>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
> >>>>>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
> >>>>>>>>>>>>>>> +    } else {
> >>>>>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
> >>>>>>>>>>>>>>> +    }
> >>>>>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
> >>>>>>>>>>>>>>>      snap_mapper.add_oid(
> >>>>>>>>>>>>>>>        recovery_info.soid,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> s
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
> >>>>>>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
> >>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
> >>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
> >>>>>>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
> >>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
> >>>>>>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
> >>>>>>>>>>>>>>>>>> [0x55addb30748f]
> >>>>>>>>>>>>>>>>>>  5:
> >>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
> >>>>>>>>>>>>>>>>>> [0x55addb317531]
> >>>>>>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>>>>>>>>>> [0x55addb23cf10]
> >>>>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
> >>>>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>>>>>>>> [0x55addb035bc7]
> >>>>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
> >>>>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
> >>>>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>>>>>>>> [0x55addb5f0e7d]
> >>>>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
> >>>>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
> >>>>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
> >>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Greets,
> >>>>>>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
> >>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
> >>>>>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
> >>>>>>>>>>>>>>>>>>> list mapping.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
> >>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
> >>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
> >>>>>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
> >>>>>>>>>>>>>>>>>>> happens.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
> >>>>>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> sage
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>  > 
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
> >>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
> >>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
> >>>>>>>>>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
> >>>>>>>>>>>>>>>>>>>>>> being down.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> All of them fail with:
> >>>>>>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
> >>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
> >>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
> >>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
> >>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
> >>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
> >>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
> >>>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
> >>>>>>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
> >>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
> >>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
> >>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
> >>>>>>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
> >>>>>>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
> >>>>>>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
> >>>>>>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
> >>>>>>>>>>>>>>>>>>>>>>  7:
> >>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
> >>>>>>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
> >>>>>>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
> >>>>>>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
> >>>>>>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
> >>>>>>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
> >>>>>>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
> >>>>>>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
> >>>>>>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
> >>>>>>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
> >>>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
> >>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
> >>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
> >>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
> >>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
> >>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
> >>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
> >>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
> >>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
> >>>>>>>>>>>>>>>>>>>>> them myself.
> >>>>>>>>>>>>>>>>>>>>> -Greg
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>> --
> >>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Feb. 2, 2018, 9:54 p.m. UTC | #31
Am 02.02.2018 um 22:05 schrieb Sage Weil:
> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote:
>> Am 02.02.2018 um 20:28 schrieb Sage Weil:
>>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi Sage,
>>>>
>>>> some days ago i reverted to stock luminous with
>>>> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a
>>>> +
>>>> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa
>>>> on top. I scrubbed and deep-scrubbed all pgs before and no errors were
>>>> found.
>>>>
>>>> this worked fine until the ceph balancer runned and started remapping
>>>> some pgs.
>>>>
>>>> This again trigged this assert:
>>>>
>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)'
>>>> thread 7f4289bff700 time 2018-02-02 17:46:33.249579
>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>
>>>> Any idea how to fix it?
>>>
>>> Cheap workaround is to comment out that assert.  I'm still tryign to sort 
>>
>> Yes i did but i feel unconforable with it as the snapmapper value is
>> still wrong.
>>
>>> out how this happened, though.  Not sure if you shared this already, but:
>>> - bluestore or filestore?
>>
>> As i first reported it only filestore. Now also some bluestore OSDs.
>>
>>> - no cache tiering, right?  nowhere on the cluster?
>> Yes and Yes. Never used cache tiering at all.
>>
>>> It looks like the SnapMapper entry with an empty snaps vector came in via 
>>> add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() 
>>> doesn't have that check).  There are a few place it could come from, 
>>> including the on_local_recover() call and the update_snap_map() call.  
>>> That last one is interesting because it pulls the snaps for a new clone 
>>> out of the pg log entry.. maybe those aren't getting populated properly.
>>
>> Is it also possible that they got corrupted in the past and now ceph is
>> unable to detect it and repair it with a scrub or deep scrub?
>>
>>> Can you confirm that all of the OSDs on the cluster are running the
>>> latest?  (ceph versions).
>>
>> Yes they do - i always check this after each update and i rechecked it
>> again.
> 
> Ok thanks!
> 
> I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that 
> has those commits above plus another one that adds debug messages in the 
> various places that the empty snap vectors might be coming from.
> 
> This won't help it get past the rebalance issue you hit above, but 
> hopefully it will trigger when you create a new clone so we can see how 
> this is happening.
> 
> If we need to, I think we can ignore the assert (by commenting it out), 
> but one downside there is that we'll see my warnings pop up when 
> recovering those objects so we won't be able to tell where they really 
> came from. Since you were seeing this on new snaps/clones, though, I'm 
> hoping we won't end up in that situation!


compiling - it's late here. Will update tomorrow. Thanks again!

> Thanks!
> sage
> 
> 
> 
>  > 
>> Stefan
>>
>>
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>> Am 22.01.2018 um 20:01 schrieb Sage Weil:
>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Sage,
>>>>>>
>>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops
>>>>>> telling me:
>>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
>>>>>> on pg 3.33e oid
>>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
>>>>>> mapper: , oi: b0d7c...repaired
>>>>>>
>>>>>> Currently it seems it does an endless repair and never ends.
>>>>>
>>>>> On many objects or the same object?
>>>>>
>>>>> sage
>>>>>
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil:
>>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hi Sage,
>>>>>>>>
>>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So
>>>>>>>> it's OK to skip the assert.
>>>>>>>>
>>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to
>>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and
>>>>>>>> needs to get killed to proceed.
>>>>>>>
>>>>>>> I believe this will fix it:
>>>>>>>
>>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper
>>>>>>>
>>>>>>> Cherry-pick that and let me know?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> sage
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> The log output is always:
>>>>>>>>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
>>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at
>>>>>>>>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
>>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
>>>>>>>> 00000157:head v 927921'526559757) currently commit_sent
>>>>>>>>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
>>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at
>>>>>>>>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
>>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
>>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent
>>>>>>>>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
>>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at
>>>>>>>>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
>>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
>>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent
>>>>>>>>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
>>>>>>>> (Aborted) **
>>>>>>>>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
>>>>>>>>
>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>  1: (()+0xa4423c) [0x561d876ab23c]
>>>>>>>>  2: (()+0xf890) [0x7f100b974890]
>>>>>>>>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
>>>>>>>>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
>>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
>>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54]
>>>>>>>>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
>>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
>>>>>>>>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
>>>>>>>>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
>>>>>>>>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>> [0x561d87138bc7]
>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>> const&)+0x57) [0x561d873b0db7]
>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>> [0x561d876f42bd]
>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
>>>>>>>>  15: (()+0x8064) [0x7f100b96d064]
>>>>>>>>  16: (clone()+0x6d) [0x7f100aa6162d]
>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>> needed to interpret this.
>>>>>>>>
>>>>>>>> --- logging levels ---
>>>>>>>>    0/ 5 none
>>>>>>>>    0/ 0 lockdep
>>>>>>>>    0/ 0 context
>>>>>>>>    0/ 0 crush
>>>>>>>>    0/ 0 mds
>>>>>>>>    0/ 0 mds_balancer
>>>>>>>>    0/ 0 mds_locker
>>>>>>>>    0/ 0 mds_log
>>>>>>>>    0/ 0 mds_log_expire
>>>>>>>>    0/ 0 mds_migrator
>>>>>>>>    0/ 0 buffer
>>>>>>>>    0/ 0 timer
>>>>>>>>    0/ 0 filer
>>>>>>>>    0/ 1 striper
>>>>>>>>    0/ 0 objecter
>>>>>>>>    0/ 0 rados
>>>>>>>>    0/ 0 rbd
>>>>>>>>    0/ 5 rbd_mirror
>>>>>>>>    0/ 5 rbd_replay
>>>>>>>>    0/ 0 journaler
>>>>>>>>    0/ 0 objectcacher
>>>>>>>>    0/ 0 client
>>>>>>>>    0/ 0 osd
>>>>>>>>    0/ 0 optracker
>>>>>>>>    0/ 0 objclass
>>>>>>>>    0/ 0 filestore
>>>>>>>>    0/ 0 journal
>>>>>>>>    0/ 0 ms
>>>>>>>>    0/ 0 mon
>>>>>>>>    0/ 0 monc
>>>>>>>>    0/ 0 paxos
>>>>>>>>    0/ 0 tp
>>>>>>>>    0/ 0 auth
>>>>>>>>    1/ 5 crypto
>>>>>>>>    0/ 0 finisher
>>>>>>>>    1/ 1 reserver
>>>>>>>>    0/ 0 heartbeatmap
>>>>>>>>    0/ 0 perfcounter
>>>>>>>>    0/ 0 rgw
>>>>>>>>    1/10 civetweb
>>>>>>>>    1/ 5 javaclient
>>>>>>>>    0/ 0 asok
>>>>>>>>    0/ 0 throttle
>>>>>>>>    0/ 0 refs
>>>>>>>>    1/ 5 xio
>>>>>>>>    1/ 5 compressor
>>>>>>>>    1/ 5 bluestore
>>>>>>>>    1/ 5 bluefs
>>>>>>>>    1/ 3 bdev
>>>>>>>>    1/ 5 kstore
>>>>>>>>    4/ 5 rocksdb
>>>>>>>>    4/ 5 leveldb
>>>>>>>>    4/ 5 memdb
>>>>>>>>    1/ 5 kinetic
>>>>>>>>    1/ 5 fuse
>>>>>>>>    1/ 5 mgr
>>>>>>>>    1/ 5 mgrc
>>>>>>>>    1/ 5 dpdk
>>>>>>>>    1/ 5 eventtrace
>>>>>>>>   -2/-2 (syslog threshold)
>>>>>>>>   -1/-1 (stderr threshold)
>>>>>>>>   max_recent     10000
>>>>>>>>   max_new         1000
>>>>>>>>   log_file /var/log/ceph/ceph-osd.59.log
>>>>>>>> --- end dump of recent events ---
>>>>>>>>
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil:
>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hi Sage,
>>>>>>>>>>
>>>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is
>>>>>>>>>> the following and i hope you might have an idea:
>>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today
>>>>>>>>>> running luminous? How can they be missing in snapmapper again?
>>>>>>>>>
>>>>>>>>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
>>>>>>>>> that when you hit the crash we have some context as to what is going on.
>>>>>>>>>
>>>>>>>>> Did you track down which part of get_snaps() is returning the error?
>>>>>>>>>
>>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to
>>>>>>>>>> luminous when all this happened.
>>>>>>>>>
>>>>>>>>> Did this start right after the upgrade?
>>>>>>>>>
>>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up 
>>>>>>>>> instead (unless the debug option is enabled, for our regression testing), 
>>>>>>>>> but I'm still unclear about what the inconsistency is... any loggin you 
>>>>>>>>> can provide would help!
>>>>>>>>>
>>>>>>>>> Thanks-
>>>>>>>>> sage
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Greets,
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
>>>>>>>>>>> I'm guessing it is coming from
>>>>>>>>>>>
>>>>>>>>>>>   object_snaps out;
>>>>>>>>>>>   int r = get_snaps(oid, &out);
>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>     return r;
>>>>>>>>>>>
>>>>>>>>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
>>>>>>>>>>> confirm that, and possibly also add debug lines to the various return 
>>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
>>>>>>>>>>>
>>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
>>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we 
>>>>>>>>>>> can tolerate it...
>>>>>>>>>>>
>>>>>>>>>>> sage
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>
>>>>>>>>>>>> HI Sage,
>>>>>>>>>>>>
>>>>>>>>>>>> any ideas for this one sawing while doing a deep scrub:
>>>>>>>>>>>>
>>>>>>>>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
>>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
>>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
>>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
>>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
>>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
>>>>>>>>>>>>
>>>>>>>>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
>>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
>>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
>>>>>>>>>>>>
>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>> const*)+0x102) [0x55c0100d0372]
>>>>>>>>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
>>>>>>>>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
>>>>>>>>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
>>>>>>>>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>> [0x55c00fdf6642]
>>>>>>>>>>>>  6:
>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>> [0x55c00fdfc684]
>>>>>>>>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>> [0x55c00fd22030]
>>>>>>>>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
>>>>>>>>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>> [0x55c00fb1abc7]
>>>>>>>>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7]
>>>>>>>>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
>>>>>>>>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>> [0x55c0100d5ffd]
>>>>>>>>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
>>>>>>>>>>>>  14: (()+0x8064) [0x7fbd284fb064]
>>>>>>>>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Stefan
>>>>>>>>>>>>
>>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
>>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> it also crashes in (marked with HERE):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> int SnapMapper::get_snaps(
>>>>>>>>>>>>>>>>   const hobject_t &oid,
>>>>>>>>>>>>>>>>   object_snaps *out)
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>   assert(check(oid));
>>>>>>>>>>>>>>>>   set<string> keys;
>>>>>>>>>>>>>>>>   map<string, bufferlist> got;
>>>>>>>>>>>>>>>>   keys.insert(to_object_key(oid));
>>>>>>>>>>>>>>>>   int r = backend.get_keys(keys, &got);
>>>>>>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>>>>>>     return r;
>>>>>>>>>>>>>>>>   if (got.empty())
>>>>>>>>>>>>>>>>     return -ENOENT;
>>>>>>>>>>>>>>>>   if (out) {
>>>>>>>>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
>>>>>>>>>>>>>>>>     ::decode(*out, bp);
>>>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>>>>>>>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
>>>>>>>>>>>>>>>>   } else {
>>>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>   return 0;
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> is it save to comment that assert?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
>>>>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
>>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
>>>>>>>>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
>>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
>>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
>>>>>>>>>>>>>>> can't repair.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But
>>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one:
>>>>>>>>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>>>>>>>>>>>> SnapMapper::get_snaps(const h
>>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
>>>>>>>>>>>>>> 2018-01-18 13:00:54.840396
>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think if you switch that assert to a warning it will repair...
>>>>>>>>>>>>>
>>>>>>>>>>>>> sage
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2]
>>>>>>>>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
>>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>>>>>>>>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
>>>>>>>>>>>>>> 697]
>>>>>>>>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>>>>>>>>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>>>>>>>>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>> [0x5561cdcd7bc7]
>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957]
>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>> [0x5561ce292e7d]
>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>>>>>>>>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l
>>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Is this a cache tiering pool?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded:
>>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>>>>>>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>>>>>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
>>>>>>>>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
>>>>>>>>>>>>>>>>> around the current assert:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
>>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>>>>>>>>>>>>>      set<snapid_t> snaps;
>>>>>>>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>>>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>>>>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>>>>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>>>>>>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>>>>>>>>>>>>>      snap_mapper.add_oid(
>>>>>>>>>>>>>>>>>        recovery_info.soid,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> s
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>>>>>>>>>>>>>> [0x55addb30748f]
>>>>>>>>>>>>>>>>>>>>  5:
>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>>>>>>>>>>>>>> [0x55addb317531]
>>>>>>>>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>> [0x55addb23cf10]
>>>>>>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>> [0x55addb035bc7]
>>>>>>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d]
>>>>>>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>>>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>>>>>>>>>>>>>>>>> list mapping.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>>>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>>>>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>>>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>  > 
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>>>>>>>>>>>>>> being down.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>>>>>>>>>>>>>>>  7:
>>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>>>>>>>>>>>>>> them myself.
>>>>>>>>>>>>>>>>>>>>>>> -Greg
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Feb. 3, 2018, 9:07 p.m. UTC | #32
Hi Sage,

Am 02.02.2018 um 22:05 schrieb Sage Weil:
> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote:
>> Am 02.02.2018 um 20:28 schrieb Sage Weil:
>>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi Sage,
>>>>
>>>> some days ago i reverted to stock luminous with
>>>> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a
>>>> +
>>>> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa
>>>> on top. I scrubbed and deep-scrubbed all pgs before and no errors were
>>>> found.
>>>>
>>>> this worked fine until the ceph balancer runned and started remapping
>>>> some pgs.
>>>>
>>>> This again trigged this assert:
>>>>
>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)'
>>>> thread 7f4289bff700 time 2018-02-02 17:46:33.249579
>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>
>>>> Any idea how to fix it?
>>>
>>> Cheap workaround is to comment out that assert.  I'm still tryign to sort 
>>
>> Yes i did but i feel unconforable with it as the snapmapper value is
>> still wrong.
>>
>>> out how this happened, though.  Not sure if you shared this already, but:
>>> - bluestore or filestore?
>>
>> As i first reported it only filestore. Now also some bluestore OSDs.
>>
>>> - no cache tiering, right?  nowhere on the cluster?
>> Yes and Yes. Never used cache tiering at all.
>>
>>> It looks like the SnapMapper entry with an empty snaps vector came in via 
>>> add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() 
>>> doesn't have that check).  There are a few place it could come from, 
>>> including the on_local_recover() call and the update_snap_map() call.  
>>> That last one is interesting because it pulls the snaps for a new clone 
>>> out of the pg log entry.. maybe those aren't getting populated properly.
>>
>> Is it also possible that they got corrupted in the past and now ceph is
>> unable to detect it and repair it with a scrub or deep scrub?
>>
>>> Can you confirm that all of the OSDs on the cluster are running the
>>> latest?  (ceph versions).
>>
>> Yes they do - i always check this after each update and i rechecked it
>> again.
> 
> Ok thanks!
> 
> I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that 
> has those commits above plus another one that adds debug messages in the 
> various places that the empty snap vectors might be coming from.
> 
> This won't help it get past the rebalance issue you hit above, but 
> hopefully it will trigger when you create a new clone so we can see how 
> this is happening.
> 
> If we need to, I think we can ignore the assert (by commenting it out), 
> but one downside there is that we'll see my warnings pop up when 
> recovering those objects so we won't be able to tell where they really 
> came from. Since you were seeing this on new snaps/clones, though, I'm 
> hoping we won't end up in that situation!

thanks. New version is running. Is looking through ceph.log enough? Or
are those only printed to the corresponding osd log?

Greets,
Stefan
> 
> Thanks!
> sage
> 
> 
> 
>  > 
>> Stefan
>>
>>
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>> Am 22.01.2018 um 20:01 schrieb Sage Weil:
>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Sage,
>>>>>>
>>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops
>>>>>> telling me:
>>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
>>>>>> on pg 3.33e oid
>>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
>>>>>> mapper: , oi: b0d7c...repaired
>>>>>>
>>>>>> Currently it seems it does an endless repair and never ends.
>>>>>
>>>>> On many objects or the same object?
>>>>>
>>>>> sage
>>>>>
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil:
>>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hi Sage,
>>>>>>>>
>>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So
>>>>>>>> it's OK to skip the assert.
>>>>>>>>
>>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to
>>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and
>>>>>>>> needs to get killed to proceed.
>>>>>>>
>>>>>>> I believe this will fix it:
>>>>>>>
>>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper
>>>>>>>
>>>>>>> Cherry-pick that and let me know?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> sage
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> The log output is always:
>>>>>>>>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
>>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at
>>>>>>>>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
>>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
>>>>>>>> 00000157:head v 927921'526559757) currently commit_sent
>>>>>>>>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
>>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at
>>>>>>>>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
>>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
>>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent
>>>>>>>>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
>>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at
>>>>>>>>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
>>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
>>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent
>>>>>>>>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
>>>>>>>> (Aborted) **
>>>>>>>>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
>>>>>>>>
>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>  1: (()+0xa4423c) [0x561d876ab23c]
>>>>>>>>  2: (()+0xf890) [0x7f100b974890]
>>>>>>>>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
>>>>>>>>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
>>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
>>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54]
>>>>>>>>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
>>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
>>>>>>>>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
>>>>>>>>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
>>>>>>>>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>> [0x561d87138bc7]
>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>> const&)+0x57) [0x561d873b0db7]
>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>> [0x561d876f42bd]
>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
>>>>>>>>  15: (()+0x8064) [0x7f100b96d064]
>>>>>>>>  16: (clone()+0x6d) [0x7f100aa6162d]
>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>> needed to interpret this.
>>>>>>>>
>>>>>>>> --- logging levels ---
>>>>>>>>    0/ 5 none
>>>>>>>>    0/ 0 lockdep
>>>>>>>>    0/ 0 context
>>>>>>>>    0/ 0 crush
>>>>>>>>    0/ 0 mds
>>>>>>>>    0/ 0 mds_balancer
>>>>>>>>    0/ 0 mds_locker
>>>>>>>>    0/ 0 mds_log
>>>>>>>>    0/ 0 mds_log_expire
>>>>>>>>    0/ 0 mds_migrator
>>>>>>>>    0/ 0 buffer
>>>>>>>>    0/ 0 timer
>>>>>>>>    0/ 0 filer
>>>>>>>>    0/ 1 striper
>>>>>>>>    0/ 0 objecter
>>>>>>>>    0/ 0 rados
>>>>>>>>    0/ 0 rbd
>>>>>>>>    0/ 5 rbd_mirror
>>>>>>>>    0/ 5 rbd_replay
>>>>>>>>    0/ 0 journaler
>>>>>>>>    0/ 0 objectcacher
>>>>>>>>    0/ 0 client
>>>>>>>>    0/ 0 osd
>>>>>>>>    0/ 0 optracker
>>>>>>>>    0/ 0 objclass
>>>>>>>>    0/ 0 filestore
>>>>>>>>    0/ 0 journal
>>>>>>>>    0/ 0 ms
>>>>>>>>    0/ 0 mon
>>>>>>>>    0/ 0 monc
>>>>>>>>    0/ 0 paxos
>>>>>>>>    0/ 0 tp
>>>>>>>>    0/ 0 auth
>>>>>>>>    1/ 5 crypto
>>>>>>>>    0/ 0 finisher
>>>>>>>>    1/ 1 reserver
>>>>>>>>    0/ 0 heartbeatmap
>>>>>>>>    0/ 0 perfcounter
>>>>>>>>    0/ 0 rgw
>>>>>>>>    1/10 civetweb
>>>>>>>>    1/ 5 javaclient
>>>>>>>>    0/ 0 asok
>>>>>>>>    0/ 0 throttle
>>>>>>>>    0/ 0 refs
>>>>>>>>    1/ 5 xio
>>>>>>>>    1/ 5 compressor
>>>>>>>>    1/ 5 bluestore
>>>>>>>>    1/ 5 bluefs
>>>>>>>>    1/ 3 bdev
>>>>>>>>    1/ 5 kstore
>>>>>>>>    4/ 5 rocksdb
>>>>>>>>    4/ 5 leveldb
>>>>>>>>    4/ 5 memdb
>>>>>>>>    1/ 5 kinetic
>>>>>>>>    1/ 5 fuse
>>>>>>>>    1/ 5 mgr
>>>>>>>>    1/ 5 mgrc
>>>>>>>>    1/ 5 dpdk
>>>>>>>>    1/ 5 eventtrace
>>>>>>>>   -2/-2 (syslog threshold)
>>>>>>>>   -1/-1 (stderr threshold)
>>>>>>>>   max_recent     10000
>>>>>>>>   max_new         1000
>>>>>>>>   log_file /var/log/ceph/ceph-osd.59.log
>>>>>>>> --- end dump of recent events ---
>>>>>>>>
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil:
>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hi Sage,
>>>>>>>>>>
>>>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is
>>>>>>>>>> the following and i hope you might have an idea:
>>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today
>>>>>>>>>> running luminous? How can they be missing in snapmapper again?
>>>>>>>>>
>>>>>>>>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
>>>>>>>>> that when you hit the crash we have some context as to what is going on.
>>>>>>>>>
>>>>>>>>> Did you track down which part of get_snaps() is returning the error?
>>>>>>>>>
>>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to
>>>>>>>>>> luminous when all this happened.
>>>>>>>>>
>>>>>>>>> Did this start right after the upgrade?
>>>>>>>>>
>>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up 
>>>>>>>>> instead (unless the debug option is enabled, for our regression testing), 
>>>>>>>>> but I'm still unclear about what the inconsistency is... any loggin you 
>>>>>>>>> can provide would help!
>>>>>>>>>
>>>>>>>>> Thanks-
>>>>>>>>> sage
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Greets,
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
>>>>>>>>>>> I'm guessing it is coming from
>>>>>>>>>>>
>>>>>>>>>>>   object_snaps out;
>>>>>>>>>>>   int r = get_snaps(oid, &out);
>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>     return r;
>>>>>>>>>>>
>>>>>>>>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
>>>>>>>>>>> confirm that, and possibly also add debug lines to the various return 
>>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
>>>>>>>>>>>
>>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
>>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we 
>>>>>>>>>>> can tolerate it...
>>>>>>>>>>>
>>>>>>>>>>> sage
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>
>>>>>>>>>>>> HI Sage,
>>>>>>>>>>>>
>>>>>>>>>>>> any ideas for this one sawing while doing a deep scrub:
>>>>>>>>>>>>
>>>>>>>>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
>>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
>>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
>>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
>>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
>>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
>>>>>>>>>>>>
>>>>>>>>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
>>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
>>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
>>>>>>>>>>>>
>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>> const*)+0x102) [0x55c0100d0372]
>>>>>>>>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
>>>>>>>>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
>>>>>>>>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
>>>>>>>>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>> [0x55c00fdf6642]
>>>>>>>>>>>>  6:
>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>> [0x55c00fdfc684]
>>>>>>>>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>> [0x55c00fd22030]
>>>>>>>>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
>>>>>>>>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>> [0x55c00fb1abc7]
>>>>>>>>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7]
>>>>>>>>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
>>>>>>>>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>> [0x55c0100d5ffd]
>>>>>>>>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
>>>>>>>>>>>>  14: (()+0x8064) [0x7fbd284fb064]
>>>>>>>>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Stefan
>>>>>>>>>>>>
>>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
>>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> it also crashes in (marked with HERE):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> int SnapMapper::get_snaps(
>>>>>>>>>>>>>>>>   const hobject_t &oid,
>>>>>>>>>>>>>>>>   object_snaps *out)
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>   assert(check(oid));
>>>>>>>>>>>>>>>>   set<string> keys;
>>>>>>>>>>>>>>>>   map<string, bufferlist> got;
>>>>>>>>>>>>>>>>   keys.insert(to_object_key(oid));
>>>>>>>>>>>>>>>>   int r = backend.get_keys(keys, &got);
>>>>>>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>>>>>>     return r;
>>>>>>>>>>>>>>>>   if (got.empty())
>>>>>>>>>>>>>>>>     return -ENOENT;
>>>>>>>>>>>>>>>>   if (out) {
>>>>>>>>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
>>>>>>>>>>>>>>>>     ::decode(*out, bp);
>>>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>>>>>>>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
>>>>>>>>>>>>>>>>   } else {
>>>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>   return 0;
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> is it save to comment that assert?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
>>>>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
>>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
>>>>>>>>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
>>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
>>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
>>>>>>>>>>>>>>> can't repair.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But
>>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one:
>>>>>>>>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>>>>>>>>>>>> SnapMapper::get_snaps(const h
>>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
>>>>>>>>>>>>>> 2018-01-18 13:00:54.840396
>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think if you switch that assert to a warning it will repair...
>>>>>>>>>>>>>
>>>>>>>>>>>>> sage
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2]
>>>>>>>>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
>>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>>>>>>>>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
>>>>>>>>>>>>>> 697]
>>>>>>>>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>>>>>>>>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>>>>>>>>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>> [0x5561cdcd7bc7]
>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957]
>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>> [0x5561ce292e7d]
>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>>>>>>>>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l
>>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Is this a cache tiering pool?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded:
>>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>>>>>>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>>>>>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
>>>>>>>>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
>>>>>>>>>>>>>>>>> around the current assert:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
>>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>>>>>>>>>>>>>      set<snapid_t> snaps;
>>>>>>>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>>>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>>>>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>>>>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>>>>>>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>>>>>>>>>>>>>      snap_mapper.add_oid(
>>>>>>>>>>>>>>>>>        recovery_info.soid,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> s
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>>>>>>>>>>>>>> [0x55addb30748f]
>>>>>>>>>>>>>>>>>>>>  5:
>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>>>>>>>>>>>>>> [0x55addb317531]
>>>>>>>>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>> [0x55addb23cf10]
>>>>>>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>> [0x55addb035bc7]
>>>>>>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d]
>>>>>>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>>>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>>>>>>>>>>>>>>>>> list mapping.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>>>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>>>>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>>>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>  > 
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>>>>>>>>>>>>>> being down.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>>>>>>>>>>>>>>>  7:
>>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>>>>>>>>>>>>>> them myself.
>>>>>>>>>>>>>>>>>>>>>>> -Greg
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Feb. 5, 2018, 7:34 a.m. UTC | #33
Hi Sage,

Am 02.02.2018 um 22:05 schrieb Sage Weil:
> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote:
>> Am 02.02.2018 um 20:28 schrieb Sage Weil:
>>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi Sage,
>>>>
>>>> some days ago i reverted to stock luminous with
>>>> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a
>>>> +
>>>> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa
>>>> on top. I scrubbed and deep-scrubbed all pgs before and no errors were
>>>> found.
>>>>
>>>> this worked fine until the ceph balancer runned and started remapping
>>>> some pgs.
>>>>
>>>> This again trigged this assert:
>>>>
>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)'
>>>> thread 7f4289bff700 time 2018-02-02 17:46:33.249579
>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>
>>>> Any idea how to fix it?
>>>
>>> Cheap workaround is to comment out that assert.  I'm still tryign to sort 
>>
>> Yes i did but i feel unconforable with it as the snapmapper value is
>> still wrong.
>>
>>> out how this happened, though.  Not sure if you shared this already, but:
>>> - bluestore or filestore?
>>
>> As i first reported it only filestore. Now also some bluestore OSDs.
>>
>>> - no cache tiering, right?  nowhere on the cluster?
>> Yes and Yes. Never used cache tiering at all.
>>
>>> It looks like the SnapMapper entry with an empty snaps vector came in via 
>>> add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() 
>>> doesn't have that check).  There are a few place it could come from, 
>>> including the on_local_recover() call and the update_snap_map() call.  
>>> That last one is interesting because it pulls the snaps for a new clone 
>>> out of the pg log entry.. maybe those aren't getting populated properly.
>>
>> Is it also possible that they got corrupted in the past and now ceph is
>> unable to detect it and repair it with a scrub or deep scrub?
>>
>>> Can you confirm that all of the OSDs on the cluster are running the
>>> latest?  (ceph versions).
>>
>> Yes they do - i always check this after each update and i rechecked it
>> again.
> 
> Ok thanks!
> 
> I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that 
> has those commits above plus another one that adds debug messages in the 
> various places that the empty snap vectors might be coming from.
> 
> This won't help it get past the rebalance issue you hit above, but 
> hopefully it will trigger when you create a new clone so we can see how 
> this is happening.
> 
> If we need to, I think we can ignore the assert (by commenting it out), 
> but one downside there is that we'll see my warnings pop up when 
> recovering those objects so we won't be able to tell where they really 
> came from. Since you were seeing this on new snaps/clones, though, I'm 
> hoping we won't end up in that situation!

the biggest part of the logs is now this stuff:
2018-02-05 07:56:32.712366 7fb6893ff700  0 log_channel(cluster) log
[DBG] : 3.740 scrub ok
2018-02-05 08:00:16.033843 7fb68a7fe700 -1 osd.0 pg_epoch: 945191
pg[3.bbf( v 945191'63020848 (945191'63019283,945191'63020848]
local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f
944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620
luod=0'0 crt=945191'63020848 lcod 945191'63020847 active] _scan_snaps no
head for 3:fdd87083:::rbd_data.67b0ee6b8b4567.00000000000103a1:b0d77
(have MIN)
2018-02-05 08:00:42.123804 7fb6833ff700 -1 osd.0 pg_epoch: 945191
pg[3.bbf( v 945191'63020901 (945191'63019383,945191'63020901]
local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f
944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620
luod=0'0 crt=945191'63020901 lcod 945191'63020900 active] _scan_snaps no
head for 3:fddb41cc:::rbd_data.9772fb6b8b4567.0000000000000380:b0d9f
(have MIN)
2018-02-05 08:03:49.583574 7fb687bfe700 -1 osd.0 pg_epoch: 945191
pg[3.caa( v 945191'52090546 (945191'52088948,945191'52090546]
local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f
944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694
luod=0'0 crt=945191'52090546 lcod 945191'52090545 active] _scan_snaps no
head for 3:5530360d:::rbd_data.52a6596b8b4567.0000000000000929:b0d55
(have MIN)
2018-02-05 08:03:50.432394 7fb687bfe700 -1 osd.0 pg_epoch: 945191
pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549]
local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f
944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694
luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no
head for 3:553243fa:::rbd_data.67b0ee6b8b4567.0000000000009956:b0d77
(have MIN)
2018-02-05 08:03:52.523643 7fb687bfe700 -1 osd.0 pg_epoch: 945191
pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549]
local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f
944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694
luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no
head for 3:5536f170:::rbd_data.c1a3d66b8b4567.000000000000b729:b0dc5
(have MIN)
2018-02-05 08:03:55.471369 7fb687bfe700 -1 osd.0 pg_epoch: 945191
pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549]
local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f
944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694
luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no
head for 3:553d2114:::rbd_data.154a9e6b8b4567.0000000000001ea1:b0e4e
(have MIN)
2018-02-05 08:08:55.440715 7fb687bfe700 -1 osd.0 pg_epoch: 945191
pg[3.7aa( v 945191'144763900 (945191'144762311,945191'144763900]
local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f
944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522
luod=0'0 crt=945191'144763900 lcod 945191'144763899 active] _scan_snaps
no head for 3:55e14d0f:::rbd_data.67b0ee6b8b4567.00000000000025f2:b0d77
(have MIN)
2018-02-05 08:09:09.295844 7fb68efff700 -1 osd.0 pg_epoch: 945191
pg[3.7aa( v 945191'144763970 (945191'144762411,945191'144763970]
local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f
944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522
luod=0'0 crt=945191'144763970 lcod 945191'144763969 active] _scan_snaps
no head for 3:55e2d7a9:::rbd_data.e40a416b8b4567.000000000000576f:b0df3
(have MIN)
2018-02-05 08:09:26.285493 7fb68efff700 -1 osd.0 pg_epoch: 945191
pg[3.7aa( v 945191'144764043 (945191'144762511,945191'144764043]
local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f
944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522
luod=0'0 crt=945191'144764043 lcod 945191'144764042 active] _scan_snaps
no head for 3:55e4cbad:::rbd_data.163e9c96b8b4567.000000000000202d:b0d42
(have MIN)
2018-02-05 08:10:36.490452 7fb687bfe700 -1 osd.0 pg_epoch: 945191
pg[3.7aa( v 945191'144764243 (945191'144762711,945191'144764243]
local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f
944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522
luod=0'0 crt=945191'144764243 lcod 945191'144764242 active] _scan_snaps
no head for 3:55ece3c3:::rbd_data.e60f906b8b4567.00000000000000de:b0dd7
(have MIN)

Stefan

> Thanks!
> sage
> 
> 
> 
>  > 
>> Stefan
>>
>>
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>> Am 22.01.2018 um 20:01 schrieb Sage Weil:
>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Sage,
>>>>>>
>>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops
>>>>>> telling me:
>>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
>>>>>> on pg 3.33e oid
>>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
>>>>>> mapper: , oi: b0d7c...repaired
>>>>>>
>>>>>> Currently it seems it does an endless repair and never ends.
>>>>>
>>>>> On many objects or the same object?
>>>>>
>>>>> sage
>>>>>
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil:
>>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hi Sage,
>>>>>>>>
>>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So
>>>>>>>> it's OK to skip the assert.
>>>>>>>>
>>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to
>>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and
>>>>>>>> needs to get killed to proceed.
>>>>>>>
>>>>>>> I believe this will fix it:
>>>>>>>
>>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper
>>>>>>>
>>>>>>> Cherry-pick that and let me know?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> sage
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> The log output is always:
>>>>>>>>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
>>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at
>>>>>>>>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
>>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
>>>>>>>> 00000157:head v 927921'526559757) currently commit_sent
>>>>>>>>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
>>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at
>>>>>>>>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
>>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
>>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent
>>>>>>>>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
>>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at
>>>>>>>>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
>>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
>>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent
>>>>>>>>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
>>>>>>>> (Aborted) **
>>>>>>>>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
>>>>>>>>
>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>  1: (()+0xa4423c) [0x561d876ab23c]
>>>>>>>>  2: (()+0xf890) [0x7f100b974890]
>>>>>>>>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
>>>>>>>>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
>>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
>>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54]
>>>>>>>>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
>>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
>>>>>>>>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
>>>>>>>>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
>>>>>>>>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>> [0x561d87138bc7]
>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>> const&)+0x57) [0x561d873b0db7]
>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>> [0x561d876f42bd]
>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
>>>>>>>>  15: (()+0x8064) [0x7f100b96d064]
>>>>>>>>  16: (clone()+0x6d) [0x7f100aa6162d]
>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>> needed to interpret this.
>>>>>>>>
>>>>>>>> --- logging levels ---
>>>>>>>>    0/ 5 none
>>>>>>>>    0/ 0 lockdep
>>>>>>>>    0/ 0 context
>>>>>>>>    0/ 0 crush
>>>>>>>>    0/ 0 mds
>>>>>>>>    0/ 0 mds_balancer
>>>>>>>>    0/ 0 mds_locker
>>>>>>>>    0/ 0 mds_log
>>>>>>>>    0/ 0 mds_log_expire
>>>>>>>>    0/ 0 mds_migrator
>>>>>>>>    0/ 0 buffer
>>>>>>>>    0/ 0 timer
>>>>>>>>    0/ 0 filer
>>>>>>>>    0/ 1 striper
>>>>>>>>    0/ 0 objecter
>>>>>>>>    0/ 0 rados
>>>>>>>>    0/ 0 rbd
>>>>>>>>    0/ 5 rbd_mirror
>>>>>>>>    0/ 5 rbd_replay
>>>>>>>>    0/ 0 journaler
>>>>>>>>    0/ 0 objectcacher
>>>>>>>>    0/ 0 client
>>>>>>>>    0/ 0 osd
>>>>>>>>    0/ 0 optracker
>>>>>>>>    0/ 0 objclass
>>>>>>>>    0/ 0 filestore
>>>>>>>>    0/ 0 journal
>>>>>>>>    0/ 0 ms
>>>>>>>>    0/ 0 mon
>>>>>>>>    0/ 0 monc
>>>>>>>>    0/ 0 paxos
>>>>>>>>    0/ 0 tp
>>>>>>>>    0/ 0 auth
>>>>>>>>    1/ 5 crypto
>>>>>>>>    0/ 0 finisher
>>>>>>>>    1/ 1 reserver
>>>>>>>>    0/ 0 heartbeatmap
>>>>>>>>    0/ 0 perfcounter
>>>>>>>>    0/ 0 rgw
>>>>>>>>    1/10 civetweb
>>>>>>>>    1/ 5 javaclient
>>>>>>>>    0/ 0 asok
>>>>>>>>    0/ 0 throttle
>>>>>>>>    0/ 0 refs
>>>>>>>>    1/ 5 xio
>>>>>>>>    1/ 5 compressor
>>>>>>>>    1/ 5 bluestore
>>>>>>>>    1/ 5 bluefs
>>>>>>>>    1/ 3 bdev
>>>>>>>>    1/ 5 kstore
>>>>>>>>    4/ 5 rocksdb
>>>>>>>>    4/ 5 leveldb
>>>>>>>>    4/ 5 memdb
>>>>>>>>    1/ 5 kinetic
>>>>>>>>    1/ 5 fuse
>>>>>>>>    1/ 5 mgr
>>>>>>>>    1/ 5 mgrc
>>>>>>>>    1/ 5 dpdk
>>>>>>>>    1/ 5 eventtrace
>>>>>>>>   -2/-2 (syslog threshold)
>>>>>>>>   -1/-1 (stderr threshold)
>>>>>>>>   max_recent     10000
>>>>>>>>   max_new         1000
>>>>>>>>   log_file /var/log/ceph/ceph-osd.59.log
>>>>>>>> --- end dump of recent events ---
>>>>>>>>
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil:
>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hi Sage,
>>>>>>>>>>
>>>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is
>>>>>>>>>> the following and i hope you might have an idea:
>>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today
>>>>>>>>>> running luminous? How can they be missing in snapmapper again?
>>>>>>>>>
>>>>>>>>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
>>>>>>>>> that when you hit the crash we have some context as to what is going on.
>>>>>>>>>
>>>>>>>>> Did you track down which part of get_snaps() is returning the error?
>>>>>>>>>
>>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to
>>>>>>>>>> luminous when all this happened.
>>>>>>>>>
>>>>>>>>> Did this start right after the upgrade?
>>>>>>>>>
>>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up 
>>>>>>>>> instead (unless the debug option is enabled, for our regression testing), 
>>>>>>>>> but I'm still unclear about what the inconsistency is... any loggin you 
>>>>>>>>> can provide would help!
>>>>>>>>>
>>>>>>>>> Thanks-
>>>>>>>>> sage
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Greets,
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
>>>>>>>>>>> I'm guessing it is coming from
>>>>>>>>>>>
>>>>>>>>>>>   object_snaps out;
>>>>>>>>>>>   int r = get_snaps(oid, &out);
>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>     return r;
>>>>>>>>>>>
>>>>>>>>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
>>>>>>>>>>> confirm that, and possibly also add debug lines to the various return 
>>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
>>>>>>>>>>>
>>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
>>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we 
>>>>>>>>>>> can tolerate it...
>>>>>>>>>>>
>>>>>>>>>>> sage
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>
>>>>>>>>>>>> HI Sage,
>>>>>>>>>>>>
>>>>>>>>>>>> any ideas for this one sawing while doing a deep scrub:
>>>>>>>>>>>>
>>>>>>>>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
>>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
>>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
>>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
>>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
>>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
>>>>>>>>>>>>
>>>>>>>>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
>>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
>>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
>>>>>>>>>>>>
>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>> const*)+0x102) [0x55c0100d0372]
>>>>>>>>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
>>>>>>>>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
>>>>>>>>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
>>>>>>>>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>> [0x55c00fdf6642]
>>>>>>>>>>>>  6:
>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>> [0x55c00fdfc684]
>>>>>>>>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>> [0x55c00fd22030]
>>>>>>>>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
>>>>>>>>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>> [0x55c00fb1abc7]
>>>>>>>>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7]
>>>>>>>>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
>>>>>>>>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>> [0x55c0100d5ffd]
>>>>>>>>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
>>>>>>>>>>>>  14: (()+0x8064) [0x7fbd284fb064]
>>>>>>>>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Stefan
>>>>>>>>>>>>
>>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
>>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> it also crashes in (marked with HERE):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> int SnapMapper::get_snaps(
>>>>>>>>>>>>>>>>   const hobject_t &oid,
>>>>>>>>>>>>>>>>   object_snaps *out)
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>   assert(check(oid));
>>>>>>>>>>>>>>>>   set<string> keys;
>>>>>>>>>>>>>>>>   map<string, bufferlist> got;
>>>>>>>>>>>>>>>>   keys.insert(to_object_key(oid));
>>>>>>>>>>>>>>>>   int r = backend.get_keys(keys, &got);
>>>>>>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>>>>>>     return r;
>>>>>>>>>>>>>>>>   if (got.empty())
>>>>>>>>>>>>>>>>     return -ENOENT;
>>>>>>>>>>>>>>>>   if (out) {
>>>>>>>>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
>>>>>>>>>>>>>>>>     ::decode(*out, bp);
>>>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>>>>>>>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
>>>>>>>>>>>>>>>>   } else {
>>>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>   return 0;
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> is it save to comment that assert?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
>>>>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
>>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
>>>>>>>>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
>>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
>>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
>>>>>>>>>>>>>>> can't repair.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But
>>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one:
>>>>>>>>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>>>>>>>>>>>> SnapMapper::get_snaps(const h
>>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
>>>>>>>>>>>>>> 2018-01-18 13:00:54.840396
>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think if you switch that assert to a warning it will repair...
>>>>>>>>>>>>>
>>>>>>>>>>>>> sage
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2]
>>>>>>>>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
>>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>>>>>>>>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
>>>>>>>>>>>>>> 697]
>>>>>>>>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>>>>>>>>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>>>>>>>>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>> [0x5561cdcd7bc7]
>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957]
>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>> [0x5561ce292e7d]
>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>>>>>>>>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l
>>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Is this a cache tiering pool?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded:
>>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>>>>>>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>>>>>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
>>>>>>>>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
>>>>>>>>>>>>>>>>> around the current assert:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
>>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>>>>>>>>>>>>>      set<snapid_t> snaps;
>>>>>>>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>>>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>>>>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>>>>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>>>>>>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>>>>>>>>>>>>>      snap_mapper.add_oid(
>>>>>>>>>>>>>>>>>        recovery_info.soid,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> s
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>>>>>>>>>>>>>> [0x55addb30748f]
>>>>>>>>>>>>>>>>>>>>  5:
>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>>>>>>>>>>>>>> [0x55addb317531]
>>>>>>>>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>> [0x55addb23cf10]
>>>>>>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>> [0x55addb035bc7]
>>>>>>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d]
>>>>>>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>>>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>>>>>>>>>>>>>>>>> list mapping.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>>>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>>>>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>>>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>  > 
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>>>>>>>>>>>>>> being down.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>>>>>>>>>>>>>>>  7:
>>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>>>>>>>>>>>>>> them myself.
>>>>>>>>>>>>>>>>>>>>>>> -Greg
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sage Weil Feb. 5, 2018, 12:27 p.m. UTC | #34
On Sat, 3 Feb 2018, Stefan Priebe - Profihost AG wrote:
> Hi Sage,
> 
> Am 02.02.2018 um 22:05 schrieb Sage Weil:
> > On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote:
> >> Am 02.02.2018 um 20:28 schrieb Sage Weil:
> >>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote:
> >>>> Hi Sage,
> >>>>
> >>>> some days ago i reverted to stock luminous with
> >>>> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a
> >>>> +
> >>>> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa
> >>>> on top. I scrubbed and deep-scrubbed all pgs before and no errors were
> >>>> found.
> >>>>
> >>>> this worked fine until the ceph balancer runned and started remapping
> >>>> some pgs.
> >>>>
> >>>> This again trigged this assert:
> >>>>
> >>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
> >>>> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)'
> >>>> thread 7f4289bff700 time 2018-02-02 17:46:33.249579
> >>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
> >>>>
> >>>> Any idea how to fix it?
> >>>
> >>> Cheap workaround is to comment out that assert.  I'm still tryign to sort 
> >>
> >> Yes i did but i feel unconforable with it as the snapmapper value is
> >> still wrong.
> >>
> >>> out how this happened, though.  Not sure if you shared this already, but:
> >>> - bluestore or filestore?
> >>
> >> As i first reported it only filestore. Now also some bluestore OSDs.
> >>
> >>> - no cache tiering, right?  nowhere on the cluster?
> >> Yes and Yes. Never used cache tiering at all.
> >>
> >>> It looks like the SnapMapper entry with an empty snaps vector came in via 
> >>> add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() 
> >>> doesn't have that check).  There are a few place it could come from, 
> >>> including the on_local_recover() call and the update_snap_map() call.  
> >>> That last one is interesting because it pulls the snaps for a new clone 
> >>> out of the pg log entry.. maybe those aren't getting populated properly.
> >>
> >> Is it also possible that they got corrupted in the past and now ceph is
> >> unable to detect it and repair it with a scrub or deep scrub?
> >>
> >>> Can you confirm that all of the OSDs on the cluster are running the
> >>> latest?  (ceph versions).
> >>
> >> Yes they do - i always check this after each update and i rechecked it
> >> again.
> > 
> > Ok thanks!
> > 
> > I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that 
> > has those commits above plus another one that adds debug messages in the 
> > various places that the empty snap vectors might be coming from.
> > 
> > This won't help it get past the rebalance issue you hit above, but 
> > hopefully it will trigger when you create a new clone so we can see how 
> > this is happening.
> > 
> > If we need to, I think we can ignore the assert (by commenting it out), 
> > but one downside there is that we'll see my warnings pop up when 
> > recovering those objects so we won't be able to tell where they really 
> > came from. Since you were seeing this on new snaps/clones, though, I'm 
> > hoping we won't end up in that situation!
> 
> thanks. New version is running. Is looking through ceph.log enough? Or
> are those only printed to the corresponding osd log?

They only go to the OSD log.

sage

> 
> Greets,
> Stefan
> > 
> > Thanks!
> > sage
> > 
> > 
> > 
> >  > 
> >> Stefan
> >>
> >>
> >>>>
> >>>> Greets,
> >>>> Stefan
> >>>>
> >>>> Am 22.01.2018 um 20:01 schrieb Sage Weil:
> >>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>> Hi Sage,
> >>>>>>
> >>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops
> >>>>>> telling me:
> >>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
> >>>>>> on pg 3.33e oid
> >>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
> >>>>>> mapper: , oi: b0d7c...repaired
> >>>>>>
> >>>>>> Currently it seems it does an endless repair and never ends.
> >>>>>
> >>>>> On many objects or the same object?
> >>>>>
> >>>>> sage
> >>>>>
> >>>>>>
> >>>>>> Greets,
> >>>>>> Stefan
> >>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil:
> >>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>> Hi Sage,
> >>>>>>>>
> >>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So
> >>>>>>>> it's OK to skip the assert.
> >>>>>>>>
> >>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to
> >>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and
> >>>>>>>> needs to get killed to proceed.
> >>>>>>>
> >>>>>>> I believe this will fix it:
> >>>>>>>
> >>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper
> >>>>>>>
> >>>>>>> Cherry-pick that and let me know?
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>> sage
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> The log output is always:
> >>>>>>>>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
> >>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at
> >>>>>>>>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
> >>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
> >>>>>>>> 00000157:head v 927921'526559757) currently commit_sent
> >>>>>>>>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
> >>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at
> >>>>>>>>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
> >>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
> >>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent
> >>>>>>>>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
> >>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at
> >>>>>>>>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
> >>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
> >>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent
> >>>>>>>>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
> >>>>>>>> (Aborted) **
> >>>>>>>>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
> >>>>>>>>
> >>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>  1: (()+0xa4423c) [0x561d876ab23c]
> >>>>>>>>  2: (()+0xf890) [0x7f100b974890]
> >>>>>>>>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
> >>>>>>>>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
> >>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
> >>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54]
> >>>>>>>>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
> >>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
> >>>>>>>>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
> >>>>>>>>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> >>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
> >>>>>>>>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
> >>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
> >>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
> >>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>> [0x561d87138bc7]
> >>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>> const&)+0x57) [0x561d873b0db7]
> >>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
> >>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>> [0x561d876f42bd]
> >>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
> >>>>>>>>  15: (()+0x8064) [0x7f100b96d064]
> >>>>>>>>  16: (clone()+0x6d) [0x7f100aa6162d]
> >>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>> needed to interpret this.
> >>>>>>>>
> >>>>>>>> --- logging levels ---
> >>>>>>>>    0/ 5 none
> >>>>>>>>    0/ 0 lockdep
> >>>>>>>>    0/ 0 context
> >>>>>>>>    0/ 0 crush
> >>>>>>>>    0/ 0 mds
> >>>>>>>>    0/ 0 mds_balancer
> >>>>>>>>    0/ 0 mds_locker
> >>>>>>>>    0/ 0 mds_log
> >>>>>>>>    0/ 0 mds_log_expire
> >>>>>>>>    0/ 0 mds_migrator
> >>>>>>>>    0/ 0 buffer
> >>>>>>>>    0/ 0 timer
> >>>>>>>>    0/ 0 filer
> >>>>>>>>    0/ 1 striper
> >>>>>>>>    0/ 0 objecter
> >>>>>>>>    0/ 0 rados
> >>>>>>>>    0/ 0 rbd
> >>>>>>>>    0/ 5 rbd_mirror
> >>>>>>>>    0/ 5 rbd_replay
> >>>>>>>>    0/ 0 journaler
> >>>>>>>>    0/ 0 objectcacher
> >>>>>>>>    0/ 0 client
> >>>>>>>>    0/ 0 osd
> >>>>>>>>    0/ 0 optracker
> >>>>>>>>    0/ 0 objclass
> >>>>>>>>    0/ 0 filestore
> >>>>>>>>    0/ 0 journal
> >>>>>>>>    0/ 0 ms
> >>>>>>>>    0/ 0 mon
> >>>>>>>>    0/ 0 monc
> >>>>>>>>    0/ 0 paxos
> >>>>>>>>    0/ 0 tp
> >>>>>>>>    0/ 0 auth
> >>>>>>>>    1/ 5 crypto
> >>>>>>>>    0/ 0 finisher
> >>>>>>>>    1/ 1 reserver
> >>>>>>>>    0/ 0 heartbeatmap
> >>>>>>>>    0/ 0 perfcounter
> >>>>>>>>    0/ 0 rgw
> >>>>>>>>    1/10 civetweb
> >>>>>>>>    1/ 5 javaclient
> >>>>>>>>    0/ 0 asok
> >>>>>>>>    0/ 0 throttle
> >>>>>>>>    0/ 0 refs
> >>>>>>>>    1/ 5 xio
> >>>>>>>>    1/ 5 compressor
> >>>>>>>>    1/ 5 bluestore
> >>>>>>>>    1/ 5 bluefs
> >>>>>>>>    1/ 3 bdev
> >>>>>>>>    1/ 5 kstore
> >>>>>>>>    4/ 5 rocksdb
> >>>>>>>>    4/ 5 leveldb
> >>>>>>>>    4/ 5 memdb
> >>>>>>>>    1/ 5 kinetic
> >>>>>>>>    1/ 5 fuse
> >>>>>>>>    1/ 5 mgr
> >>>>>>>>    1/ 5 mgrc
> >>>>>>>>    1/ 5 dpdk
> >>>>>>>>    1/ 5 eventtrace
> >>>>>>>>   -2/-2 (syslog threshold)
> >>>>>>>>   -1/-1 (stderr threshold)
> >>>>>>>>   max_recent     10000
> >>>>>>>>   max_new         1000
> >>>>>>>>   log_file /var/log/ceph/ceph-osd.59.log
> >>>>>>>> --- end dump of recent events ---
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Greets,
> >>>>>>>> Stefan
> >>>>>>>>
> >>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil:
> >>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>> Hi Sage,
> >>>>>>>>>>
> >>>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is
> >>>>>>>>>> the following and i hope you might have an idea:
> >>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today
> >>>>>>>>>> running luminous? How can they be missing in snapmapper again?
> >>>>>>>>>
> >>>>>>>>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
> >>>>>>>>> that when you hit the crash we have some context as to what is going on.
> >>>>>>>>>
> >>>>>>>>> Did you track down which part of get_snaps() is returning the error?
> >>>>>>>>>
> >>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to
> >>>>>>>>>> luminous when all this happened.
> >>>>>>>>>
> >>>>>>>>> Did this start right after the upgrade?
> >>>>>>>>>
> >>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up 
> >>>>>>>>> instead (unless the debug option is enabled, for our regression testing), 
> >>>>>>>>> but I'm still unclear about what the inconsistency is... any loggin you 
> >>>>>>>>> can provide would help!
> >>>>>>>>>
> >>>>>>>>> Thanks-
> >>>>>>>>> sage
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Greets,
> >>>>>>>>>> Stefan
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
> >>>>>>>>>>> I'm guessing it is coming from
> >>>>>>>>>>>
> >>>>>>>>>>>   object_snaps out;
> >>>>>>>>>>>   int r = get_snaps(oid, &out);
> >>>>>>>>>>>   if (r < 0)
> >>>>>>>>>>>     return r;
> >>>>>>>>>>>
> >>>>>>>>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
> >>>>>>>>>>> confirm that, and possibly also add debug lines to the various return 
> >>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
> >>>>>>>>>>>
> >>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
> >>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we 
> >>>>>>>>>>> can tolerate it...
> >>>>>>>>>>>
> >>>>>>>>>>> sage
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> HI Sage,
> >>>>>>>>>>>>
> >>>>>>>>>>>> any ideas for this one sawing while doing a deep scrub:
> >>>>>>>>>>>>
> >>>>>>>>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
> >>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
> >>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
> >>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
> >>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
> >>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
> >>>>>>>>>>>>
> >>>>>>>>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
> >>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
> >>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
> >>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
> >>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
> >>>>>>>>>>>>
> >>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>> const*)+0x102) [0x55c0100d0372]
> >>>>>>>>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
> >>>>>>>>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
> >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
> >>>>>>>>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
> >>>>>>>>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>>>>>>>>>> [0x55c00fdf6642]
> >>>>>>>>>>>>  6:
> >>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>>>>>>>>>> [0x55c00fdfc684]
> >>>>>>>>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>>>> [0x55c00fd22030]
> >>>>>>>>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
> >>>>>>>>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>> [0x55c00fb1abc7]
> >>>>>>>>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7]
> >>>>>>>>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
> >>>>>>>>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>> [0x55c0100d5ffd]
> >>>>>>>>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
> >>>>>>>>>>>>  14: (()+0x8064) [0x7fbd284fb064]
> >>>>>>>>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
> >>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Stefan
> >>>>>>>>>>>>
> >>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
> >>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>> Hi Sage,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
> >>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> it also crashes in (marked with HERE):
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> int SnapMapper::get_snaps(
> >>>>>>>>>>>>>>>>   const hobject_t &oid,
> >>>>>>>>>>>>>>>>   object_snaps *out)
> >>>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>>>   assert(check(oid));
> >>>>>>>>>>>>>>>>   set<string> keys;
> >>>>>>>>>>>>>>>>   map<string, bufferlist> got;
> >>>>>>>>>>>>>>>>   keys.insert(to_object_key(oid));
> >>>>>>>>>>>>>>>>   int r = backend.get_keys(keys, &got);
> >>>>>>>>>>>>>>>>   if (r < 0)
> >>>>>>>>>>>>>>>>     return r;
> >>>>>>>>>>>>>>>>   if (got.empty())
> >>>>>>>>>>>>>>>>     return -ENOENT;
> >>>>>>>>>>>>>>>>   if (out) {
> >>>>>>>>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
> >>>>>>>>>>>>>>>>     ::decode(*out, bp);
> >>>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
> >>>>>>>>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
> >>>>>>>>>>>>>>>>   } else {
> >>>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
> >>>>>>>>>>>>>>>>   }
> >>>>>>>>>>>>>>>>   return 0;
> >>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> is it save to comment that assert?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
> >>>>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
> >>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
> >>>>>>>>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
> >>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
> >>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
> >>>>>>>>>>>>>>> can't repair.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But
> >>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one:
> >>>>>>>>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
> >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
> >>>>>>>>>>>>>> SnapMapper::get_snaps(const h
> >>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
> >>>>>>>>>>>>>> 2018-01-18 13:00:54.840396
> >>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I think if you switch that assert to a warning it will repair...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> sage
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2]
> >>>>>>>>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
> >>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
> >>>>>>>>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
> >>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
> >>>>>>>>>>>>>> 697]
> >>>>>>>>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
> >>>>>>>>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> >>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
> >>>>>>>>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
> >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
> >>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
> >>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>>>> [0x5561cdcd7bc7]
> >>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957]
> >>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
> >>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>>>> [0x5561ce292e7d]
> >>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
> >>>>>>>>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
> >>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
> >>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Greets,
> >>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> sage
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
> >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
> >>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>>>>>>> Hi Sage,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
> >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
> >>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l
> >>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
> >>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
> >>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
> >>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
> >>>>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Is this a cache tiering pool?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded:
> >>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
> >>>>>>>>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
> >>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
> >>>>>>>>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
> >>>>>>>>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
> >>>>>>>>>>>>>>>>> around the current assert:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
> >>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
> >>>>>>>>>>>>>>>>>      set<snapid_t> snaps;
> >>>>>>>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
> >>>>>>>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
> >>>>>>>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
> >>>>>>>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
> >>>>>>>>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
> >>>>>>>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
> >>>>>>>>>>>>>>>>> +    } else {
> >>>>>>>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
> >>>>>>>>>>>>>>>>> +    }
> >>>>>>>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
> >>>>>>>>>>>>>>>>>      snap_mapper.add_oid(
> >>>>>>>>>>>>>>>>>        recovery_info.soid,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> s
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
> >>>>>>>>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
> >>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
> >>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
> >>>>>>>>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
> >>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
> >>>>>>>>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
> >>>>>>>>>>>>>>>>>>>> [0x55addb30748f]
> >>>>>>>>>>>>>>>>>>>>  5:
> >>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
> >>>>>>>>>>>>>>>>>>>> [0x55addb317531]
> >>>>>>>>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>>>>>>>>>>>> [0x55addb23cf10]
> >>>>>>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
> >>>>>>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>>>>>>>>>> [0x55addb035bc7]
> >>>>>>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
> >>>>>>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
> >>>>>>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d]
> >>>>>>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
> >>>>>>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
> >>>>>>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
> >>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Greets,
> >>>>>>>>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
> >>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
> >>>>>>>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
> >>>>>>>>>>>>>>>>>>>>> list mapping.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
> >>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
> >>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
> >>>>>>>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
> >>>>>>>>>>>>>>>>>>>>> happens.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
> >>>>>>>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> sage
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>  > 
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
> >>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
> >>>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
> >>>>>>>>>>>>>>>>>>>>>>>> being down.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> All of them fail with:
> >>>>>>>>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
> >>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
> >>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
> >>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
> >>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
> >>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
> >>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
> >>>>>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
> >>>>>>>>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
> >>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
> >>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
> >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
> >>>>>>>>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
> >>>>>>>>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
> >>>>>>>>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
> >>>>>>>>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
> >>>>>>>>>>>>>>>>>>>>>>>>  7:
> >>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
> >>>>>>>>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
> >>>>>>>>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
> >>>>>>>>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
> >>>>>>>>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
> >>>>>>>>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
> >>>>>>>>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
> >>>>>>>>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
> >>>>>>>>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
> >>>>>>>>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
> >>>>>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
> >>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
> >>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
> >>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
> >>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
> >>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
> >>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
> >>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
> >>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
> >>>>>>>>>>>>>>>>>>>>>>> them myself.
> >>>>>>>>>>>>>>>>>>>>>>> -Greg
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Feb. 5, 2018, 1:39 p.m. UTC | #35
Hi Sage,

Am 02.02.2018 um 22:05 schrieb Sage Weil:
> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote:
>> Am 02.02.2018 um 20:28 schrieb Sage Weil:
>>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi Sage,
>>>>
>>>> some days ago i reverted to stock luminous with
>>>> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a
>>>> +
>>>> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa
>>>> on top. I scrubbed and deep-scrubbed all pgs before and no errors were
>>>> found.
>>>>
>>>> this worked fine until the ceph balancer runned and started remapping
>>>> some pgs.
>>>>
>>>> This again trigged this assert:
>>>>
>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)'
>>>> thread 7f4289bff700 time 2018-02-02 17:46:33.249579
>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>
>>>> Any idea how to fix it?
>>>
>>> Cheap workaround is to comment out that assert.  I'm still tryign to sort 
>>
>> Yes i did but i feel unconforable with it as the snapmapper value is
>> still wrong.
>>
>>> out how this happened, though.  Not sure if you shared this already, but:
>>> - bluestore or filestore?
>>
>> As i first reported it only filestore. Now also some bluestore OSDs.
>>
>>> - no cache tiering, right?  nowhere on the cluster?
>> Yes and Yes. Never used cache tiering at all.
>>
>>> It looks like the SnapMapper entry with an empty snaps vector came in via 
>>> add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() 
>>> doesn't have that check).  There are a few place it could come from, 
>>> including the on_local_recover() call and the update_snap_map() call.  
>>> That last one is interesting because it pulls the snaps for a new clone 
>>> out of the pg log entry.. maybe those aren't getting populated properly.
>>
>> Is it also possible that they got corrupted in the past and now ceph is
>> unable to detect it and repair it with a scrub or deep scrub?
>>
>>> Can you confirm that all of the OSDs on the cluster are running the
>>> latest?  (ceph versions).
>>
>> Yes they do - i always check this after each update and i rechecked it
>> again.
> 
> Ok thanks!
> 
> I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that 
> has those commits above plus another one that adds debug messages in the 
> various places that the empty snap vectors might be coming from.
> 
> This won't help it get past the rebalance issue you hit above, but 
> hopefully it will trigger when you create a new clone so we can see how 
> this is happening.
> 
> If we need to, I think we can ignore the assert (by commenting it out), 
> but one downside there is that we'll see my warnings pop up when 
> recovering those objects so we won't be able to tell where they really 
> came from. Since you were seeing this on new snaps/clones, though, I'm 
> hoping we won't end up in that situation!

the biggest part of the logs is now this stuff:
2018-02-05 07:56:32.712366 7fb6893ff700  0 log_channel(cluster) log
[DBG] : 3.740 scrub ok
2018-02-05 08:00:16.033843 7fb68a7fe700 -1 osd.0 pg_epoch: 945191
pg[3.bbf( v 945191'63020848 (945191'63019283,945191'63020848]
local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f
944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620
luod=0'0 crt=945191'63020848 lcod 945191'63020847 active] _scan_snaps no
head for 3:fdd87083:::rbd_data.67b0ee6b8b4567.00000000000103a1:b0d77
(have MIN)
2018-02-05 08:00:42.123804 7fb6833ff700 -1 osd.0 pg_epoch: 945191
pg[3.bbf( v 945191'63020901 (945191'63019383,945191'63020901]
local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f
944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620
luod=0'0 crt=945191'63020901 lcod 945191'63020900 active] _scan_snaps no
head for 3:fddb41cc:::rbd_data.9772fb6b8b4567.0000000000000380:b0d9f
(have MIN)
2018-02-05 08:03:49.583574 7fb687bfe700 -1 osd.0 pg_epoch: 945191
pg[3.caa( v 945191'52090546 (945191'52088948,945191'52090546]
local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f
944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694
luod=0'0 crt=945191'52090546 lcod 945191'52090545 active] _scan_snaps no
head for 3:5530360d:::rbd_data.52a6596b8b4567.0000000000000929:b0d55
(have MIN)
2018-02-05 08:03:50.432394 7fb687bfe700 -1 osd.0 pg_epoch: 945191
pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549]
local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f
944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694
luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no
head for 3:553243fa:::rbd_data.67b0ee6b8b4567.0000000000009956:b0d77
(have MIN)
2018-02-05 08:03:52.523643 7fb687bfe700 -1 osd.0 pg_epoch: 945191
pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549]
local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f
944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694
luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no
head for 3:5536f170:::rbd_data.c1a3d66b8b4567.000000000000b729:b0dc5
(have MIN)
2018-02-05 08:03:55.471369 7fb687bfe700 -1 osd.0 pg_epoch: 945191
pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549]
local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f
944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694
luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no
head for 3:553d2114:::rbd_data.154a9e6b8b4567.0000000000001ea1:b0e4e
(have MIN)
2018-02-05 08:08:55.440715 7fb687bfe700 -1 osd.0 pg_epoch: 945191
pg[3.7aa( v 945191'144763900 (945191'144762311,945191'144763900]
local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f
944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522
luod=0'0 crt=945191'144763900 lcod 945191'144763899 active] _scan_snaps
no head for 3:55e14d0f:::rbd_data.67b0ee6b8b4567.00000000000025f2:b0d77
(have MIN)
2018-02-05 08:09:09.295844 7fb68efff700 -1 osd.0 pg_epoch: 945191
pg[3.7aa( v 945191'144763970 (945191'144762411,945191'144763970]
local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f
944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522
luod=0'0 crt=945191'144763970 lcod 945191'144763969 active] _scan_snaps
no head for 3:55e2d7a9:::rbd_data.e40a416b8b4567.000000000000576f:b0df3
(have MIN)
2018-02-05 08:09:26.285493 7fb68efff700 -1 osd.0 pg_epoch: 945191
pg[3.7aa( v 945191'144764043 (945191'144762511,945191'144764043]
local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f
944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522
luod=0'0 crt=945191'144764043 lcod 945191'144764042 active] _scan_snaps
no head for 3:55e4cbad:::rbd_data.163e9c96b8b4567.000000000000202d:b0d42
(have MIN)
2018-02-05 08:10:36.490452 7fb687bfe700 -1 osd.0 pg_epoch: 945191
pg[3.7aa( v 945191'144764243 (945191'144762711,945191'144764243]
local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f
944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522
luod=0'0 crt=945191'144764243 lcod 945191'144764242 active] _scan_snaps
no head for 3:55ece3c3:::rbd_data.e60f906b8b4567.00000000000000de:b0dd7
(have MIN)

Found nothing else in the looks until now. Should i search for more OSDs
or is this already something we should consider?

Stefan

> Thanks!
> sage
> 
> 
> 
>  > 
>> Stefan
>>
>>
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>> Am 22.01.2018 um 20:01 schrieb Sage Weil:
>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Sage,
>>>>>>
>>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops
>>>>>> telling me:
>>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
>>>>>> on pg 3.33e oid
>>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
>>>>>> mapper: , oi: b0d7c...repaired
>>>>>>
>>>>>> Currently it seems it does an endless repair and never ends.
>>>>>
>>>>> On many objects or the same object?
>>>>>
>>>>> sage
>>>>>
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil:
>>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hi Sage,
>>>>>>>>
>>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So
>>>>>>>> it's OK to skip the assert.
>>>>>>>>
>>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to
>>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and
>>>>>>>> needs to get killed to proceed.
>>>>>>>
>>>>>>> I believe this will fix it:
>>>>>>>
>>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper
>>>>>>>
>>>>>>> Cherry-pick that and let me know?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> sage
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> The log output is always:
>>>>>>>>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
>>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at
>>>>>>>>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
>>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
>>>>>>>> 00000157:head v 927921'526559757) currently commit_sent
>>>>>>>>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
>>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at
>>>>>>>>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
>>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
>>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent
>>>>>>>>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
>>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at
>>>>>>>>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
>>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
>>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent
>>>>>>>>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
>>>>>>>> (Aborted) **
>>>>>>>>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
>>>>>>>>
>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>  1: (()+0xa4423c) [0x561d876ab23c]
>>>>>>>>  2: (()+0xf890) [0x7f100b974890]
>>>>>>>>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
>>>>>>>>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
>>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
>>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54]
>>>>>>>>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
>>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
>>>>>>>>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
>>>>>>>>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
>>>>>>>>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>> [0x561d87138bc7]
>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>> const&)+0x57) [0x561d873b0db7]
>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>> [0x561d876f42bd]
>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
>>>>>>>>  15: (()+0x8064) [0x7f100b96d064]
>>>>>>>>  16: (clone()+0x6d) [0x7f100aa6162d]
>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>> needed to interpret this.
>>>>>>>>
>>>>>>>> --- logging levels ---
>>>>>>>>    0/ 5 none
>>>>>>>>    0/ 0 lockdep
>>>>>>>>    0/ 0 context
>>>>>>>>    0/ 0 crush
>>>>>>>>    0/ 0 mds
>>>>>>>>    0/ 0 mds_balancer
>>>>>>>>    0/ 0 mds_locker
>>>>>>>>    0/ 0 mds_log
>>>>>>>>    0/ 0 mds_log_expire
>>>>>>>>    0/ 0 mds_migrator
>>>>>>>>    0/ 0 buffer
>>>>>>>>    0/ 0 timer
>>>>>>>>    0/ 0 filer
>>>>>>>>    0/ 1 striper
>>>>>>>>    0/ 0 objecter
>>>>>>>>    0/ 0 rados
>>>>>>>>    0/ 0 rbd
>>>>>>>>    0/ 5 rbd_mirror
>>>>>>>>    0/ 5 rbd_replay
>>>>>>>>    0/ 0 journaler
>>>>>>>>    0/ 0 objectcacher
>>>>>>>>    0/ 0 client
>>>>>>>>    0/ 0 osd
>>>>>>>>    0/ 0 optracker
>>>>>>>>    0/ 0 objclass
>>>>>>>>    0/ 0 filestore
>>>>>>>>    0/ 0 journal
>>>>>>>>    0/ 0 ms
>>>>>>>>    0/ 0 mon
>>>>>>>>    0/ 0 monc
>>>>>>>>    0/ 0 paxos
>>>>>>>>    0/ 0 tp
>>>>>>>>    0/ 0 auth
>>>>>>>>    1/ 5 crypto
>>>>>>>>    0/ 0 finisher
>>>>>>>>    1/ 1 reserver
>>>>>>>>    0/ 0 heartbeatmap
>>>>>>>>    0/ 0 perfcounter
>>>>>>>>    0/ 0 rgw
>>>>>>>>    1/10 civetweb
>>>>>>>>    1/ 5 javaclient
>>>>>>>>    0/ 0 asok
>>>>>>>>    0/ 0 throttle
>>>>>>>>    0/ 0 refs
>>>>>>>>    1/ 5 xio
>>>>>>>>    1/ 5 compressor
>>>>>>>>    1/ 5 bluestore
>>>>>>>>    1/ 5 bluefs
>>>>>>>>    1/ 3 bdev
>>>>>>>>    1/ 5 kstore
>>>>>>>>    4/ 5 rocksdb
>>>>>>>>    4/ 5 leveldb
>>>>>>>>    4/ 5 memdb
>>>>>>>>    1/ 5 kinetic
>>>>>>>>    1/ 5 fuse
>>>>>>>>    1/ 5 mgr
>>>>>>>>    1/ 5 mgrc
>>>>>>>>    1/ 5 dpdk
>>>>>>>>    1/ 5 eventtrace
>>>>>>>>   -2/-2 (syslog threshold)
>>>>>>>>   -1/-1 (stderr threshold)
>>>>>>>>   max_recent     10000
>>>>>>>>   max_new         1000
>>>>>>>>   log_file /var/log/ceph/ceph-osd.59.log
>>>>>>>> --- end dump of recent events ---
>>>>>>>>
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil:
>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hi Sage,
>>>>>>>>>>
>>>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is
>>>>>>>>>> the following and i hope you might have an idea:
>>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today
>>>>>>>>>> running luminous? How can they be missing in snapmapper again?
>>>>>>>>>
>>>>>>>>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
>>>>>>>>> that when you hit the crash we have some context as to what is going on.
>>>>>>>>>
>>>>>>>>> Did you track down which part of get_snaps() is returning the error?
>>>>>>>>>
>>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to
>>>>>>>>>> luminous when all this happened.
>>>>>>>>>
>>>>>>>>> Did this start right after the upgrade?
>>>>>>>>>
>>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up 
>>>>>>>>> instead (unless the debug option is enabled, for our regression testing), 
>>>>>>>>> but I'm still unclear about what the inconsistency is... any loggin you 
>>>>>>>>> can provide would help!
>>>>>>>>>
>>>>>>>>> Thanks-
>>>>>>>>> sage
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Greets,
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
>>>>>>>>>>> I'm guessing it is coming from
>>>>>>>>>>>
>>>>>>>>>>>   object_snaps out;
>>>>>>>>>>>   int r = get_snaps(oid, &out);
>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>     return r;
>>>>>>>>>>>
>>>>>>>>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
>>>>>>>>>>> confirm that, and possibly also add debug lines to the various return 
>>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
>>>>>>>>>>>
>>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
>>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we 
>>>>>>>>>>> can tolerate it...
>>>>>>>>>>>
>>>>>>>>>>> sage
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>
>>>>>>>>>>>> HI Sage,
>>>>>>>>>>>>
>>>>>>>>>>>> any ideas for this one sawing while doing a deep scrub:
>>>>>>>>>>>>
>>>>>>>>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
>>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
>>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
>>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
>>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
>>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
>>>>>>>>>>>>
>>>>>>>>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
>>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
>>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
>>>>>>>>>>>>
>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>> const*)+0x102) [0x55c0100d0372]
>>>>>>>>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
>>>>>>>>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
>>>>>>>>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
>>>>>>>>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>> [0x55c00fdf6642]
>>>>>>>>>>>>  6:
>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>> [0x55c00fdfc684]
>>>>>>>>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>> [0x55c00fd22030]
>>>>>>>>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
>>>>>>>>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>> [0x55c00fb1abc7]
>>>>>>>>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7]
>>>>>>>>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
>>>>>>>>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>> [0x55c0100d5ffd]
>>>>>>>>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
>>>>>>>>>>>>  14: (()+0x8064) [0x7fbd284fb064]
>>>>>>>>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Stefan
>>>>>>>>>>>>
>>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
>>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> it also crashes in (marked with HERE):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> int SnapMapper::get_snaps(
>>>>>>>>>>>>>>>>   const hobject_t &oid,
>>>>>>>>>>>>>>>>   object_snaps *out)
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>   assert(check(oid));
>>>>>>>>>>>>>>>>   set<string> keys;
>>>>>>>>>>>>>>>>   map<string, bufferlist> got;
>>>>>>>>>>>>>>>>   keys.insert(to_object_key(oid));
>>>>>>>>>>>>>>>>   int r = backend.get_keys(keys, &got);
>>>>>>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>>>>>>     return r;
>>>>>>>>>>>>>>>>   if (got.empty())
>>>>>>>>>>>>>>>>     return -ENOENT;
>>>>>>>>>>>>>>>>   if (out) {
>>>>>>>>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
>>>>>>>>>>>>>>>>     ::decode(*out, bp);
>>>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>>>>>>>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
>>>>>>>>>>>>>>>>   } else {
>>>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>   return 0;
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> is it save to comment that assert?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
>>>>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
>>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
>>>>>>>>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
>>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
>>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
>>>>>>>>>>>>>>> can't repair.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But
>>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one:
>>>>>>>>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>>>>>>>>>>>> SnapMapper::get_snaps(const h
>>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
>>>>>>>>>>>>>> 2018-01-18 13:00:54.840396
>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think if you switch that assert to a warning it will repair...
>>>>>>>>>>>>>
>>>>>>>>>>>>> sage
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2]
>>>>>>>>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
>>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>>>>>>>>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
>>>>>>>>>>>>>> 697]
>>>>>>>>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>>>>>>>>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>>>>>>>>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>> [0x5561cdcd7bc7]
>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957]
>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>> [0x5561ce292e7d]
>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>>>>>>>>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l
>>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Is this a cache tiering pool?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded:
>>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>>>>>>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>>>>>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
>>>>>>>>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
>>>>>>>>>>>>>>>>> around the current assert:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
>>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>>>>>>>>>>>>>      set<snapid_t> snaps;
>>>>>>>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>>>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>>>>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>>>>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>>>>>>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>>>>>>>>>>>>>      snap_mapper.add_oid(
>>>>>>>>>>>>>>>>>        recovery_info.soid,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> s
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>>>>>>>>>>>>>> [0x55addb30748f]
>>>>>>>>>>>>>>>>>>>>  5:
>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>>>>>>>>>>>>>> [0x55addb317531]
>>>>>>>>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>> [0x55addb23cf10]
>>>>>>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>> [0x55addb035bc7]
>>>>>>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d]
>>>>>>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>>>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>>>>>>>>>>>>>>>>> list mapping.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>>>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>>>>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>>>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>  > 
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>>>>>>>>>>>>>> being down.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>>>>>>>>>>>>>>>  7:
>>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>>>>>>>>>>>>>> them myself.
>>>>>>>>>>>>>>>>>>>>>>> -Greg
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Feb. 12, 2018, 11:58 a.m. UTC | #36
Hi Sage,

did you had any time to look at the logs?

Stefan

Am 05.02.2018 um 14:39 schrieb Stefan Priebe - Profihost AG:
> Hi Sage,
> 
> Am 02.02.2018 um 22:05 schrieb Sage Weil:
>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote:
>>> Am 02.02.2018 um 20:28 schrieb Sage Weil:
>>>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote:
>>>>> Hi Sage,
>>>>>
>>>>> some days ago i reverted to stock luminous with
>>>>> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a
>>>>> +
>>>>> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa
>>>>> on top. I scrubbed and deep-scrubbed all pgs before and no errors were
>>>>> found.
>>>>>
>>>>> this worked fine until the ceph balancer runned and started remapping
>>>>> some pgs.
>>>>>
>>>>> This again trigged this assert:
>>>>>
>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>>> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)'
>>>>> thread 7f4289bff700 time 2018-02-02 17:46:33.249579
>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>>
>>>>> Any idea how to fix it?
>>>>
>>>> Cheap workaround is to comment out that assert.  I'm still tryign to sort 
>>>
>>> Yes i did but i feel unconforable with it as the snapmapper value is
>>> still wrong.
>>>
>>>> out how this happened, though.  Not sure if you shared this already, but:
>>>> - bluestore or filestore?
>>>
>>> As i first reported it only filestore. Now also some bluestore OSDs.
>>>
>>>> - no cache tiering, right?  nowhere on the cluster?
>>> Yes and Yes. Never used cache tiering at all.
>>>
>>>> It looks like the SnapMapper entry with an empty snaps vector came in via 
>>>> add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() 
>>>> doesn't have that check).  There are a few place it could come from, 
>>>> including the on_local_recover() call and the update_snap_map() call.  
>>>> That last one is interesting because it pulls the snaps for a new clone 
>>>> out of the pg log entry.. maybe those aren't getting populated properly.
>>>
>>> Is it also possible that they got corrupted in the past and now ceph is
>>> unable to detect it and repair it with a scrub or deep scrub?
>>>
>>>> Can you confirm that all of the OSDs on the cluster are running the
>>>> latest?  (ceph versions).
>>>
>>> Yes they do - i always check this after each update and i rechecked it
>>> again.
>>
>> Ok thanks!
>>
>> I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that 
>> has those commits above plus another one that adds debug messages in the 
>> various places that the empty snap vectors might be coming from.
>>
>> This won't help it get past the rebalance issue you hit above, but 
>> hopefully it will trigger when you create a new clone so we can see how 
>> this is happening.
>>
>> If we need to, I think we can ignore the assert (by commenting it out), 
>> but one downside there is that we'll see my warnings pop up when 
>> recovering those objects so we won't be able to tell where they really 
>> came from. Since you were seeing this on new snaps/clones, though, I'm 
>> hoping we won't end up in that situation!
> 
> the biggest part of the logs is now this stuff:
> 2018-02-05 07:56:32.712366 7fb6893ff700  0 log_channel(cluster) log
> [DBG] : 3.740 scrub ok
> 2018-02-05 08:00:16.033843 7fb68a7fe700 -1 osd.0 pg_epoch: 945191
> pg[3.bbf( v 945191'63020848 (945191'63019283,945191'63020848]
> local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f
> 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620
> luod=0'0 crt=945191'63020848 lcod 945191'63020847 active] _scan_snaps no
> head for 3:fdd87083:::rbd_data.67b0ee6b8b4567.00000000000103a1:b0d77
> (have MIN)
> 2018-02-05 08:00:42.123804 7fb6833ff700 -1 osd.0 pg_epoch: 945191
> pg[3.bbf( v 945191'63020901 (945191'63019383,945191'63020901]
> local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f
> 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620
> luod=0'0 crt=945191'63020901 lcod 945191'63020900 active] _scan_snaps no
> head for 3:fddb41cc:::rbd_data.9772fb6b8b4567.0000000000000380:b0d9f
> (have MIN)
> 2018-02-05 08:03:49.583574 7fb687bfe700 -1 osd.0 pg_epoch: 945191
> pg[3.caa( v 945191'52090546 (945191'52088948,945191'52090546]
> local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f
> 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694
> luod=0'0 crt=945191'52090546 lcod 945191'52090545 active] _scan_snaps no
> head for 3:5530360d:::rbd_data.52a6596b8b4567.0000000000000929:b0d55
> (have MIN)
> 2018-02-05 08:03:50.432394 7fb687bfe700 -1 osd.0 pg_epoch: 945191
> pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549]
> local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f
> 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694
> luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no
> head for 3:553243fa:::rbd_data.67b0ee6b8b4567.0000000000009956:b0d77
> (have MIN)
> 2018-02-05 08:03:52.523643 7fb687bfe700 -1 osd.0 pg_epoch: 945191
> pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549]
> local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f
> 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694
> luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no
> head for 3:5536f170:::rbd_data.c1a3d66b8b4567.000000000000b729:b0dc5
> (have MIN)
> 2018-02-05 08:03:55.471369 7fb687bfe700 -1 osd.0 pg_epoch: 945191
> pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549]
> local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f
> 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694
> luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no
> head for 3:553d2114:::rbd_data.154a9e6b8b4567.0000000000001ea1:b0e4e
> (have MIN)
> 2018-02-05 08:08:55.440715 7fb687bfe700 -1 osd.0 pg_epoch: 945191
> pg[3.7aa( v 945191'144763900 (945191'144762311,945191'144763900]
> local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f
> 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522
> luod=0'0 crt=945191'144763900 lcod 945191'144763899 active] _scan_snaps
> no head for 3:55e14d0f:::rbd_data.67b0ee6b8b4567.00000000000025f2:b0d77
> (have MIN)
> 2018-02-05 08:09:09.295844 7fb68efff700 -1 osd.0 pg_epoch: 945191
> pg[3.7aa( v 945191'144763970 (945191'144762411,945191'144763970]
> local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f
> 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522
> luod=0'0 crt=945191'144763970 lcod 945191'144763969 active] _scan_snaps
> no head for 3:55e2d7a9:::rbd_data.e40a416b8b4567.000000000000576f:b0df3
> (have MIN)
> 2018-02-05 08:09:26.285493 7fb68efff700 -1 osd.0 pg_epoch: 945191
> pg[3.7aa( v 945191'144764043 (945191'144762511,945191'144764043]
> local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f
> 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522
> luod=0'0 crt=945191'144764043 lcod 945191'144764042 active] _scan_snaps
> no head for 3:55e4cbad:::rbd_data.163e9c96b8b4567.000000000000202d:b0d42
> (have MIN)
> 2018-02-05 08:10:36.490452 7fb687bfe700 -1 osd.0 pg_epoch: 945191
> pg[3.7aa( v 945191'144764243 (945191'144762711,945191'144764243]
> local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f
> 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522
> luod=0'0 crt=945191'144764243 lcod 945191'144764242 active] _scan_snaps
> no head for 3:55ece3c3:::rbd_data.e60f906b8b4567.00000000000000de:b0dd7
> (have MIN)
> 
> Found nothing else in the looks until now. Should i search for more OSDs
> or is this already something we should consider?
> 
> Stefan
> 
>> Thanks!
>> sage
>>
>>
>>
>>  > 
>>> Stefan
>>>
>>>
>>>>>
>>>>> Greets,
>>>>> Stefan
>>>>>
>>>>> Am 22.01.2018 um 20:01 schrieb Sage Weil:
>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>> Hi Sage,
>>>>>>>
>>>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops
>>>>>>> telling me:
>>>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
>>>>>>> on pg 3.33e oid
>>>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
>>>>>>> mapper: , oi: b0d7c...repaired
>>>>>>>
>>>>>>> Currently it seems it does an endless repair and never ends.
>>>>>>
>>>>>> On many objects or the same object?
>>>>>>
>>>>>> sage
>>>>>>
>>>>>>>
>>>>>>> Greets,
>>>>>>> Stefan
>>>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil:
>>>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>> Hi Sage,
>>>>>>>>>
>>>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So
>>>>>>>>> it's OK to skip the assert.
>>>>>>>>>
>>>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to
>>>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and
>>>>>>>>> needs to get killed to proceed.
>>>>>>>>
>>>>>>>> I believe this will fix it:
>>>>>>>>
>>>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper
>>>>>>>>
>>>>>>>> Cherry-pick that and let me know?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> sage
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> The log output is always:
>>>>>>>>>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
>>>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at
>>>>>>>>>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
>>>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
>>>>>>>>> 00000157:head v 927921'526559757) currently commit_sent
>>>>>>>>>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
>>>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at
>>>>>>>>>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
>>>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
>>>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent
>>>>>>>>>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
>>>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at
>>>>>>>>>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
>>>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
>>>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent
>>>>>>>>>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
>>>>>>>>> (Aborted) **
>>>>>>>>>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
>>>>>>>>>
>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>  1: (()+0xa4423c) [0x561d876ab23c]
>>>>>>>>>  2: (()+0xf890) [0x7f100b974890]
>>>>>>>>>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
>>>>>>>>>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
>>>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
>>>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54]
>>>>>>>>>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
>>>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
>>>>>>>>>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
>>>>>>>>>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
>>>>>>>>>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>> [0x561d87138bc7]
>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>> const&)+0x57) [0x561d873b0db7]
>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>> [0x561d876f42bd]
>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
>>>>>>>>>  15: (()+0x8064) [0x7f100b96d064]
>>>>>>>>>  16: (clone()+0x6d) [0x7f100aa6162d]
>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>> needed to interpret this.
>>>>>>>>>
>>>>>>>>> --- logging levels ---
>>>>>>>>>    0/ 5 none
>>>>>>>>>    0/ 0 lockdep
>>>>>>>>>    0/ 0 context
>>>>>>>>>    0/ 0 crush
>>>>>>>>>    0/ 0 mds
>>>>>>>>>    0/ 0 mds_balancer
>>>>>>>>>    0/ 0 mds_locker
>>>>>>>>>    0/ 0 mds_log
>>>>>>>>>    0/ 0 mds_log_expire
>>>>>>>>>    0/ 0 mds_migrator
>>>>>>>>>    0/ 0 buffer
>>>>>>>>>    0/ 0 timer
>>>>>>>>>    0/ 0 filer
>>>>>>>>>    0/ 1 striper
>>>>>>>>>    0/ 0 objecter
>>>>>>>>>    0/ 0 rados
>>>>>>>>>    0/ 0 rbd
>>>>>>>>>    0/ 5 rbd_mirror
>>>>>>>>>    0/ 5 rbd_replay
>>>>>>>>>    0/ 0 journaler
>>>>>>>>>    0/ 0 objectcacher
>>>>>>>>>    0/ 0 client
>>>>>>>>>    0/ 0 osd
>>>>>>>>>    0/ 0 optracker
>>>>>>>>>    0/ 0 objclass
>>>>>>>>>    0/ 0 filestore
>>>>>>>>>    0/ 0 journal
>>>>>>>>>    0/ 0 ms
>>>>>>>>>    0/ 0 mon
>>>>>>>>>    0/ 0 monc
>>>>>>>>>    0/ 0 paxos
>>>>>>>>>    0/ 0 tp
>>>>>>>>>    0/ 0 auth
>>>>>>>>>    1/ 5 crypto
>>>>>>>>>    0/ 0 finisher
>>>>>>>>>    1/ 1 reserver
>>>>>>>>>    0/ 0 heartbeatmap
>>>>>>>>>    0/ 0 perfcounter
>>>>>>>>>    0/ 0 rgw
>>>>>>>>>    1/10 civetweb
>>>>>>>>>    1/ 5 javaclient
>>>>>>>>>    0/ 0 asok
>>>>>>>>>    0/ 0 throttle
>>>>>>>>>    0/ 0 refs
>>>>>>>>>    1/ 5 xio
>>>>>>>>>    1/ 5 compressor
>>>>>>>>>    1/ 5 bluestore
>>>>>>>>>    1/ 5 bluefs
>>>>>>>>>    1/ 3 bdev
>>>>>>>>>    1/ 5 kstore
>>>>>>>>>    4/ 5 rocksdb
>>>>>>>>>    4/ 5 leveldb
>>>>>>>>>    4/ 5 memdb
>>>>>>>>>    1/ 5 kinetic
>>>>>>>>>    1/ 5 fuse
>>>>>>>>>    1/ 5 mgr
>>>>>>>>>    1/ 5 mgrc
>>>>>>>>>    1/ 5 dpdk
>>>>>>>>>    1/ 5 eventtrace
>>>>>>>>>   -2/-2 (syslog threshold)
>>>>>>>>>   -1/-1 (stderr threshold)
>>>>>>>>>   max_recent     10000
>>>>>>>>>   max_new         1000
>>>>>>>>>   log_file /var/log/ceph/ceph-osd.59.log
>>>>>>>>> --- end dump of recent events ---
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Greets,
>>>>>>>>> Stefan
>>>>>>>>>
>>>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil:
>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>
>>>>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is
>>>>>>>>>>> the following and i hope you might have an idea:
>>>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today
>>>>>>>>>>> running luminous? How can they be missing in snapmapper again?
>>>>>>>>>>
>>>>>>>>>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
>>>>>>>>>> that when you hit the crash we have some context as to what is going on.
>>>>>>>>>>
>>>>>>>>>> Did you track down which part of get_snaps() is returning the error?
>>>>>>>>>>
>>>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to
>>>>>>>>>>> luminous when all this happened.
>>>>>>>>>>
>>>>>>>>>> Did this start right after the upgrade?
>>>>>>>>>>
>>>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up 
>>>>>>>>>> instead (unless the debug option is enabled, for our regression testing), 
>>>>>>>>>> but I'm still unclear about what the inconsistency is... any loggin you 
>>>>>>>>>> can provide would help!
>>>>>>>>>>
>>>>>>>>>> Thanks-
>>>>>>>>>> sage
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Greets,
>>>>>>>>>>> Stefan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
>>>>>>>>>>>> I'm guessing it is coming from
>>>>>>>>>>>>
>>>>>>>>>>>>   object_snaps out;
>>>>>>>>>>>>   int r = get_snaps(oid, &out);
>>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>>     return r;
>>>>>>>>>>>>
>>>>>>>>>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
>>>>>>>>>>>> confirm that, and possibly also add debug lines to the various return 
>>>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
>>>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we 
>>>>>>>>>>>> can tolerate it...
>>>>>>>>>>>>
>>>>>>>>>>>> sage
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> HI Sage,
>>>>>>>>>>>>>
>>>>>>>>>>>>> any ideas for this one sawing while doing a deep scrub:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
>>>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
>>>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
>>>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
>>>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
>>>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
>>>>>>>>>>>>>
>>>>>>>>>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
>>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
>>>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
>>>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
>>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
>>>>>>>>>>>>>
>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>> const*)+0x102) [0x55c0100d0372]
>>>>>>>>>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
>>>>>>>>>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
>>>>>>>>>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
>>>>>>>>>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>>> [0x55c00fdf6642]
>>>>>>>>>>>>>  6:
>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>>> [0x55c00fdfc684]
>>>>>>>>>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>> [0x55c00fd22030]
>>>>>>>>>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
>>>>>>>>>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>> [0x55c00fb1abc7]
>>>>>>>>>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7]
>>>>>>>>>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
>>>>>>>>>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>> [0x55c0100d5ffd]
>>>>>>>>>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
>>>>>>>>>>>>>  14: (()+0x8064) [0x7fbd284fb064]
>>>>>>>>>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>
>>>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
>>>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> it also crashes in (marked with HERE):
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> int SnapMapper::get_snaps(
>>>>>>>>>>>>>>>>>   const hobject_t &oid,
>>>>>>>>>>>>>>>>>   object_snaps *out)
>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>   assert(check(oid));
>>>>>>>>>>>>>>>>>   set<string> keys;
>>>>>>>>>>>>>>>>>   map<string, bufferlist> got;
>>>>>>>>>>>>>>>>>   keys.insert(to_object_key(oid));
>>>>>>>>>>>>>>>>>   int r = backend.get_keys(keys, &got);
>>>>>>>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>>>>>>>     return r;
>>>>>>>>>>>>>>>>>   if (got.empty())
>>>>>>>>>>>>>>>>>     return -ENOENT;
>>>>>>>>>>>>>>>>>   if (out) {
>>>>>>>>>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
>>>>>>>>>>>>>>>>>     ::decode(*out, bp);
>>>>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>>>>>>>>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
>>>>>>>>>>>>>>>>>   } else {
>>>>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>   return 0;
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> is it save to comment that assert?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
>>>>>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
>>>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
>>>>>>>>>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
>>>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
>>>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
>>>>>>>>>>>>>>>> can't repair.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But
>>>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one:
>>>>>>>>>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>>>>>>>>>>>>> SnapMapper::get_snaps(const h
>>>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
>>>>>>>>>>>>>>> 2018-01-18 13:00:54.840396
>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think if you switch that assert to a warning it will repair...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2]
>>>>>>>>>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
>>>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>>>>>>>>>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
>>>>>>>>>>>>>>> 697]
>>>>>>>>>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>>>>>>>>>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>>>>>>>>>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>> [0x5561cdcd7bc7]
>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957]
>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>> [0x5561ce292e7d]
>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l
>>>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Is this a cache tiering pool?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded:
>>>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>>>>>>>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>>>>>>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
>>>>>>>>>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
>>>>>>>>>>>>>>>>>> around the current assert:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
>>>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>>>>>>>>>>>>>>      set<snapid_t> snaps;
>>>>>>>>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>>>>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>>>>>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>>>>>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>>>>>>>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>>>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>>>>>>>>>>>>>>      snap_mapper.add_oid(
>>>>>>>>>>>>>>>>>>        recovery_info.soid,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> s
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>>>>>>>>>>>>>>> [0x55addb30748f]
>>>>>>>>>>>>>>>>>>>>>  5:
>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>>>>>>>>>>>>>>> [0x55addb317531]
>>>>>>>>>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>>> [0x55addb23cf10]
>>>>>>>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>>> [0x55addb035bc7]
>>>>>>>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d]
>>>>>>>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>>>>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>>>>>>>>>>>>>>>>>> list mapping.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>>>>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>>>>>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>>>>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>  > 
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>>>>>>>>>>>>>>> being down.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>>>>>>>>>>>>>>>>  7:
>>>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>>>>>>>>>>>>>>> them myself.
>>>>>>>>>>>>>>>>>>>>>>>> -Greg
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>
>>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sage Weil Feb. 12, 2018, 7:31 p.m. UTC | #37
On Mon, 12 Feb 2018, Stefan Priebe - Profihost AG wrote:
> Hi Sage,
> 
> did you had any time to look at the logs?

Sorry, I thought I replied earlier.  See below

> Am 05.02.2018 um 14:39 schrieb Stefan Priebe - Profihost AG:
> > Hi Sage,
> > 
> > Am 02.02.2018 um 22:05 schrieb Sage Weil:
> >> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote:
> >>> Am 02.02.2018 um 20:28 schrieb Sage Weil:
> >>>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote:
> >>>>> Hi Sage,
> >>>>>
> >>>>> some days ago i reverted to stock luminous with
> >>>>> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a
> >>>>> +
> >>>>> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa
> >>>>> on top. I scrubbed and deep-scrubbed all pgs before and no errors were
> >>>>> found.
> >>>>>
> >>>>> this worked fine until the ceph balancer runned and started remapping
> >>>>> some pgs.
> >>>>>
> >>>>> This again trigged this assert:
> >>>>>
> >>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
> >>>>> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)'
> >>>>> thread 7f4289bff700 time 2018-02-02 17:46:33.249579
> >>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
> >>>>>
> >>>>> Any idea how to fix it?
> >>>>
> >>>> Cheap workaround is to comment out that assert.  I'm still tryign to sort 
> >>>
> >>> Yes i did but i feel unconforable with it as the snapmapper value is
> >>> still wrong.
> >>>
> >>>> out how this happened, though.  Not sure if you shared this already, but:
> >>>> - bluestore or filestore?
> >>>
> >>> As i first reported it only filestore. Now also some bluestore OSDs.
> >>>
> >>>> - no cache tiering, right?  nowhere on the cluster?
> >>> Yes and Yes. Never used cache tiering at all.
> >>>
> >>>> It looks like the SnapMapper entry with an empty snaps vector came in via 
> >>>> add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() 
> >>>> doesn't have that check).  There are a few place it could come from, 
> >>>> including the on_local_recover() call and the update_snap_map() call.  
> >>>> That last one is interesting because it pulls the snaps for a new clone 
> >>>> out of the pg log entry.. maybe those aren't getting populated properly.
> >>>
> >>> Is it also possible that they got corrupted in the past and now ceph is
> >>> unable to detect it and repair it with a scrub or deep scrub?
> >>>
> >>>> Can you confirm that all of the OSDs on the cluster are running the
> >>>> latest?  (ceph versions).
> >>>
> >>> Yes they do - i always check this after each update and i rechecked it
> >>> again.
> >>
> >> Ok thanks!
> >>
> >> I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that 
> >> has those commits above plus another one that adds debug messages in the 
> >> various places that the empty snap vectors might be coming from.
> >>
> >> This won't help it get past the rebalance issue you hit above, but 
> >> hopefully it will trigger when you create a new clone so we can see how 
> >> this is happening.
> >>
> >> If we need to, I think we can ignore the assert (by commenting it out), 
> >> but one downside there is that we'll see my warnings pop up when 
> >> recovering those objects so we won't be able to tell where they really 
> >> came from. Since you were seeing this on new snaps/clones, though, I'm 
> >> hoping we won't end up in that situation!
> > 
> > the biggest part of the logs is now this stuff:
> > 2018-02-05 07:56:32.712366 7fb6893ff700  0 log_channel(cluster) log
> > [DBG] : 3.740 scrub ok
> > 2018-02-05 08:00:16.033843 7fb68a7fe700 -1 osd.0 pg_epoch: 945191
> > pg[3.bbf( v 945191'63020848 (945191'63019283,945191'63020848]
> > local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f
> > 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620
> > luod=0'0 crt=945191'63020848 lcod 945191'63020847 active] _scan_snaps no
> > head for 3:fdd87083:::rbd_data.67b0ee6b8b4567.00000000000103a1:b0d77
> > (have MIN)
> > 2018-02-05 08:00:42.123804 7fb6833ff700 -1 osd.0 pg_epoch: 945191
> > pg[3.bbf( v 945191'63020901 (945191'63019383,945191'63020901]
> > local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f
> > 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620
> > luod=0'0 crt=945191'63020901 lcod 945191'63020900 active] _scan_snaps no
> > head for 3:fddb41cc:::rbd_data.9772fb6b8b4567.0000000000000380:b0d9f
> > (have MIN)
> > 2018-02-05 08:03:49.583574 7fb687bfe700 -1 osd.0 pg_epoch: 945191
> > pg[3.caa( v 945191'52090546 (945191'52088948,945191'52090546]
> > local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f
> > 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694
> > luod=0'0 crt=945191'52090546 lcod 945191'52090545 active] _scan_snaps no
> > head for 3:5530360d:::rbd_data.52a6596b8b4567.0000000000000929:b0d55
> > (have MIN)
> > 2018-02-05 08:03:50.432394 7fb687bfe700 -1 osd.0 pg_epoch: 945191
> > pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549]
> > local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f
> > 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694
> > luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no
> > head for 3:553243fa:::rbd_data.67b0ee6b8b4567.0000000000009956:b0d77
> > (have MIN)
> > 2018-02-05 08:03:52.523643 7fb687bfe700 -1 osd.0 pg_epoch: 945191
> > pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549]
> > local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f
> > 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694
> > luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no
> > head for 3:5536f170:::rbd_data.c1a3d66b8b4567.000000000000b729:b0dc5
> > (have MIN)
> > 2018-02-05 08:03:55.471369 7fb687bfe700 -1 osd.0 pg_epoch: 945191
> > pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549]
> > local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f
> > 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694
> > luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no
> > head for 3:553d2114:::rbd_data.154a9e6b8b4567.0000000000001ea1:b0e4e
> > (have MIN)
> > 2018-02-05 08:08:55.440715 7fb687bfe700 -1 osd.0 pg_epoch: 945191
> > pg[3.7aa( v 945191'144763900 (945191'144762311,945191'144763900]
> > local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f
> > 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522
> > luod=0'0 crt=945191'144763900 lcod 945191'144763899 active] _scan_snaps
> > no head for 3:55e14d0f:::rbd_data.67b0ee6b8b4567.00000000000025f2:b0d77
> > (have MIN)
> > 2018-02-05 08:09:09.295844 7fb68efff700 -1 osd.0 pg_epoch: 945191
> > pg[3.7aa( v 945191'144763970 (945191'144762411,945191'144763970]
> > local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f
> > 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522
> > luod=0'0 crt=945191'144763970 lcod 945191'144763969 active] _scan_snaps
> > no head for 3:55e2d7a9:::rbd_data.e40a416b8b4567.000000000000576f:b0df3
> > (have MIN)
> > 2018-02-05 08:09:26.285493 7fb68efff700 -1 osd.0 pg_epoch: 945191
> > pg[3.7aa( v 945191'144764043 (945191'144762511,945191'144764043]
> > local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f
> > 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522
> > luod=0'0 crt=945191'144764043 lcod 945191'144764042 active] _scan_snaps
> > no head for 3:55e4cbad:::rbd_data.163e9c96b8b4567.000000000000202d:b0d42
> > (have MIN)
> > 2018-02-05 08:10:36.490452 7fb687bfe700 -1 osd.0 pg_epoch: 945191
> > pg[3.7aa( v 945191'144764243 (945191'144762711,945191'144764243]
> > local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f
> > 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522
> > luod=0'0 crt=945191'144764243 lcod 945191'144764242 active] _scan_snaps
> > no head for 3:55ece3c3:::rbd_data.e60f906b8b4567.00000000000000de:b0dd7
> > (have MIN)
> > 
> > Found nothing else in the looks until now. Should i search for more OSDs
> > or is this already something we should consider?

Yeah, those messages aren't super informative.  What would be useful is to 
check for any of those (or, more importantly the other new messages that 
branch adds) on *new* object clones for snapshots that have just been 
created.  (Earlier, you mentioned that the errors you were seeing referred 
to snapshots that had just been created, right?  The key question is how 
those new clones are getting registered in snapmapper with an empty list 
of snaps.)

Thanks!
sage



> > Stefan
> > 
> >> Thanks!
> >> sage
> >>
> >>
> >>
> >>  > 
> >>> Stefan
> >>>
> >>>
> >>>>>
> >>>>> Greets,
> >>>>> Stefan
> >>>>>
> >>>>> Am 22.01.2018 um 20:01 schrieb Sage Weil:
> >>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>> Hi Sage,
> >>>>>>>
> >>>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops
> >>>>>>> telling me:
> >>>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
> >>>>>>> on pg 3.33e oid
> >>>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
> >>>>>>> mapper: , oi: b0d7c...repaired
> >>>>>>>
> >>>>>>> Currently it seems it does an endless repair and never ends.
> >>>>>>
> >>>>>> On many objects or the same object?
> >>>>>>
> >>>>>> sage
> >>>>>>
> >>>>>>>
> >>>>>>> Greets,
> >>>>>>> Stefan
> >>>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil:
> >>>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>> Hi Sage,
> >>>>>>>>>
> >>>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So
> >>>>>>>>> it's OK to skip the assert.
> >>>>>>>>>
> >>>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to
> >>>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and
> >>>>>>>>> needs to get killed to proceed.
> >>>>>>>>
> >>>>>>>> I believe this will fix it:
> >>>>>>>>
> >>>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper
> >>>>>>>>
> >>>>>>>> Cherry-pick that and let me know?
> >>>>>>>>
> >>>>>>>> Thanks!
> >>>>>>>> sage
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> The log output is always:
> >>>>>>>>>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
> >>>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at
> >>>>>>>>>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
> >>>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
> >>>>>>>>> 00000157:head v 927921'526559757) currently commit_sent
> >>>>>>>>>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
> >>>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at
> >>>>>>>>>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
> >>>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
> >>>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent
> >>>>>>>>>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
> >>>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at
> >>>>>>>>>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
> >>>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
> >>>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent
> >>>>>>>>>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
> >>>>>>>>> (Aborted) **
> >>>>>>>>>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
> >>>>>>>>>
> >>>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>>  1: (()+0xa4423c) [0x561d876ab23c]
> >>>>>>>>>  2: (()+0xf890) [0x7f100b974890]
> >>>>>>>>>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
> >>>>>>>>>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
> >>>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
> >>>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54]
> >>>>>>>>>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
> >>>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
> >>>>>>>>>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
> >>>>>>>>>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> >>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
> >>>>>>>>>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
> >>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
> >>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
> >>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>> [0x561d87138bc7]
> >>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>> const&)+0x57) [0x561d873b0db7]
> >>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
> >>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>> [0x561d876f42bd]
> >>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
> >>>>>>>>>  15: (()+0x8064) [0x7f100b96d064]
> >>>>>>>>>  16: (clone()+0x6d) [0x7f100aa6162d]
> >>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>> needed to interpret this.
> >>>>>>>>>
> >>>>>>>>> --- logging levels ---
> >>>>>>>>>    0/ 5 none
> >>>>>>>>>    0/ 0 lockdep
> >>>>>>>>>    0/ 0 context
> >>>>>>>>>    0/ 0 crush
> >>>>>>>>>    0/ 0 mds
> >>>>>>>>>    0/ 0 mds_balancer
> >>>>>>>>>    0/ 0 mds_locker
> >>>>>>>>>    0/ 0 mds_log
> >>>>>>>>>    0/ 0 mds_log_expire
> >>>>>>>>>    0/ 0 mds_migrator
> >>>>>>>>>    0/ 0 buffer
> >>>>>>>>>    0/ 0 timer
> >>>>>>>>>    0/ 0 filer
> >>>>>>>>>    0/ 1 striper
> >>>>>>>>>    0/ 0 objecter
> >>>>>>>>>    0/ 0 rados
> >>>>>>>>>    0/ 0 rbd
> >>>>>>>>>    0/ 5 rbd_mirror
> >>>>>>>>>    0/ 5 rbd_replay
> >>>>>>>>>    0/ 0 journaler
> >>>>>>>>>    0/ 0 objectcacher
> >>>>>>>>>    0/ 0 client
> >>>>>>>>>    0/ 0 osd
> >>>>>>>>>    0/ 0 optracker
> >>>>>>>>>    0/ 0 objclass
> >>>>>>>>>    0/ 0 filestore
> >>>>>>>>>    0/ 0 journal
> >>>>>>>>>    0/ 0 ms
> >>>>>>>>>    0/ 0 mon
> >>>>>>>>>    0/ 0 monc
> >>>>>>>>>    0/ 0 paxos
> >>>>>>>>>    0/ 0 tp
> >>>>>>>>>    0/ 0 auth
> >>>>>>>>>    1/ 5 crypto
> >>>>>>>>>    0/ 0 finisher
> >>>>>>>>>    1/ 1 reserver
> >>>>>>>>>    0/ 0 heartbeatmap
> >>>>>>>>>    0/ 0 perfcounter
> >>>>>>>>>    0/ 0 rgw
> >>>>>>>>>    1/10 civetweb
> >>>>>>>>>    1/ 5 javaclient
> >>>>>>>>>    0/ 0 asok
> >>>>>>>>>    0/ 0 throttle
> >>>>>>>>>    0/ 0 refs
> >>>>>>>>>    1/ 5 xio
> >>>>>>>>>    1/ 5 compressor
> >>>>>>>>>    1/ 5 bluestore
> >>>>>>>>>    1/ 5 bluefs
> >>>>>>>>>    1/ 3 bdev
> >>>>>>>>>    1/ 5 kstore
> >>>>>>>>>    4/ 5 rocksdb
> >>>>>>>>>    4/ 5 leveldb
> >>>>>>>>>    4/ 5 memdb
> >>>>>>>>>    1/ 5 kinetic
> >>>>>>>>>    1/ 5 fuse
> >>>>>>>>>    1/ 5 mgr
> >>>>>>>>>    1/ 5 mgrc
> >>>>>>>>>    1/ 5 dpdk
> >>>>>>>>>    1/ 5 eventtrace
> >>>>>>>>>   -2/-2 (syslog threshold)
> >>>>>>>>>   -1/-1 (stderr threshold)
> >>>>>>>>>   max_recent     10000
> >>>>>>>>>   max_new         1000
> >>>>>>>>>   log_file /var/log/ceph/ceph-osd.59.log
> >>>>>>>>> --- end dump of recent events ---
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Greets,
> >>>>>>>>> Stefan
> >>>>>>>>>
> >>>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil:
> >>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>> Hi Sage,
> >>>>>>>>>>>
> >>>>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is
> >>>>>>>>>>> the following and i hope you might have an idea:
> >>>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today
> >>>>>>>>>>> running luminous? How can they be missing in snapmapper again?
> >>>>>>>>>>
> >>>>>>>>>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
> >>>>>>>>>> that when you hit the crash we have some context as to what is going on.
> >>>>>>>>>>
> >>>>>>>>>> Did you track down which part of get_snaps() is returning the error?
> >>>>>>>>>>
> >>>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to
> >>>>>>>>>>> luminous when all this happened.
> >>>>>>>>>>
> >>>>>>>>>> Did this start right after the upgrade?
> >>>>>>>>>>
> >>>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up 
> >>>>>>>>>> instead (unless the debug option is enabled, for our regression testing), 
> >>>>>>>>>> but I'm still unclear about what the inconsistency is... any loggin you 
> >>>>>>>>>> can provide would help!
> >>>>>>>>>>
> >>>>>>>>>> Thanks-
> >>>>>>>>>> sage
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Greets,
> >>>>>>>>>>> Stefan
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
> >>>>>>>>>>>> I'm guessing it is coming from
> >>>>>>>>>>>>
> >>>>>>>>>>>>   object_snaps out;
> >>>>>>>>>>>>   int r = get_snaps(oid, &out);
> >>>>>>>>>>>>   if (r < 0)
> >>>>>>>>>>>>     return r;
> >>>>>>>>>>>>
> >>>>>>>>>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
> >>>>>>>>>>>> confirm that, and possibly also add debug lines to the various return 
> >>>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
> >>>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we 
> >>>>>>>>>>>> can tolerate it...
> >>>>>>>>>>>>
> >>>>>>>>>>>> sage
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> HI Sage,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> any ideas for this one sawing while doing a deep scrub:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
> >>>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
> >>>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
> >>>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
> >>>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
> >>>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
> >>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
> >>>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
> >>>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
> >>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>>> const*)+0x102) [0x55c0100d0372]
> >>>>>>>>>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
> >>>>>>>>>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
> >>>>>>>>>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
> >>>>>>>>>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>>>>>>>>>>> [0x55c00fdf6642]
> >>>>>>>>>>>>>  6:
> >>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>>>>>>>>>>> [0x55c00fdfc684]
> >>>>>>>>>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>>>>> [0x55c00fd22030]
> >>>>>>>>>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
> >>>>>>>>>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>>> [0x55c00fb1abc7]
> >>>>>>>>>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7]
> >>>>>>>>>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
> >>>>>>>>>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>>> [0x55c0100d5ffd]
> >>>>>>>>>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
> >>>>>>>>>>>>>  14: (()+0x8064) [0x7fbd284fb064]
> >>>>>>>>>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
> >>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
> >>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>> Hi Sage,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
> >>>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> it also crashes in (marked with HERE):
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> int SnapMapper::get_snaps(
> >>>>>>>>>>>>>>>>>   const hobject_t &oid,
> >>>>>>>>>>>>>>>>>   object_snaps *out)
> >>>>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>>>>   assert(check(oid));
> >>>>>>>>>>>>>>>>>   set<string> keys;
> >>>>>>>>>>>>>>>>>   map<string, bufferlist> got;
> >>>>>>>>>>>>>>>>>   keys.insert(to_object_key(oid));
> >>>>>>>>>>>>>>>>>   int r = backend.get_keys(keys, &got);
> >>>>>>>>>>>>>>>>>   if (r < 0)
> >>>>>>>>>>>>>>>>>     return r;
> >>>>>>>>>>>>>>>>>   if (got.empty())
> >>>>>>>>>>>>>>>>>     return -ENOENT;
> >>>>>>>>>>>>>>>>>   if (out) {
> >>>>>>>>>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
> >>>>>>>>>>>>>>>>>     ::decode(*out, bp);
> >>>>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
> >>>>>>>>>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
> >>>>>>>>>>>>>>>>>   } else {
> >>>>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
> >>>>>>>>>>>>>>>>>   }
> >>>>>>>>>>>>>>>>>   return 0;
> >>>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> is it save to comment that assert?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
> >>>>>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
> >>>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
> >>>>>>>>>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
> >>>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
> >>>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
> >>>>>>>>>>>>>>>> can't repair.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But
> >>>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one:
> >>>>>>>>>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
> >>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
> >>>>>>>>>>>>>>> SnapMapper::get_snaps(const h
> >>>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
> >>>>>>>>>>>>>>> 2018-01-18 13:00:54.840396
> >>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think if you switch that assert to a warning it will repair...
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> sage
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2]
> >>>>>>>>>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
> >>>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
> >>>>>>>>>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
> >>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
> >>>>>>>>>>>>>>> 697]
> >>>>>>>>>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
> >>>>>>>>>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> >>>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
> >>>>>>>>>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
> >>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
> >>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
> >>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>>>>> [0x5561cdcd7bc7]
> >>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957]
> >>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
> >>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>>>>> [0x5561ce292e7d]
> >>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
> >>>>>>>>>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
> >>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
> >>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Greets,
> >>>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> sage
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
> >>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
> >>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>>>>>>>> Hi Sage,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
> >>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
> >>>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l
> >>>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
> >>>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
> >>>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
> >>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
> >>>>>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Is this a cache tiering pool?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded:
> >>>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
> >>>>>>>>>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
> >>>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
> >>>>>>>>>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
> >>>>>>>>>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
> >>>>>>>>>>>>>>>>>> around the current assert:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
> >>>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
> >>>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
> >>>>>>>>>>>>>>>>>>      set<snapid_t> snaps;
> >>>>>>>>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
> >>>>>>>>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
> >>>>>>>>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
> >>>>>>>>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
> >>>>>>>>>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
> >>>>>>>>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
> >>>>>>>>>>>>>>>>>> +    } else {
> >>>>>>>>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
> >>>>>>>>>>>>>>>>>> +    }
> >>>>>>>>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
> >>>>>>>>>>>>>>>>>>      snap_mapper.add_oid(
> >>>>>>>>>>>>>>>>>>        recovery_info.soid,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> s
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
> >>>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
> >>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
> >>>>>>>>>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
> >>>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
> >>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
> >>>>>>>>>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
> >>>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
> >>>>>>>>>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
> >>>>>>>>>>>>>>>>>>>>> [0x55addb30748f]
> >>>>>>>>>>>>>>>>>>>>>  5:
> >>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
> >>>>>>>>>>>>>>>>>>>>> [0x55addb317531]
> >>>>>>>>>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>>>>>>>>>>>>> [0x55addb23cf10]
> >>>>>>>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
> >>>>>>>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>>>>>>>>>>> [0x55addb035bc7]
> >>>>>>>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
> >>>>>>>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
> >>>>>>>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d]
> >>>>>>>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
> >>>>>>>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
> >>>>>>>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
> >>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Greets,
> >>>>>>>>>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
> >>>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
> >>>>>>>>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
> >>>>>>>>>>>>>>>>>>>>>> list mapping.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
> >>>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
> >>>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
> >>>>>>>>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
> >>>>>>>>>>>>>>>>>>>>>> happens.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
> >>>>>>>>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> sage
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>  > 
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Stefan
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
> >>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
> >>>>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
> >>>>>>>>>>>>>>>>>>>>>>>>> being down.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> All of them fail with:
> >>>>>>>>>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
> >>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
> >>>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
> >>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
> >>>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
> >>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
> >>>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
> >>>>>>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
> >>>>>>>>>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
> >>>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
> >>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
> >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
> >>>>>>>>>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
> >>>>>>>>>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
> >>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
> >>>>>>>>>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
> >>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
> >>>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
> >>>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
> >>>>>>>>>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
> >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
> >>>>>>>>>>>>>>>>>>>>>>>>>  7:
> >>>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
> >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
> >>>>>>>>>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
> >>>>>>>>>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
> >>>>>>>>>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
> >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
> >>>>>>>>>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
> >>>>>>>>>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
> >>>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
> >>>>>>>>>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
> >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
> >>>>>>>>>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
> >>>>>>>>>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
> >>>>>>>>>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
> >>>>>>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
> >>>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
> >>>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
> >>>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
> >>>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
> >>>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
> >>>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
> >>>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
> >>>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
> >>>>>>>>>>>>>>>>>>>>>>>> them myself.
> >>>>>>>>>>>>>>>>>>>>>>>> -Greg
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>
> >>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Priebe - Profihost AG Feb. 12, 2018, 8:06 p.m. UTC | #38
Hi,

see below.

Am 12.02.2018 um 20:31 schrieb Sage Weil:
> On Mon, 12 Feb 2018, Stefan Priebe - Profihost AG wrote:
>> Hi Sage,
>>
>> did you had any time to look at the logs?
> 
> Sorry, I thought I replied earlier.  See below
> 
>> Am 05.02.2018 um 14:39 schrieb Stefan Priebe - Profihost AG:
>>> Hi Sage,
>>>
>>> Am 02.02.2018 um 22:05 schrieb Sage Weil:
>>>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote:
>>>>> Am 02.02.2018 um 20:28 schrieb Sage Weil:
>>>>>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>> Hi Sage,
>>>>>>>
>>>>>>> some days ago i reverted to stock luminous with
>>>>>>> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a
>>>>>>> +
>>>>>>> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa
>>>>>>> on top. I scrubbed and deep-scrubbed all pgs before and no errors were
>>>>>>> found.
>>>>>>>
>>>>>>> this worked fine until the ceph balancer runned and started remapping
>>>>>>> some pgs.
>>>>>>>
>>>>>>> This again trigged this assert:
>>>>>>>
>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>>>>> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)'
>>>>>>> thread 7f4289bff700 time 2018-02-02 17:46:33.249579
>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>>>>
>>>>>>> Any idea how to fix it?
>>>>>>
>>>>>> Cheap workaround is to comment out that assert.  I'm still tryign to sort 
>>>>>
>>>>> Yes i did but i feel unconforable with it as the snapmapper value is
>>>>> still wrong.
>>>>>
>>>>>> out how this happened, though.  Not sure if you shared this already, but:
>>>>>> - bluestore or filestore?
>>>>>
>>>>> As i first reported it only filestore. Now also some bluestore OSDs.
>>>>>
>>>>>> - no cache tiering, right?  nowhere on the cluster?
>>>>> Yes and Yes. Never used cache tiering at all.
>>>>>
>>>>>> It looks like the SnapMapper entry with an empty snaps vector came in via 
>>>>>> add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() 
>>>>>> doesn't have that check).  There are a few place it could come from, 
>>>>>> including the on_local_recover() call and the update_snap_map() call.  
>>>>>> That last one is interesting because it pulls the snaps for a new clone 
>>>>>> out of the pg log entry.. maybe those aren't getting populated properly.
>>>>>
>>>>> Is it also possible that they got corrupted in the past and now ceph is
>>>>> unable to detect it and repair it with a scrub or deep scrub?
>>>>>
>>>>>> Can you confirm that all of the OSDs on the cluster are running the
>>>>>> latest?  (ceph versions).
>>>>>
>>>>> Yes they do - i always check this after each update and i rechecked it
>>>>> again.
>>>>
>>>> Ok thanks!
>>>>
>>>> I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that 
>>>> has those commits above plus another one that adds debug messages in the 
>>>> various places that the empty snap vectors might be coming from.
>>>>
>>>> This won't help it get past the rebalance issue you hit above, but 
>>>> hopefully it will trigger when you create a new clone so we can see how 
>>>> this is happening.
>>>>
>>>> If we need to, I think we can ignore the assert (by commenting it out), 
>>>> but one downside there is that we'll see my warnings pop up when 
>>>> recovering those objects so we won't be able to tell where they really 
>>>> came from. Since you were seeing this on new snaps/clones, though, I'm 
>>>> hoping we won't end up in that situation!
>>>
>>> the biggest part of the logs is now this stuff:
>>> 2018-02-05 07:56:32.712366 7fb6893ff700  0 log_channel(cluster) log
>>> [DBG] : 3.740 scrub ok
>>> 2018-02-05 08:00:16.033843 7fb68a7fe700 -1 osd.0 pg_epoch: 945191
>>> pg[3.bbf( v 945191'63020848 (945191'63019283,945191'63020848]
>>> local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f
>>> 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620
>>> luod=0'0 crt=945191'63020848 lcod 945191'63020847 active] _scan_snaps no
>>> head for 3:fdd87083:::rbd_data.67b0ee6b8b4567.00000000000103a1:b0d77
>>> (have MIN)
>>> 2018-02-05 08:00:42.123804 7fb6833ff700 -1 osd.0 pg_epoch: 945191
>>> pg[3.bbf( v 945191'63020901 (945191'63019383,945191'63020901]
>>> local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f
>>> 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620
>>> luod=0'0 crt=945191'63020901 lcod 945191'63020900 active] _scan_snaps no
>>> head for 3:fddb41cc:::rbd_data.9772fb6b8b4567.0000000000000380:b0d9f
>>> (have MIN)
>>> 2018-02-05 08:03:49.583574 7fb687bfe700 -1 osd.0 pg_epoch: 945191
>>> pg[3.caa( v 945191'52090546 (945191'52088948,945191'52090546]
>>> local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f
>>> 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694
>>> luod=0'0 crt=945191'52090546 lcod 945191'52090545 active] _scan_snaps no
>>> head for 3:5530360d:::rbd_data.52a6596b8b4567.0000000000000929:b0d55
>>> (have MIN)
>>> 2018-02-05 08:03:50.432394 7fb687bfe700 -1 osd.0 pg_epoch: 945191
>>> pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549]
>>> local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f
>>> 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694
>>> luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no
>>> head for 3:553243fa:::rbd_data.67b0ee6b8b4567.0000000000009956:b0d77
>>> (have MIN)
>>> 2018-02-05 08:03:52.523643 7fb687bfe700 -1 osd.0 pg_epoch: 945191
>>> pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549]
>>> local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f
>>> 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694
>>> luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no
>>> head for 3:5536f170:::rbd_data.c1a3d66b8b4567.000000000000b729:b0dc5
>>> (have MIN)
>>> 2018-02-05 08:03:55.471369 7fb687bfe700 -1 osd.0 pg_epoch: 945191
>>> pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549]
>>> local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f
>>> 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694
>>> luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no
>>> head for 3:553d2114:::rbd_data.154a9e6b8b4567.0000000000001ea1:b0e4e
>>> (have MIN)
>>> 2018-02-05 08:08:55.440715 7fb687bfe700 -1 osd.0 pg_epoch: 945191
>>> pg[3.7aa( v 945191'144763900 (945191'144762311,945191'144763900]
>>> local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f
>>> 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522
>>> luod=0'0 crt=945191'144763900 lcod 945191'144763899 active] _scan_snaps
>>> no head for 3:55e14d0f:::rbd_data.67b0ee6b8b4567.00000000000025f2:b0d77
>>> (have MIN)
>>> 2018-02-05 08:09:09.295844 7fb68efff700 -1 osd.0 pg_epoch: 945191
>>> pg[3.7aa( v 945191'144763970 (945191'144762411,945191'144763970]
>>> local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f
>>> 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522
>>> luod=0'0 crt=945191'144763970 lcod 945191'144763969 active] _scan_snaps
>>> no head for 3:55e2d7a9:::rbd_data.e40a416b8b4567.000000000000576f:b0df3
>>> (have MIN)
>>> 2018-02-05 08:09:26.285493 7fb68efff700 -1 osd.0 pg_epoch: 945191
>>> pg[3.7aa( v 945191'144764043 (945191'144762511,945191'144764043]
>>> local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f
>>> 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522
>>> luod=0'0 crt=945191'144764043 lcod 945191'144764042 active] _scan_snaps
>>> no head for 3:55e4cbad:::rbd_data.163e9c96b8b4567.000000000000202d:b0d42
>>> (have MIN)
>>> 2018-02-05 08:10:36.490452 7fb687bfe700 -1 osd.0 pg_epoch: 945191
>>> pg[3.7aa( v 945191'144764243 (945191'144762711,945191'144764243]
>>> local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f
>>> 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522
>>> luod=0'0 crt=945191'144764243 lcod 945191'144764242 active] _scan_snaps
>>> no head for 3:55ece3c3:::rbd_data.e60f906b8b4567.00000000000000de:b0dd7
>>> (have MIN)
>>>
>>> Found nothing else in the looks until now. Should i search for more OSDs
>>> or is this already something we should consider?
> 
> Yeah, those messages aren't super informative.  What would be useful is to 
> check for any of those (or, more importantly the other new messages that 
> branch adds) on *new* object clones for snapshots that have just been 
> created.  (Earlier, you mentioned that the errors you were seeing referred 
> to snapshots that had just been created, right?  The key question is how 
> those new clones are getting registered in snapmapper with an empty list 
> of snaps.)

Yes i checked this twice - but i cannot find anything in the logs now.

I checked the logs with:
grep 'snap' /var/log/ceph/ceph-osd.[0-9]*.log | grep -v ' _scan_snaps no
head for '

and the output is just:
/var/log/ceph/ceph-osd.2.log:2018-02-12 07:44:29.169600 7f9f6abff700 -1
osd.2 pg_epoch: 949525 pg[3.7ae( v 949525'24972392
(949524'24970891,949525'24972392] local-lis/les=946216/946218 n=1608
ec=15/15 lis/c 946216/946216 les/c/f 946218/946223/937357
946216/946216/945695) [2,26,43] r=0 lpr=946216 luod=949524'24972391
lua=949524'24972391 crt=949525'24972392 lcod 949524'24972390 mlcod
949524'24972390 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]]
removing snap head

/var/log/ceph/ceph-osd.39.log:2018-02-12 07:44:28.829474 7fbaa5bff700 -1
osd.39 pg_epoch: 949525 pg[3.864( v 949525'60058835
(949524'60057284,949525'60058835] local-lis/les=944678/944679 n=1621
ec=15/15 lis/c 944678/944678 les/c/f 944679/944684/937357
944678/944678/944522) [39,80,31] r=0 lpr=944678 crt=949525'60058835 lcod
949524'60058834 mlcod 949524'60058834 active+clean+snaptrim
snaptrimq=[b3f1a~1,b4081~1]] removing snap head

/var/log/ceph/ceph-osd.19.log:2018-02-12 07:44:28.661329 7f5f9bfec700 -1
osd.19 pg_epoch: 949525 pg[3.d78( v 949525'32475960
(949495'32474451,949525'32475960] local-lis/les=944690/944691 n=1497
ec=15/15 lis/c 944690/944690 les/c/f 944691/944697/937357
944690/944690/944542) [19,75,78] r=0 lpr=944690 luod=949525'32475958
lua=949525'32475958 crt=949525'32475960 lcod 949524'32475957 mlcod
949524'32475957 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]]
removing snap head

/var/log/ceph/ceph-osd.33.log:2018-02-12 07:44:28.899062 7f0282fff700 -1
osd.33 pg_epoch: 949525 pg[3.298( v 949525'82343947
(949524'82342349,949525'82343947] local-lis/les=944603/944604 n=1654
ec=15/15 lis/c 944603/944603 les/c/f 944604/944612/937357
944603/944603/944603) [33,13,18] r=0 lpr=944603 crt=949525'82343947 lcod
949524'82343946 mlcod 949524'82343946 active+clean+snaptrim
snaptrimq=[b3f1a~1,b4081~1]] removing snap head
/var/log/ceph/ceph-osd.33.log:2018-02-12 07:44:29.313090 7f0281fff700 -1
osd.33 pg_epoch: 949525 pg[3.2c1( v 949525'58513621
(949524'58512073,949525'58513621] local-lis/les=944678/944679 n=1660
ec=15/15 lis/c 944678/944678 les/c/f 944679/944687/937357
944678/944678/944603) [33,80,56] r=0 lpr=944678 luod=949525'58513619
lua=949525'58513619 crt=949525'58513621 lcod 949525'58513618 mlcod
949525'58513618 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]]
removing snap head
/var/log/ceph/ceph-osd.45.log:2018-02-12 07:44:28.916617 7fea14ffd700 -1
osd.45 pg_epoch: 949525 pg[3.bf9( v 949525'71145615
(949524'71144094,949525'71145615] local-lis/les=944620/944621 n=1639
ec=15/15 lis/c 944620/944620 les/c/f 944621/944624/937357
944620/944620/944608) [45,28,58] r=0 lpr=944620 crt=949525'71145615 lcod
949525'71145614 mlcod 949525'71145614 active+clean+snaptrim
snaptrimq=[b3f1a~1,b4081~1]] removing snap head

/var/log/ceph/ceph-osd.62.log:2018-02-12 07:44:28.696833 7f13073fe700 -1
osd.62 pg_epoch: 949525 pg[3.3e8( v 949525'30577716
(949524'30576140,949525'30577716] local-lis/les=946273/946274 n=1609
ec=15/15 lis/c 946273/946273 les/c/f 946274/946277/937357
946273/946273/944634) [62,80,37] r=0 lpr=946273 luod=949524'30577712
lua=949524'30577712 crt=949525'30577716 lcod 949524'30577711 mlcod
949524'30577711 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]]
removing snap head
/var/log/ceph/ceph-osd.62.log:2018-02-12 07:44:28.801300 7f13073fe700 -1
osd.62 pg_epoch: 949525 pg[3.5d0( v 949525'90315972
(949524'90314417,949525'90315972] local-lis/les=944677/944678 n=1631
ec=15/15 lis/c 944677/944677 les/c/f 944678/944684/937357
944677/944677/944634) [62,58,88] r=0 lpr=944677 crt=949525'90315972 lcod
949524'90315971 mlcod 949524'90315971 active+clean+snaptrim
snaptrimq=[b3f1a~1,b4081~1]] removing snap head

/var/log/ceph/ceph-osd.81.log:2018-02-12 07:44:29.465068 7f9f9c7ff700 -1
osd.81 pg_epoch: 949525 pg[3.3cc( v 949525'46127568
(949524'46125993,949525'46127568] local-lis/les=944676/944677 n=1547
ec=15/15 lis/c 944676/944676 les/c/f 944677/944684/937357
944676/944676/944676) [81,59,66] r=0 lpr=944676 crt=949525'46127568 lcod
949525'46127567 mlcod 949525'46127567 active+clean+snaptrim
snaptrimq=[b3f1a~1,b4081~1]] removing snap head

/var/log/ceph/ceph-osd.91.log:2018-02-12 07:44:29.258414 7f6eebffd700 -1
osd.91 pg_epoch: 949525 pg[3.efd( v 949525'24155822
(949524'24154262,949525'24155822] local-lis/les=945695/945696 n=1645
ec=15/15 lis/c 945695/945695 les/c/f 945696/945697/937357
945695/945695/944694) [91,2,41] r=0 lpr=945695 crt=949525'24155822 lcod
949525'24155821 mlcod 949525'24155821 active+clean+snaptrim
snaptrimq=[b3f1a~1,b4081~1]] removing snap head

Thanks!

Stefan

> 
> Thanks!
> sage
> 
> 
> 
>>> Stefan
>>>
>>>> Thanks!
>>>> sage
>>>>
>>>>
>>>>
>>>>  > 
>>>>> Stefan
>>>>>
>>>>>
>>>>>>>
>>>>>>> Greets,
>>>>>>> Stefan
>>>>>>>
>>>>>>> Am 22.01.2018 um 20:01 schrieb Sage Weil:
>>>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>> Hi Sage,
>>>>>>>>>
>>>>>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops
>>>>>>>>> telling me:
>>>>>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
>>>>>>>>> on pg 3.33e oid
>>>>>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
>>>>>>>>> mapper: , oi: b0d7c...repaired
>>>>>>>>>
>>>>>>>>> Currently it seems it does an endless repair and never ends.
>>>>>>>>
>>>>>>>> On many objects or the same object?
>>>>>>>>
>>>>>>>> sage
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Greets,
>>>>>>>>> Stefan
>>>>>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil:
>>>>>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>
>>>>>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So
>>>>>>>>>>> it's OK to skip the assert.
>>>>>>>>>>>
>>>>>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to
>>>>>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and
>>>>>>>>>>> needs to get killed to proceed.
>>>>>>>>>>
>>>>>>>>>> I believe this will fix it:
>>>>>>>>>>
>>>>>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper
>>>>>>>>>>
>>>>>>>>>> Cherry-pick that and let me know?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>> sage
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The log output is always:
>>>>>>>>>>>     -3> 2018-01-22 14:02:41.725678 7f10025fe700  0 log_channel(cluster)
>>>>>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at
>>>>>>>>>>>  2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65
>>>>>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000
>>>>>>>>>>> 00000157:head v 927921'526559757) currently commit_sent
>>>>>>>>>>>     -2> 2018-01-22 14:02:41.725679 7f10025fe700  0 log_channel(cluster)
>>>>>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at
>>>>>>>>>>>  2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7
>>>>>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000
>>>>>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent
>>>>>>>>>>>     -1> 2018-01-22 14:02:41.725684 7f10025fe700  0 log_channel(cluster)
>>>>>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at
>>>>>>>>>>>  2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6
>>>>>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000
>>>>>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent
>>>>>>>>>>>      0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal
>>>>>>>>>>> (Aborted) **
>>>>>>>>>>>  in thread 7f0fc23fe700 thread_name:tp_osd_tp
>>>>>>>>>>>
>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>  1: (()+0xa4423c) [0x561d876ab23c]
>>>>>>>>>>>  2: (()+0xf890) [0x7f100b974890]
>>>>>>>>>>>  3: (pthread_cond_wait()+0xbf) [0x7f100b97104f]
>>>>>>>>>>>  4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*,
>>>>>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran
>>>>>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54]
>>>>>>>>>>>  5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*,
>>>>>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c]
>>>>>>>>>>>  6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77]
>>>>>>>>>>>  7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f]
>>>>>>>>>>>  8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574]
>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756]
>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>> [0x561d87138bc7]
>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>> const&)+0x57) [0x561d873b0db7]
>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c]
>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>> [0x561d876f42bd]
>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280]
>>>>>>>>>>>  15: (()+0x8064) [0x7f100b96d064]
>>>>>>>>>>>  16: (clone()+0x6d) [0x7f100aa6162d]
>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>
>>>>>>>>>>> --- logging levels ---
>>>>>>>>>>>    0/ 5 none
>>>>>>>>>>>    0/ 0 lockdep
>>>>>>>>>>>    0/ 0 context
>>>>>>>>>>>    0/ 0 crush
>>>>>>>>>>>    0/ 0 mds
>>>>>>>>>>>    0/ 0 mds_balancer
>>>>>>>>>>>    0/ 0 mds_locker
>>>>>>>>>>>    0/ 0 mds_log
>>>>>>>>>>>    0/ 0 mds_log_expire
>>>>>>>>>>>    0/ 0 mds_migrator
>>>>>>>>>>>    0/ 0 buffer
>>>>>>>>>>>    0/ 0 timer
>>>>>>>>>>>    0/ 0 filer
>>>>>>>>>>>    0/ 1 striper
>>>>>>>>>>>    0/ 0 objecter
>>>>>>>>>>>    0/ 0 rados
>>>>>>>>>>>    0/ 0 rbd
>>>>>>>>>>>    0/ 5 rbd_mirror
>>>>>>>>>>>    0/ 5 rbd_replay
>>>>>>>>>>>    0/ 0 journaler
>>>>>>>>>>>    0/ 0 objectcacher
>>>>>>>>>>>    0/ 0 client
>>>>>>>>>>>    0/ 0 osd
>>>>>>>>>>>    0/ 0 optracker
>>>>>>>>>>>    0/ 0 objclass
>>>>>>>>>>>    0/ 0 filestore
>>>>>>>>>>>    0/ 0 journal
>>>>>>>>>>>    0/ 0 ms
>>>>>>>>>>>    0/ 0 mon
>>>>>>>>>>>    0/ 0 monc
>>>>>>>>>>>    0/ 0 paxos
>>>>>>>>>>>    0/ 0 tp
>>>>>>>>>>>    0/ 0 auth
>>>>>>>>>>>    1/ 5 crypto
>>>>>>>>>>>    0/ 0 finisher
>>>>>>>>>>>    1/ 1 reserver
>>>>>>>>>>>    0/ 0 heartbeatmap
>>>>>>>>>>>    0/ 0 perfcounter
>>>>>>>>>>>    0/ 0 rgw
>>>>>>>>>>>    1/10 civetweb
>>>>>>>>>>>    1/ 5 javaclient
>>>>>>>>>>>    0/ 0 asok
>>>>>>>>>>>    0/ 0 throttle
>>>>>>>>>>>    0/ 0 refs
>>>>>>>>>>>    1/ 5 xio
>>>>>>>>>>>    1/ 5 compressor
>>>>>>>>>>>    1/ 5 bluestore
>>>>>>>>>>>    1/ 5 bluefs
>>>>>>>>>>>    1/ 3 bdev
>>>>>>>>>>>    1/ 5 kstore
>>>>>>>>>>>    4/ 5 rocksdb
>>>>>>>>>>>    4/ 5 leveldb
>>>>>>>>>>>    4/ 5 memdb
>>>>>>>>>>>    1/ 5 kinetic
>>>>>>>>>>>    1/ 5 fuse
>>>>>>>>>>>    1/ 5 mgr
>>>>>>>>>>>    1/ 5 mgrc
>>>>>>>>>>>    1/ 5 dpdk
>>>>>>>>>>>    1/ 5 eventtrace
>>>>>>>>>>>   -2/-2 (syslog threshold)
>>>>>>>>>>>   -1/-1 (stderr threshold)
>>>>>>>>>>>   max_recent     10000
>>>>>>>>>>>   max_new         1000
>>>>>>>>>>>   log_file /var/log/ceph/ceph-osd.59.log
>>>>>>>>>>> --- end dump of recent events ---
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Greets,
>>>>>>>>>>> Stefan
>>>>>>>>>>>
>>>>>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil:
>>>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is
>>>>>>>>>>>>> the following and i hope you might have an idea:
>>>>>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today
>>>>>>>>>>>>> running luminous? How can they be missing in snapmapper again?
>>>>>>>>>>>>
>>>>>>>>>>>> I'm not sure.  It might help if you can set debug_osd=1/10 (or 1/20) so 
>>>>>>>>>>>> that when you hit the crash we have some context as to what is going on.
>>>>>>>>>>>>
>>>>>>>>>>>> Did you track down which part of get_snaps() is returning the error?
>>>>>>>>>>>>
>>>>>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to
>>>>>>>>>>>>> luminous when all this happened.
>>>>>>>>>>>>
>>>>>>>>>>>> Did this start right after the upgrade?
>>>>>>>>>>>>
>>>>>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up 
>>>>>>>>>>>> instead (unless the debug option is enabled, for our regression testing), 
>>>>>>>>>>>> but I'm still unclear about what the inconsistency is... any loggin you 
>>>>>>>>>>>> can provide would help!
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks-
>>>>>>>>>>>> sage
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil:
>>>>>>>>>>>>>> I'm guessing it is coming from
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   object_snaps out;
>>>>>>>>>>>>>>   int r = get_snaps(oid, &out);
>>>>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>>>>     return r;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> in SnapMapper::update_snaps().  I would add a debug statement to to 
>>>>>>>>>>>>>> confirm that, and possibly also add debug lines to the various return 
>>>>>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's 
>>>>>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we 
>>>>>>>>>>>>>> can tolerate it...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> HI Sage,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> any ideas for this one sawing while doing a deep scrub:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch:
>>>>>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255]
>>>>>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f
>>>>>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0
>>>>>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for
>>>>>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>      0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1
>>>>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const
>>>>>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread
>>>>>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752
>>>>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>> const*)+0x102) [0x55c0100d0372]
>>>>>>>>>>>>>>>  2: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35]
>>>>>>>>>>>>>>>  3: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0]
>>>>>>>>>>>>>>>  4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4]
>>>>>>>>>>>>>>>  5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>>>>> [0x55c00fdf6642]
>>>>>>>>>>>>>>>  6:
>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>>>>> [0x55c00fdfc684]
>>>>>>>>>>>>>>>  7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>> [0x55c00fd22030]
>>>>>>>>>>>>>>>  8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b]
>>>>>>>>>>>>>>>  9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>> [0x55c00fb1abc7]
>>>>>>>>>>>>>>>  10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7]
>>>>>>>>>>>>>>>  11: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c]
>>>>>>>>>>>>>>>  12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>> [0x55c0100d5ffd]
>>>>>>>>>>>>>>>  13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0]
>>>>>>>>>>>>>>>  14: (()+0x8064) [0x7fbd284fb064]
>>>>>>>>>>>>>>>  15: (clone()+0x6d) [0x7fbd275ef62d]
>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil:
>>>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> it also crashes in (marked with HERE):
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> int SnapMapper::get_snaps(
>>>>>>>>>>>>>>>>>>>   const hobject_t &oid,
>>>>>>>>>>>>>>>>>>>   object_snaps *out)
>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>   assert(check(oid));
>>>>>>>>>>>>>>>>>>>   set<string> keys;
>>>>>>>>>>>>>>>>>>>   map<string, bufferlist> got;
>>>>>>>>>>>>>>>>>>>   keys.insert(to_object_key(oid));
>>>>>>>>>>>>>>>>>>>   int r = backend.get_keys(keys, &got);
>>>>>>>>>>>>>>>>>>>   if (r < 0)
>>>>>>>>>>>>>>>>>>>     return r;
>>>>>>>>>>>>>>>>>>>   if (got.empty())
>>>>>>>>>>>>>>>>>>>     return -ENOENT;
>>>>>>>>>>>>>>>>>>>   if (out) {
>>>>>>>>>>>>>>>>>>>     bufferlist::iterator bp = got.begin()->second.begin();
>>>>>>>>>>>>>>>>>>>     ::decode(*out, bp);
>>>>>>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " " << out->snaps << dendl;
>>>>>>>>>>>>>>>>>>>     assert(!out->snaps.empty());            ########### HERE ###########
>>>>>>>>>>>>>>>>>>>   } else {
>>>>>>>>>>>>>>>>>>>     dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl;
>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>   return 0;
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> is it save to comment that assert?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think so.  All of these asserts are just about ensuring the snapmapper 
>>>>>>>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim.  
>>>>>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a 
>>>>>>>>>>>>>>>>>> whole lot of risk here.  You might want to set nosnaptrim for the time 
>>>>>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in.  The next step is 
>>>>>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or 
>>>>>>>>>>>>>>>>>> can't repair.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But
>>>>>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one:
>>>>>>>>>>>>>>>>>      0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int
>>>>>>>>>>>>>>>>> SnapMapper::get_snaps(const h
>>>>>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time
>>>>>>>>>>>>>>>>> 2018-01-18 13:00:54.840396
>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think if you switch that assert to a warning it will repair...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2]
>>>>>>>>>>>>>>>>>  2: (SnapMapper::get_snaps(hobject_t const&,
>>>>>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
>>>>>>>>>>>>>>>>>  3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7
>>>>>>>>>>>>>>>>> 697]
>>>>>>>>>>>>>>>>>  4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
>>>>>>>>>>>>>>>>>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
>>>>>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
>>>>>>>>>>>>>>>>>  6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
>>>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
>>>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>> [0x5561cdcd7bc7]
>>>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957]
>>>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
>>>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>> [0x5561ce292e7d]
>>>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
>>>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7fbd70d86064]
>>>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7fbd6fe7a62d]
>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hi Sage,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>>>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l
>>>>>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>>>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti
>>>>>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287
>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>>>>>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end())
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Is this a cache tiering pool?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded:
>>>>>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 3.80e      1709                  0     1579         0       0 6183674368
>>>>>>>>>>>>>>>>>>>>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>>>>>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>>>>>>>>>>>>>>>>>>>>        50       [39]             39  907737'69776430 2018-01-14
>>>>>>>>>>>>>>>>>>>>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then.  Something like this will work 
>>>>>>>>>>>>>>>>>>>> around the current assert:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644
>>>>>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc
>>>>>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>>>>>>>>>>>>>>>>>>>>      set<snapid_t> snaps;
>>>>>>>>>>>>>>>>>>>>      dout(20) << " snapset " << recovery_info.ss << dendl;
>>>>>>>>>>>>>>>>>>>>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
>>>>>>>>>>>>>>>>>>>> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
>>>>>>>>>>>>>>>>>>>> -    snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>>>>>>> +    if (p != recovery_info.ss.clone_snaps.end()) {
>>>>>>>>>>>>>>>>>>>> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
>>>>>>>>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>>>>>>>>> +      snaps.insert(p->second.begin(), p->second.end());
>>>>>>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>>>>>>>      dout(20) << " snaps " << snaps << dendl;
>>>>>>>>>>>>>>>>>>>>      snap_mapper.add_oid(
>>>>>>>>>>>>>>>>>>>>        recovery_info.soid,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> s
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-94-g92923ef
>>>>>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>>>>>>>>>>>>>>>>>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>>>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>>>>>>>>>>>>>>>>>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>>>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>>>>>>>>>>>>>>>>>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>>>>>>>>>>>>>>>>>>>>> [0x55addb30748f]
>>>>>>>>>>>>>>>>>>>>>>>  5:
>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>>>>>>>>>>>>>>>>>>>>> [0x55addb317531]
>>>>>>>>>>>>>>>>>>>>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>>>>> [0x55addb23cf10]
>>>>>>>>>>>>>>>>>>>>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>>>>>>>>>>>>>>>>>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>>>>> [0x55addb035bc7]
>>>>>>>>>>>>>>>>>>>>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957]
>>>>>>>>>>>>>>>>>>>>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>>>>>>>>>>>>>>>>>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d]
>>>>>>>>>>>>>>>>>>>>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>>>>>>>>>>>>>>>>>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>>>>>>>>>>>>>>>>>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil:
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>>>>>>>>>>>>>>>>>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>>>>>>>>>>>>>>>>>>>>> list mapping.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so 
>>>>>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>>>>>>>>>>>>>>>>>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>>>>>>>>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>>>>>>>>>>>>>>>>>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> sage
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>  > 
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum:
>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>>>>>>>>>>>>>>>>>>>>> <s.priebe@profihost.ag> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>>>>>>>>>>>>>>>>>>>>> being down.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> All of them fail with:
>>>>>>>>>>>>>>>>>>>>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>>>>>>>>>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>>>>>>>>>>>>>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>>>>>>>>>>>>>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>>>>>>>>>>>>>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>>>>>>>>>>>>>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>>>>>>>>>>>>>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2]
>>>>>>>>>>>>>>>>>>>>>>>>>>>  7:
>>>>>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4]
>>>>>>>>>>>>>>>>>>>>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>>>>>>>>>>>>>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>>>>>>>>>>>>>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7]
>>>>>>>>>>>>>>>>>>>>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>>>>>>>>>>>>>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>>>>>>>>>>>>>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>>>>>>>>>>>>>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>>>>>>>>>>>>>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>>>>>>>>>>>>>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>>>>>>>>>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>>>>>>>>>>>>>>>>> them myself.
>>>>>>>>>>>>>>>>>>>>>>>>>> -Greg
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
index d42f3a401b..0f76134f74 100644
--- a/src/osd/PrimaryLogPG.cc
+++ b/src/osd/PrimaryLogPG.cc
@@ -372,8 +372,11 @@  void PrimaryLogPG::on_local_recover(
     set<snapid_t> snaps;
     dout(20) << " snapset " << recovery_info.ss << dendl;
     auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
-    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
-    snaps.insert(p->second.begin(), p->second.end());
+    if (p != recovery_info.ss.clone_snaps.end()) {
+      derr << __func__ << " no clone_snaps for " << hoid << dendl;
+    } else {
+      snaps.insert(p->second.begin(), p->second.end());
+    }
     dout(20) << " snaps " << snaps << dendl;
     snap_mapper.add_oid(
       recovery_info.soid,