Message ID | 06E7D85B3BA36C4DB207FEDE871C534891D8CB@SHSMSX101.ccr.corp.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, 29 Jul 2014, Wang, Zhiqiang wrote: > This fixes a bug when the time of the OSDs and clients are not synchronized > (especially when client is ahead of OSD), and the cache tier dirty ratio > reaches the threshold, the agent skips the flush work because it thinks > the object is too young. > > The skipping flush code is as following: > > if (obc->obs.oi.mtime + utime_t(pool.info.cache_min_flush_age, 0) > now) { > dout(20) << __func__ << " skip (too young) " << obc->obs.oi << dendl; > osd->logger->inc(l_osd_agent_skip); > return false; > } Hmm, I think the use of the client mtime is an important property we want to maintain for the benefit of the rest of the system. Two other possibilities: 1) If cache_min_flush_age is 0, skip the check, so we avoid mtimes in the future in that case. 2) Add an additional field to object_info_t that is an mtime based on the OSD's clock, and use that for these checks. What do you think? sage > > Signed-off-by: Zhiqiang Wang <wonzhq@hotmail.com> > --- > src/osd/ReplicatedPG.cc | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/osd/ReplicatedPG.cc b/src/osd/ReplicatedPG.cc > index 5d822c1..1c7c527 100644 > --- a/src/osd/ReplicatedPG.cc > +++ b/src/osd/ReplicatedPG.cc > @@ -1805,7 +1805,7 @@ void ReplicatedPG::execute_ctx(OpContext *ctx) > > // version > ctx->at_version = get_next_version(); > - ctx->mtime = m->get_mtime(); > + ctx->mtime = ceph_clock_now(cct); > > dout(10) << "do_op " << soid << " " << ctx->ops > << " ov " << obc->obs.oi.version << " av " << ctx->at_version > -- > 1.9.1 > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jul 29, 2014 at 11:46 AM, Sage Weil <sweil@redhat.com> wrote: > On Tue, 29 Jul 2014, Wang, Zhiqiang wrote: >> This fixes a bug when the time of the OSDs and clients are not synchronized >> (especially when client is ahead of OSD), and the cache tier dirty ratio >> reaches the threshold, the agent skips the flush work because it thinks >> the object is too young. >> >> The skipping flush code is as following: >> >> if (obc->obs.oi.mtime + utime_t(pool.info.cache_min_flush_age, 0) > now) { >> dout(20) << __func__ << " skip (too young) " << obc->obs.oi << dendl; >> osd->logger->inc(l_osd_agent_skip); >> return false; >> } > > Hmm, I think the use of the client mtime is an important property we want > to maintain for the benefit of the rest of the system. Two other > possibilities: This was my initial thought as well — but I don't think we're actually relying on that any more (e.g., we used to use the client times for file recovery in the MDS, but the rest of CephFS is now using MDS times anyway). Are there scenarios other than CephFS recovery? > > 1) If cache_min_flush_age is 0, skip the check, so we avoid mtimes in the > future in that case. > > 2) Add an additional field to object_info_t that is an mtime based on the > OSD's clock, and use that for these checks. But option 2 is probably still a good idea. -Greg -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Off the top of my head... the fscache code (read only) in the CephFS kclient uses the mtime to see if the item is valid (there was not an object change version counter at the time). I don't know if / how that would be affected of the top of my head. - M On Tue, Jul 29, 2014 at 12:02 PM, Gregory Farnum <greg@inktank.com> wrote: > On Tue, Jul 29, 2014 at 11:46 AM, Sage Weil <sweil@redhat.com> wrote: >> On Tue, 29 Jul 2014, Wang, Zhiqiang wrote: >>> This fixes a bug when the time of the OSDs and clients are not synchronized >>> (especially when client is ahead of OSD), and the cache tier dirty ratio >>> reaches the threshold, the agent skips the flush work because it thinks >>> the object is too young. >>> >>> The skipping flush code is as following: >>> >>> if (obc->obs.oi.mtime + utime_t(pool.info.cache_min_flush_age, 0) > now) { >>> dout(20) << __func__ << " skip (too young) " << obc->obs.oi << dendl; >>> osd->logger->inc(l_osd_agent_skip); >>> return false; >>> } >> >> Hmm, I think the use of the client mtime is an important property we want >> to maintain for the benefit of the rest of the system. Two other >> possibilities: > > This was my initial thought as well — but I don't think we're actually > relying on that any more (e.g., we used to use the client times for > file recovery in the MDS, but the rest of CephFS is now using MDS > times anyway). Are there scenarios other than CephFS recovery? > >> >> 1) If cache_min_flush_age is 0, skip the check, so we avoid mtimes in the >> future in that case. >> >> 2) Add an additional field to object_info_t that is an mtime based on the >> OSD's clock, and use that for these checks. > > But option 2 is probably still a good idea. > -Greg > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
TG9va3MgbGlrZSB3ZSBkbyBuZWVkIHRvIGtlZXAgdGhlIGNsaWVudCB0aW1lIGFzIHRoZSBtdGlt ZS4gU2luY2Ugb3B0aW9uIDEgZG9lc24ndCByZWFsbHkgZml4IHRoZSBwcm9ibGVtLCBJIHdvdWxk IHZvdGUgZm9yIG9wdGlvbiAyLg0KDQotLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KRnJvbTog Y2VwaC1kZXZlbC1vd25lckB2Z2VyLmtlcm5lbC5vcmcgW21haWx0bzpjZXBoLWRldmVsLW93bmVy QHZnZXIua2VybmVsLm9yZ10gT24gQmVoYWxmIE9mIE1pbG9zeiBUYW5za2kNClNlbnQ6IFdlZG5l c2RheSwgSnVseSAzMCwgMjAxNCAyOjU1IEFNDQpUbzogR3JlZ29yeSBGYXJudW0NCkNjOiBTYWdl IFdlaWw7IFdhbmcsIFpoaXFpYW5nOyBjZXBoLWRldmVsQHZnZXIua2VybmVsLm9yZw0KU3ViamVj dDogUmU6IFtQQVRDSF0gY2VwaDogdXNlIHRoZSBPU0QgdGltZSBhcyB0aGUgb2JqZWN0IG10aW1l IGluc3RlYWQgb2YgdGhlIGNsaWVudCB0aW1lDQoNCk9mZiB0aGUgdG9wIG9mIG15IGhlYWQuLi4g dGhlIGZzY2FjaGUgY29kZSAocmVhZCBvbmx5KSBpbiB0aGUgQ2VwaEZTIGtjbGllbnQgdXNlcyB0 aGUgbXRpbWUgdG8gc2VlIGlmIHRoZSBpdGVtIGlzIHZhbGlkICh0aGVyZSB3YXMgbm90IGFuIG9i amVjdCBjaGFuZ2UgdmVyc2lvbiBjb3VudGVyIGF0IHRoZSB0aW1lKS4gSSBkb24ndCBrbm93IGlm IC8gaG93IHRoYXQgd291bGQgYmUgYWZmZWN0ZWQgb2YgdGhlIHRvcCBvZiBteSBoZWFkLg0KDQot IE0NCg0KT24gVHVlLCBKdWwgMjksIDIwMTQgYXQgMTI6MDIgUE0sIEdyZWdvcnkgRmFybnVtIDxn cmVnQGlua3RhbmsuY29tPiB3cm90ZToNCj4gT24gVHVlLCBKdWwgMjksIDIwMTQgYXQgMTE6NDYg QU0sIFNhZ2UgV2VpbCA8c3dlaWxAcmVkaGF0LmNvbT4gd3JvdGU6DQo+PiBPbiBUdWUsIDI5IEp1 bCAyMDE0LCBXYW5nLCBaaGlxaWFuZyB3cm90ZToNCj4+PiBUaGlzIGZpeGVzIGEgYnVnIHdoZW4g dGhlIHRpbWUgb2YgdGhlIE9TRHMgYW5kIGNsaWVudHMgYXJlIG5vdCANCj4+PiBzeW5jaHJvbml6 ZWQgKGVzcGVjaWFsbHkgd2hlbiBjbGllbnQgaXMgYWhlYWQgb2YgT1NEKSwgYW5kIHRoZSBjYWNo ZSANCj4+PiB0aWVyIGRpcnR5IHJhdGlvIHJlYWNoZXMgdGhlIHRocmVzaG9sZCwgdGhlIGFnZW50 IHNraXBzIHRoZSBmbHVzaCANCj4+PiB3b3JrIGJlY2F1c2UgaXQgdGhpbmtzIHRoZSBvYmplY3Qg aXMgdG9vIHlvdW5nLg0KPj4+DQo+Pj4gVGhlIHNraXBwaW5nIGZsdXNoIGNvZGUgaXMgYXMgZm9s bG93aW5nOg0KPj4+DQo+Pj4gICBpZiAob2JjLT5vYnMub2kubXRpbWUgKyB1dGltZV90KHBvb2wu aW5mby5jYWNoZV9taW5fZmx1c2hfYWdlLCAwKSA+IG5vdykgew0KPj4+ICAgICBkb3V0KDIwKSA8 PCBfX2Z1bmNfXyA8PCAiIHNraXAgKHRvbyB5b3VuZykgIiA8PCBvYmMtPm9icy5vaSA8PCBkZW5k bDsNCj4+PiAgICAgb3NkLT5sb2dnZXItPmluYyhsX29zZF9hZ2VudF9za2lwKTsNCj4+PiAgICAg cmV0dXJuIGZhbHNlOw0KPj4+ICAgfQ0KPj4NCj4+IEhtbSwgSSB0aGluayB0aGUgdXNlIG9mIHRo ZSBjbGllbnQgbXRpbWUgaXMgYW4gaW1wb3J0YW50IHByb3BlcnR5IHdlIA0KPj4gd2FudCB0byBt YWludGFpbiBmb3IgdGhlIGJlbmVmaXQgb2YgdGhlIHJlc3Qgb2YgdGhlIHN5c3RlbS4gIFR3byAN Cj4+IG90aGVyDQo+PiBwb3NzaWJpbGl0aWVzOg0KPg0KPiBUaGlzIHdhcyBteSBpbml0aWFsIHRo b3VnaHQgYXMgd2VsbCDigJQgYnV0IEkgZG9uJ3QgdGhpbmsgd2UncmUgYWN0dWFsbHkgDQo+IHJl bHlpbmcgb24gdGhhdCBhbnkgbW9yZSAoZS5nLiwgd2UgdXNlZCB0byB1c2UgdGhlIGNsaWVudCB0 aW1lcyBmb3IgDQo+IGZpbGUgcmVjb3ZlcnkgaW4gdGhlIE1EUywgYnV0IHRoZSByZXN0IG9mIENl cGhGUyBpcyBub3cgdXNpbmcgTURTIA0KPiB0aW1lcyBhbnl3YXkpLiBBcmUgdGhlcmUgc2NlbmFy aW9zIG90aGVyIHRoYW4gQ2VwaEZTIHJlY292ZXJ5Pw0KPg0KPj4NCj4+IDEpIElmIGNhY2hlX21p bl9mbHVzaF9hZ2UgaXMgMCwgc2tpcCB0aGUgY2hlY2ssIHNvIHdlIGF2b2lkIG10aW1lcyBpbiAN Cj4+IHRoZSBmdXR1cmUgaW4gdGhhdCBjYXNlLg0KPj4NCj4+IDIpIEFkZCBhbiBhZGRpdGlvbmFs IGZpZWxkIHRvIG9iamVjdF9pbmZvX3QgdGhhdCBpcyBhbiBtdGltZSBiYXNlZCBvbiANCj4+IHRo ZSBPU0QncyBjbG9jaywgYW5kIHVzZSB0aGF0IGZvciB0aGVzZSBjaGVja3MuDQo+DQo+IEJ1dCBv cHRpb24gMiBpcyBwcm9iYWJseSBzdGlsbCBhIGdvb2QgaWRlYS4NCj4gLUdyZWcNCj4gLS0NCj4g VG8gdW5zdWJzY3JpYmUgZnJvbSB0aGlzIGxpc3Q6IHNlbmQgdGhlIGxpbmUgInVuc3Vic2NyaWJl IGNlcGgtZGV2ZWwiIA0KPiBpbiB0aGUgYm9keSBvZiBhIG1lc3NhZ2UgdG8gbWFqb3Jkb21vQHZn ZXIua2VybmVsLm9yZyBNb3JlIG1ham9yZG9tbyANCj4gaW5mbyBhdCAgaHR0cDovL3ZnZXIua2Vy bmVsLm9yZy9tYWpvcmRvbW8taW5mby5odG1sDQoNCg0KDQotLQ0KTWlsb3N6IFRhbnNraQ0KQ1RP DQoxNiBFYXN0IDM0dGggU3RyZWV0LCAxNXRoIGZsb29yDQpOZXcgWW9yaywgTlkgMTAwMTYNCg0K cDogNjQ2LTI1My05MDU1DQplOiBtaWxvc3pAYWRmaW4uY29tDQotLQ0KVG8gdW5zdWJzY3JpYmUg ZnJvbSB0aGlzIGxpc3Q6IHNlbmQgdGhlIGxpbmUgInVuc3Vic2NyaWJlIGNlcGgtZGV2ZWwiIGlu IHRoZSBib2R5IG9mIGEgbWVzc2FnZSB0byBtYWpvcmRvbW9Admdlci5rZXJuZWwub3JnIE1vcmUg bWFqb3Jkb21vIGluZm8gYXQgIGh0dHA6Ly92Z2VyLmtlcm5lbC5vcmcvbWFqb3Jkb21vLWluZm8u aHRtbA0K -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/src/osd/ReplicatedPG.cc b/src/osd/ReplicatedPG.cc index 5d822c1..1c7c527 100644 --- a/src/osd/ReplicatedPG.cc +++ b/src/osd/ReplicatedPG.cc @@ -1805,7 +1805,7 @@ void ReplicatedPG::execute_ctx(OpContext *ctx) // version ctx->at_version = get_next_version(); - ctx->mtime = m->get_mtime(); + ctx->mtime = ceph_clock_now(cct); dout(10) << "do_op " << soid << " " << ctx->ops << " ov " << obc->obs.oi.version << " av " << ctx->at_version
This fixes a bug when the time of the OSDs and clients are not synchronized (especially when client is ahead of OSD), and the cache tier dirty ratio reaches the threshold, the agent skips the flush work because it thinks the object is too young. The skipping flush code is as following: if (obc->obs.oi.mtime + utime_t(pool.info.cache_min_flush_age, 0) > now) { dout(20) << __func__ << " skip (too young) " << obc->obs.oi << dendl; osd->logger->inc(l_osd_agent_skip); return false; } Signed-off-by: Zhiqiang Wang <wonzhq@hotmail.com> --- src/osd/ReplicatedPG.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html