From patchwork Wed Nov 18 17:22:17 2015
X-Patchwork-Submitter: Igor Fedotov
X-Patchwork-Id: 7651241
To: ceph-devel, ceph-devel
From: Igor Fedotov
Subject: [PATCH 1/1] osd: take removed/empty/small objects into account for cache flush triggering
Message-ID: <564CB3C9.3030101@mirantis.com>
Date: Wed, 18 Nov 2015 20:22:17 +0300
X-Mailing-List: ceph-devel@vger.kernel.org

Hi Everybody,

It seems that Ceph caching doesn't take removed (whiteout) objects into account when checking for the
need to flush the cache.

The following pools have been created:

./ceph -c ceph.conf osd pool create cachepool 12 12
./ceph -c ceph.conf osd pool create ecpool 12 12 erasure
./ceph -c ceph.conf osd tier add ecpool cachepool
./ceph -c ceph.conf osd tier cache-mode cachepool writeback
./ceph -c ceph.conf osd tier set-overlay ecpool cachepool
./ceph -c ceph.conf osd pool set cachepool hit_set_type bloom
./ceph -c ceph.conf osd pool set cachepool target_max_bytes 1000000

Then doing the following in a loop:
  - write 16K of data to a new object with a unique name
  - remove the object
causes the cache pool object count to grow without bound:

~/ceph/ceph_com/src# ./rados -c ceph.conf df
pool name    KB    objects    clones    degraded    unfound    rd    rd KB     wr    wr KB
cachepool    48        285         0           0          0     0        0    567     4560
...
~/ceph/ceph_com/src# ./rados -c ceph.conf df
pool name    KB    objects    clones    degraded    unfound    rd    rd KB     wr    wr KB
cachepool     0       5947         0           0          0     0        0  11894    95152
...
etc.

The same applies to disk usage as reported by the du command:

~/ceph/ceph_com/src# du -h dev -s
461M    dev
...
~/ceph/ceph_com/src# du -h dev -s
465M    dev

From code analysis it looks like the following two parameters affect cache flush triggering: target_max_bytes and target_max_objects. When the latter is set to 0 (the default), a cache flush will never happen no matter how many removed objects are in the cache, since their size is supposed to be 0 bytes. But in fact that's not true - empty files (objects) consume some space too. Thus one can potentially even overfill the cache completely with removed objects.

I understand that the above is rather a corner case, and in real life additional user traffic may trigger the cache flush. But it's probably worth handling, given that it's pretty easy to do. In the patch below, for the sake of simplicity, I assumed that the minimum object size is always 4K. Although it probably depends on the underlying filesystem, I think this constant is good enough. And of course this is just a simple correction to the used-space calculation that triggers cache flushing - it doesn't ensure a 100% correct calculation.

Thanks,
Igor

Signed-off-by: Igor Fedotov
---

diff --git a/src/osd/ReplicatedPG.cc b/src/osd/ReplicatedPG.cc
index 67a0657..d019135 100644
--- a/src/osd/ReplicatedPG.cc
+++ b/src/osd/ReplicatedPG.cc
@@ -11886,6 +11886,7 @@ bool ReplicatedPG::agent_choose_mode(bool restart, OpRequestRef op)
   uint64_t full_micro = 0;
   if (pool.info.target_max_bytes && num_user_objects > 0) {
     uint64_t avg_size = num_user_bytes / num_user_objects;
+    avg_size = MAX(avg_size, 4096); // take into account that tons of empty objects consume some disk space too
     dirty_micro =
       num_dirty * avg_size * 1000000 /
       MAX(pool.info.target_max_bytes / divisor, 1);
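
For illustration, here is a minimal standalone sketch of the effect. It is not the actual Ceph code path - compute_dirty_micro() is a made-up helper, divisor is fixed to 1, and the input numbers are simply taken from the rados df output above - but it mimics the avg_size/dirty_micro arithmetic from agent_choose_mode(). With a cache holding only removed (zero-byte) objects, dirty_micro stays at 0 and flushing never triggers; with the 4K floor it grows with the number of dirty objects.

// Illustrative sketch only, not Ceph code. Inputs roughly follow the repro
// above: ~6000 whiteout objects, 0 user bytes, target_max_bytes = 1000000.
#include <algorithm>
#include <cstdint>
#include <iostream>

static uint64_t compute_dirty_micro(uint64_t num_dirty,
                                    uint64_t num_user_objects,
                                    uint64_t num_user_bytes,
                                    uint64_t target_max_bytes,
                                    bool floor_avg_size)
{
  if (!target_max_bytes || !num_user_objects)
    return 0;
  uint64_t avg_size = num_user_bytes / num_user_objects;
  if (floor_avg_size)
    avg_size = std::max<uint64_t>(avg_size, 4096); // assume every object costs at least 4K on disk
  uint64_t divisor = 1; // simplification: the real code spreads target_max_bytes across PGs
  return num_dirty * avg_size * 1000000 /
         std::max<uint64_t>(target_max_bytes / divisor, 1);
}

int main()
{
  uint64_t dirty = 5947, objects = 5947, bytes = 0, target = 1000000;
  std::cout << "without floor: "
            << compute_dirty_micro(dirty, objects, bytes, target, false) << "\n";
  std::cout << "with 4K floor: "
            << compute_dirty_micro(dirty, objects, bytes, target, true) << "\n";
  // Without the floor the agent sees dirty_micro == 0 forever; with the floor
  // it is far above any reasonable flush threshold for this tiny test pool.
  return 0;
}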