From patchwork Tue Sep 10 00:40:37 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Dunlop
X-Patchwork-Id: 2863611
Date: Tue, 10 Sep 2013 10:40:37 +1000
From: Chris Dunlop
To: Sage Weil
Cc: ceph-devel@vger.kernel.org
Subject: Re: OSD repair: on disk size does not match object info size
Message-ID: <20130910004037.GA2080@onthe.net.au>
References: <20130909231643.GA31877@onthe.net.au> <20130910000145.GA857@onthe.net.au>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: ceph-devel-owner@vger.kernel.org
X-Mailing-List: ceph-devel@vger.kernel.org

On Mon, Sep 09, 2013 at 05:14:14PM -0700, Sage Weil wrote:
> On Tue, 10 Sep 2013, Chris Dunlop wrote:
>> On Mon, Sep 09, 2013 at 04:30:33PM -0700, Sage Weil wrote:
>>> On Tue, 10 Sep 2013, Chris Dunlop wrote:
>>>> G'day,
>>>>
>>>> On 0.56.7-1~bpo70+1 I'm getting:
>>>>
>>>> # ceph pg dump | grep inconsistent
>>>> 013-09-10-08:39:59 2.bc 2776 0 0 0 11521799680 162063 162063 active+clean+inconsistent 2013-09-10 08:38:38.482302 20512'699877 20360'13461026 [6,0] [6,0] 20512'699877 2013-09-10 08:38:38.482264 20512'699877 2013-09-10 08:38:38.482264
>>>>
>>>> # ceph pg repair 2.bc
>>>> instructing pg 2.bc on osd.6 to repair
>>>>
>>>> # tail /var/log/ceph/ceph-osd.6.log
>>>> 2013-09-10 08:17:25.557926 7fef09c14700 0 log [ERR] : repair 2.bc 89ebebc/rbd_data.13a0c74b0dc51.00000000000107ec/head//2 on disk size (4194304) does not match object info size (4104192)
>>>> 2013-09-10 08:17:27.316112 7fef09c14700 0 log [ERR] : 2.bc repair 1 errors, 0 fixed
>>>>
>>>> # ls -l 'ceph-6/current/2.bc_head/DIR_C/DIR_B/DIR_E/rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2'
>>>> -rw-r--r-- 1 root root 4194304 Sep  8 21:01 ceph-6/current/2.bc_head/DIR_C/DIR_B/DIR_E/rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2
>>>> # ls -l 'ceph-0/current/2.bc_head/DIR_C/DIR_B/DIR_E/rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2'
>>>> -rw-r--r-- 1 root root 4194304 Sep  8 21:01 ceph-0/current/2.bc_head/DIR_C/DIR_B/DIR_E/rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2
>>>>
>>>> One possible solution would be to simply truncate the objects down to the
>>>> object info size, as recommended in this case:
>>>>
>>>> http://www.spinics.net/lists/ceph-users/msg00793.html
>>>>
>>>> However I'm a little concerned about that solution as the on-disk size is
>>>> exactly 4MB, which I think is the expected size of these objects, and matches
>>>> the size of all the other objects in the same directory, and the "extra" data
>>>> looks a little interesting, with "FILE0" blocks in there (what are those?):
>>>>
>>>> # cd /var/lib/ceph/osd/ceph-6/current/2.bc_head/DIR_C/DIR_B/DIR_E/
>>>> # dd if='rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2' bs=1024 skip=4008 | od -c
>>>> 0000000   F   I   L   E   0  \0 003  \0 312   j   o   o  \0  \0  \0  \0
>>>> 0000020 001  \0 001  \0   8  \0 001  \0   X 001  \0  \0  \0 004  \0  \0
>>>> 0000040  \0  \0  \0  \0  \0  \0  \0  \0 006  \0  \0  \0 310   p 017  \0
>>>> 0000060 002  \0  \0  \0  \0  \0  \0  \0 020  \0  \0  \0   `  \0  \0  \0
>>>> ...
>>>> 0002000   F   I   L   E   0  \0 003  \0 002   k   o   o  \0  \0  \0  \0
>>>> 0002020 001  \0 001  \0   8  \0 001  \0   X 001  \0  \0  \0 004  \0  \0
>>>> 0002040  \0  \0  \0  \0  \0  \0  \0  \0 006  \0  \0  \0 311   p 017  \0
>>>> 0002060 002  \0  \0  \0  \0  \0  \0  \0 020  \0  \0  \0   `  \0  \0  \0
>>>> ...
>>>> 0004000   F   I   L   E   0  \0 003  \0 023   r   o   o  \0  \0  \0  \0
>>>> 0004020 001  \0 001  \0   8  \0 001  \0   X 001  \0  \0  \0 004  \0  \0
>>>> 0004040  \0  \0  \0  \0  \0  \0  \0  \0 006  \0  \0  \0 312   p 017  \0
>>>> 0004060 002  \0  \0  \0  \0  \0  \0  \0 020  \0  \0  \0   `  \0  \0  \0
>>>>
>>>> Is it safe to simply truncate this object, or what other solutions might
>>>> be applicable?
>>>
>>> The alternative is to edit the xattr. That's harder, but better. You'll
>>> want to grab the user.ceph._ xattr, change the one instance of 4104192 to
>>> 4194304, and then reset it. You can use
>>>
>>> ceph-dencoder type object_info_t import /tmp/xattrfile decode dump_json
>>>
>>> to verify that it decodes properly before and after you make the edit. I
>>> like the 'attr' tool for getting/setting xattrs.
>>
>> Can ceph-dencoder import the (modified) json and write out the
>> encoded binary suitable for setting in the xattr?
>
> It can't, sadly.
>
>> If not, what encoding is the xattr, so I can work out what I
>> need to do to make the change?
>
> It's little-endian. So 'printf "%x\n" $badsize' and look for that value
> with hexedit or whatever, and check your work with ceph-dencoder.

OK, for the record:

# printf '%x\n' 4104192
3ea000
# printf '%x\n' 4194304
400000
# attr -q -g ceph._ 'rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2' > /tmp/attr.1
## Note: reversed bytes
# xxd /tmp/attr.1 | sed 's/a03e/0040/' | xxd -r > /tmp/attr.2
# diff -u \
    <(ceph-dencoder type object_info_t import /tmp/attr.1 decode dump_json) \
    <(ceph-dencoder type object_info_t import /tmp/attr.2 decode dump_json)
# attr -s ceph._ 'rbd\udata.13a0c74b0dc51.00000000000107ec__head_089EBEBC__2' < /tmp/attr.2

(And repeat the 'attr -s' on the secondary storage)

...and repairing again!

Thanks for your help.

Chris.

---
--- /dev/fd/63	2013-09-10 10:28:59.882470249 +1000
+++ /dev/fd/62	2013-09-10 10:28:59.882470249 +1000
@@ -9,7 +9,7 @@
   "version": "20274'699051",
   "prior_version": "20016'685946",
   "last_reqid": "client.91723.0:521009894",
-  "size": 4104192,
+  "size": 4194304,
   "mtime": "2013-09-08 21:01:45.543328",
   "lost": 0,
   "wrlock_by": "unknown.0.0:0",
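
For anyone repeating the "reversed bytes" step from the commands in this message, the little-endian pattern to search for can be computed rather than worked out by hand. What follows is only a sketch, and it rests on an assumption: that the size field in object_info_t is a 64-bit little-endian integer (the thread itself only says "little-endian"). le64_hex and patch_size are hypothetical helper names, not part of Ceph or of the attr/xxd tools.

```shell
# Hypothetical helper: print a decimal value as 8 little-endian bytes in hex.
# ASSUMPTION: the size field is 64 bits wide; the thread does not state this.
le64_hex() {
    n=$1
    i=0
    out=
    while [ "$i" -lt 8 ]; do
        # Extract byte i, least significant byte first.
        out="$out$(printf '%02x' $(( (n >> (8 * i)) & 255 )))"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

# Hypothetical helper: rewrite the first occurrence of the old size with the
# new size in a saved xattr dump. Arguments: oldsize newsize infile outfile.
# Plain hex mode (xxd -p) avoids depending on how xxd groups bytes in its
# default dump. Caveat: the match works on a hex string, so it could in
# principle land on a half-byte boundary; check the result with ceph-dencoder.
patch_size() {
    old=$(le64_hex "$1")
    new=$(le64_hex "$2")
    xxd -p "$3" | tr -d '\n' | sed "s/$old/$new/" | xxd -r -p > "$4"
}
```

With these definitions, `le64_hex 4104192` prints 00a03e0000000000 and `le64_hex 4194304` prints 0000400000000000, which agrees with the a03e and 0040 fragments used in the sed expression. `patch_size 4104192 4194304 /tmp/attr.1 /tmp/attr.2` would stand in for the xxd | sed | xxd -r pipeline; the ceph-dencoder diff is still the check that only the size changed.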