From patchwork Fri Apr 13 05:51:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Changwei Ge X-Patchwork-Id: 10339547 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 3F46F60329 for ; Fri, 13 Apr 2018 05:53:00 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0B3CD28358 for ; Fri, 13 Apr 2018 05:53:00 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F395528385; Fri, 13 Apr 2018 05:52:59 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00, MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from userp2130.oracle.com (userp2130.oracle.com [156.151.31.86]) (using TLSv1.2 with cipher AES256-SHA256 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id CE7EB28358 for ; Fri, 13 Apr 2018 05:52:57 +0000 (UTC) Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w3D5pvJC168482; Fri, 13 Apr 2018 05:52:41 GMT Received: from userv0021.oracle.com (userv0021.oracle.com [156.151.31.71]) by userp2130.oracle.com with ESMTP id 2h6ne7qb8t-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 13 Apr 2018 05:52:40 +0000 Received: from oss.oracle.com (oss-old-reserved.oracle.com [137.254.22.2]) by userv0021.oracle.com (8.14.4/8.14.4) with ESMTP id w3D5qbCj002605 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 13 Apr 2018 05:52:37 GMT Received: from localhost ([127.0.0.1] helo=lb-oss.oracle.com) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1f6rdU-0002CB-U8; Thu, 12 Apr 2018 22:52:36 -0700 Received: from userv0021.oracle.com ([156.151.31.71]) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1f6rd4-00023T-Lf for ocfs2-devel@oss.oracle.com; Thu, 12 Apr 2018 22:52:11 -0700 Received: from userp2030.oracle.com (userp2030.oracle.com [156.151.31.89]) by userv0021.oracle.com (8.14.4/8.14.4) with ESMTP id w3D5qAb0001713 (version=TLSv1/SSLv3 cipher=AES256-SHA256 bits=256 verify=FAIL); Fri, 13 Apr 2018 05:52:10 GMT Received: from pps.filterd (userp2030.oracle.com [127.0.0.1]) by userp2030.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w3D5gmRZ010665; Fri, 13 Apr 2018 05:52:10 GMT Received: from h3cmg01-ex.h3c.com (smtp.h3c.com [60.191.123.56]) by userp2030.oracle.com with ESMTP id 2hagvpx8sh-1; Fri, 13 Apr 2018 05:52:09 +0000 Received: from BJHUB02-EX.srv.huawei-3com.com (unknown [10.63.20.170]) by h3cmg01-ex.h3c.com with smtp id 79ac_0bc4_304e30d5_2e04_4cff_a65f_c86a71ef9ad4; Fri, 13 Apr 2018 13:50:18 +0800 Received: from localhost.localdomain (10.125.136.231) by rndsmtp.h3c.com (10.63.20.175) with Microsoft SMTP Server id 14.3.248.2; Fri, 13 Apr 2018 13:50:02 +0800 From: Changwei Ge To: , , , , , Date: Fri, 13 Apr 2018 13:51:07 +0800 Message-ID: <1523598667-29401-1-git-send-email-ge.changwei@h3c.com> X-Mailer: git-send-email 2.7.4 MIME-Version: 1.0 X-Originating-IP: [10.125.136.231] X-CLX-Shades: MLX X-CLX-Response: 1TFkXGx0TEQpMehcaEQpZTRdnZnIRCllJFxpxGhAadwYbHxNxGRIQGncGGBo GGhEKWV4XaG55EQpJRhdFWEtJRk91WlhFTl9JXkNFRBl1T0sRCkNOF2tkR2FBWlodQ0FbGmMHXh lwXGtLT1puTGhLZR8bbUZdEQpYXBcfBBoEGxkYB0gaSR4ZHxkeBRsaBBsaGgQeEgQbEBseGh8aE QpeWRd5aE9FRhEKTVwXGRwcEQpMWhdoaUJNUxEKTU4XaBEKQ1oXHBoEGxMbBBsYGQQfHBEKQl4X GxEKRFgXGBEKRF4XHBEKREkXGBEKQkYXZxNtYBtbZUIffn0RCkJcFxoRCkJFF24ZWExeYQFwUkx hEQpCThdkQnxaRURBYh1kUBEKQkwXb35dTRgFXWYaUnsRCkJsF2RhT0tgQkgSeB1nEQpCQBdtZV 0aXF1kZU8eYhEKWlgXGBEKcGgXYmJzAVNOf1p7WEkQGhEKcGgXbEtNY3hdSR0BWW8QGhEKcGgXa xlHYGh5GWMSYnMQGhEKcGgXb0xOBWlmeHhFGRMQGhEKcGgXbnlgX25FeE1AWHgQGhEKcGwXbU4b b1MBR1JIHXMQGRoRCm1+FxoRClhNF0sRIA== X-PDR: PASS X-Source-IP: 60.191.123.56 X-ServerName: smtp.h3c.com X-Proofpoint-SPF-Result: pass X-Proofpoint-SPF-Record: v=spf1 ip4:60.191.123.56 ip4:60.191.123.50 ip4:221.12.31.13 ip4:221.12.31.56 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8861 signatures=668698 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=0 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=179 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1804130052 X-Spam: Clean Cc: ocfs2-devel@oss.oracle.com Subject: [Ocfs2-devel] [PATCH] ocfs2: submit another bio if current bio is full X-BeenThere: ocfs2-devel@oss.oracle.com X-Mailman-Version: 2.1.9 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: ocfs2-devel-bounces@oss.oracle.com Errors-To: ocfs2-devel-bounces@oss.oracle.com X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8861 signatures=668698 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1804130053 X-Virus-Scanned: ClamAV using ClamSMTP If cluster scale exceeds 16 nodes, bio will be full and bio_add_page() returns 0 when adding pages to bio. Returning -EIO to o2hb_read_slots() from o2hb_setup_one_bio() will lead to losing chance to allocate more bios to present all heartbeat region. So o2hb_read_slots() fails. In my test, making fs fails in starting o2cb service. Attach error log: (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 0, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 1, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 2, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 3, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 4, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 5, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 6, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 7, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 8, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 9, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 10, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 11, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 12, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 13, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 14, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 15, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 16, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:471 ERROR: Adding page[16] to bio failed, page ffffea0002d7ed40, len 0, vec_len 4096, vec_start 0, bi_sector 8192 (mkfs.ocfs2,27479,2):o2hb_read_slots:500 ERROR: status = -5 (mkfs.ocfs2,27479,2):o2hb_populate_slot_data:1911 ERROR: status = -5 (mkfs.ocfs2,27479,2):o2hb_region_dev_write:2012 ERROR: status = -5 Fixes: ba16ddfbeb9d ("ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio" Signed-off-by: Changwei Ge --- fs/ocfs2/cluster/heartbeat.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c index 91a8889abf9b..2809e29d612d 100644 --- a/fs/ocfs2/cluster/heartbeat.c +++ b/fs/ocfs2/cluster/heartbeat.c @@ -540,11 +540,12 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg, struct bio *bio; struct page *page; +#define O2HB_BIO_VECS 16 /* Testing has shown this allocation to take long enough under * GFP_KERNEL that the local node can get fenced. It would be * nicest if we could pre-allocate these bios and avoid this * all together. */ - bio = bio_alloc(GFP_ATOMIC, 16); + bio = bio_alloc(GFP_ATOMIC, O2HB_BIO_VECS); if (!bio) { mlog(ML_ERROR, "Could not alloc slots BIO!\n"); bio = ERR_PTR(-ENOMEM); @@ -570,7 +571,10 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg, current_page, vec_len, vec_start); len = bio_add_page(bio, page, vec_len, vec_start); - if (len != vec_len) { + if (len == 0 && current_page == O2HB_BIO_VECS) { + /* bio is full now. */ + goto bail; + } else if (len != vec_len) { mlog(ML_ERROR, "Adding page[%d] to bio failed, " "page %p, len %d, vec_len %u, vec_start %u, " "bi_sector %llu\n", current_page, page, len,