From patchwork Sun Feb 17 09:41:00 2013
From: Goffredo Baroncelli
Reply-To: kreijack@inwind.it
To: Chris Mason
Cc: linux-btrfs
Subject: [PATCH][BTRFS] raid5/6: chunk allocation
Date: Sun, 17 Feb 2013 10:41:00 +0100
Message-ID: <5120A5AC.8020107@inwind.it>

Hi Chris,

I am playing with the raid5/6 code, to adapt my "disk-usage" patches to it.
While doing so I found that the chunk allocation behaves strangely.

Looking at the code in volumes.c I found the following:

3576 static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
[...]
3730 	/*
3731 	 * this will have to be fixed for RAID1 and RAID10 over
3732 	 * more drives
3733 	 */
3734 	data_stripes = num_stripes / ncopies;
3735
3736 	if (stripe_size * ndevs > max_chunk_size * ncopies) {
3737 		stripe_size = max_chunk_size * ncopies;
3738 		do_div(stripe_size, ndevs);
3739 	}

This code decides how big a chunk is, following two main rules:
1) the chunk stripe shall be less than max_stripe_size;
2) the chunk capacity (the space usable by the user) shall be less than
   max_chunk_size.

The code above works well for RAID0/RAID1/DUP/SINGLE/RAID10, but doesn't
play well with RAID5/6. In fact, if the chunk type is
BTRFS_BLOCK_GROUP_METADATA, then max_stripe_size is 1GB and max_chunk_size
is 1GB too. If the number of devices (ndevs) is 7 and the raid profile is
RAID6, then ncopies is 3 and stripe_size becomes 1GB*3/7 = 438MB; with 5
data stripes (ndevs minus the two parity stripes) this leads to a chunk
size of 438MB * 5 = 2.14GB, which is not the expected value. (A small
stand-alone sketch reproducing this arithmetic follows below.)

I think that we should change the test above, in the raid6 case, to:

	data_stripes = ndevs - 2;
	if (stripe_size * data_stripes > max_chunk_size) {
		stripe_size = max_chunk_size;
		do_div(stripe_size, data_stripes);
	}

The patch below should solve this issue, and clean up the logic a bit by
separating the raid5/raid6 code from the code for the other raid profiles.

Anyway, I would like to point out another possible issue: fragmentation.
To avoid fragmentation, should we round the stripe size up to a saner
value, like 256MB? I know that this could lead to an "insane" chunk size
when the number of disks is high (see the rough illustration after the
patch); but the current logic (stripe_size equal to chunk_size /
number_of_devices) could lead to fragmentation problems when different
raid profiles are used together.
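For reference, here is a minimal user-space sketch (not kernel code; the
variable names only mirror the ones in __btrfs_alloc_chunk, and the 1GB
values are the metadata max_stripe_size/max_chunk_size mentioned above)
that reproduces the arithmetic for RAID6 over 7 devices, under both the
current and the proposed capping logic:

#include <stdio.h>

int main(void)
{
	unsigned long long max_stripe_size = 1024ULL << 20; /* 1GB */
	unsigned long long max_chunk_size  = 1024ULL << 20; /* 1GB */
	unsigned long long stripe_size;
	int ndevs = 7;
	int ncopies = 3;               /* RAID6 */
	int data_stripes = ndevs - 2;  /* 5 data stripes, 2 parity */

	/* current logic: cap stripe_size against max_chunk_size * ncopies */
	stripe_size = max_stripe_size;
	if (stripe_size * ndevs > max_chunk_size * ncopies)
		stripe_size = max_chunk_size * ncopies / ndevs;
	printf("current : stripe %4lluMB -> chunk %4lluMB\n",
	       stripe_size >> 20, (stripe_size * data_stripes) >> 20);

	/* proposed logic: cap stripe_size against max_chunk_size only */
	stripe_size = max_stripe_size;
	if (stripe_size * data_stripes > max_chunk_size)
		stripe_size = max_chunk_size / data_stripes;
	printf("proposed: stripe %4lluMB -> chunk %4lluMB\n",
	       stripe_size >> 20, (stripe_size * data_stripes) >> 20);

	return 0;
}

This prints roughly 438MB/2194MB for the current code and 204MB/1023MB
for the proposed one, i.e. the proposed cap keeps the chunk within the
1GB max_chunk_size.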
BR
G.Baroncelli

Signed-off-by: Goffredo Baroncelli

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c372264..88d17b4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3724,25 +3724,32 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	stripe_size = devices_info[ndevs-1].max_avail;
 	num_stripes = ndevs * dev_stripes;
 
-	/*
-	 * this will have to be fixed for RAID1 and RAID10 over
-	 * more drives
-	 */
-	data_stripes = num_stripes / ncopies;
-
-	if (stripe_size * ndevs > max_chunk_size * ncopies) {
-		stripe_size = max_chunk_size * ncopies;
-		do_div(stripe_size, ndevs);
-	}
 	if (type & BTRFS_BLOCK_GROUP_RAID5) {
 		raid_stripe_len = find_raid56_stripe_len(ndevs - 1,
 				 btrfs_super_stripesize(info->super_copy));
-		data_stripes = num_stripes - 1;
-	}
-	if (type & BTRFS_BLOCK_GROUP_RAID6) {
+		data_stripes = ndevs - 1;
+		if (stripe_size * data_stripes > max_chunk_size) {
+			stripe_size = max_chunk_size;
+			do_div(stripe_size, data_stripes);
+		}
+	} else if (type & BTRFS_BLOCK_GROUP_RAID6) {
 		raid_stripe_len = find_raid56_stripe_len(ndevs - 2,
 				 btrfs_super_stripesize(info->super_copy));
-		data_stripes = num_stripes - 2;
+		data_stripes = ndevs - 2;
+		if (stripe_size * data_stripes > max_chunk_size) {
+			stripe_size = max_chunk_size;
+			do_div(stripe_size, data_stripes);
+		}
+	} else { /* RAID1, RAID0, RAID10, SINGLE, DUP */
+		/*
+		 * this will have to be fixed for RAID1 and RAID10 over
+		 * more drives
+		 */
+		data_stripes = num_stripes / ncopies;
+		if (stripe_size * ndevs > max_chunk_size * ncopies) {
+			stripe_size = max_chunk_size * ncopies;
+			do_div(stripe_size, ndevs);
+		}
 	}
 
 	do_div(stripe_size, dev_stripes);
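P.S.: to make the fragmentation question above a bit more concrete, here
is a rough sketch (pure illustration; the 256MB rounding unit is the
hypothetical value mentioned in the mail, applied on top of the proposed
RAID6 capping) of what rounding the stripe size up to a 256MB multiple
would yield at various device counts:

#include <stdio.h>

int main(void)
{
	unsigned long long max_chunk_size = 1024ULL << 20; /* 1GB cap */
	unsigned long long unit = 256ULL << 20;            /* rounding unit */
	int ndevs;

	for (ndevs = 4; ndevs <= 16; ndevs += 4) {
		int data_stripes = ndevs - 2;              /* RAID6 */
		unsigned long long stripe = max_chunk_size / data_stripes;

		/* round the stripe up to the next 256MB multiple */
		stripe = (stripe + unit - 1) / unit * unit;
		printf("%2d devs: stripe %4lluMB -> chunk %5lluMB\n",
		       ndevs, stripe >> 20, (stripe * data_stripes) >> 20);
	}
	return 0;
}

With 16 devices the chunk already grows to 3.5GB, which is the "insane"
growth mentioned above; on the other hand, every profile would then
allocate in the same 256MB units, which is the anti-fragmentation
argument.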