From patchwork Thu Sep 17 08:02:39 2009
X-Patchwork-Submitter: Kiyoshi Ueda
X-Patchwork-Id: 48249
Message-ID: <4AB1ED1F.1010203@ct.jp.nec.com>
Date: Thu, 17 Sep 2009 17:02:39 +0900
From: Kiyoshi Ueda
To: David Strand, Mike Snitzer, Alasdair Kergon
Cc: device-mapper development
Subject: Re: [dm-devel] fragmented i/o with 2.6.31?
References: <448b15030909160834j2b127c83jab163e1860fc9aa1@mail.gmail.com> <448b15030909160922o84c2d6gc8ead8226dd8777a@mail.gmail.com>
In-Reply-To: <448b15030909160922o84c2d6gc8ead8226dd8777a@mail.gmail.com>

Hi David, Mike, Alasdair,

On 09/17/2009 01:22 AM +0900, David Strand wrote:
> On Wed, Sep 16, 2009 at 8:34 AM, David Strand wrote:
>> I am issuing 512 Kbyte reads through the device mapper device node to
>> a fibre channel disk. With 2.6.30, one read command for the entire 512
>> Kbyte length is placed on the wire. With 2.6.31 this is being broken
>> up into 5 smaller read commands placed on the wire, decreasing
>> performance.
>>
>> This is especially penalizing on some disks where we have prefetch
>> turned off via the scsi mode page. Is there any easy way (through
>> configuration or sysfs) to restore the single read per i/o behavior
>> that I used to get?
>
> I should note that I am using dm-mpath, and the i/o is fragmented on
> the wire when using the device mapper device node, but it is not
> fragmented when using one of the regular /dev/sd* device nodes for
> that device.

David,

Thank you for reporting this.
I found on my test machine that max_sectors is set to SAFE_MAX_SECTORS,
which limits the I/O size to a small value.
The attached patch fixes it.
I guess the patch (and increasing the read-ahead size in
/sys/block/dm-/queue/read_ahead_kb) will solve your fragmentation issue.
Please try it.

Mike, Alasdair,

I found that max_sectors and max_hw_sectors of a dm device are set to
smaller values than those of the underlying devices. E.g.:

    # cat /sys/block/sdj/queue/max_sectors_kb
    512
    # cat /sys/block/sdj/queue/max_hw_sectors_kb
    32767
    # echo "0 10 linear /dev/sdj 0" | dmsetup create test
    # cat /sys/block/dm-0/queue/max_sectors_kb
    127
    # cat /sys/block/dm-0/queue/max_hw_sectors_kb
    127

This prevents the I/O size of struct request from becoming big enough,
and causes undesired request fragmentation in request-based dm.

This is caused by the queue_limits stacking: in
dm_calculate_queue_limits(), the block layer's small default values are
included in the merging process of the targets' queue_limits, so the
underlying queue_limits are not propagated correctly.

I think initializing the default values of all max_* to 0 is an easy fix.
Do you think my patch is acceptable?
Any other idea to fix this problem?

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Cc: David Strand
Cc: Mike Snitzer
Cc: Alasdair G Kergon
---
 drivers/md/dm-table.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: 2.6.31/drivers/md/dm-table.c
===================================================================
--- 2.6.31.orig/drivers/md/dm-table.c
+++ 2.6.31/drivers/md/dm-table.c
@@ -992,9 +992,13 @@ int dm_calculate_queue_limits(struct dm_
 	unsigned i = 0;
 
 	blk_set_default_limits(limits);
+	limits->max_sectors = 0;
+	limits->max_hw_sectors = 0;
 
 	while (i < dm_table_get_num_targets(table)) {
 		blk_set_default_limits(&ti_limits);
+		ti_limits.max_sectors = 0;
+		ti_limits.max_hw_sectors = 0;
 
 		ti = dm_table_get_target(table, i++);

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel