From patchwork Thu Sep 17 08:02:39 2009
X-Patchwork-Submitter: Kiyoshi Ueda
X-Patchwork-Id: 48249
Message-ID: <4AB1ED1F.1010203@ct.jp.nec.com>
Date: Thu, 17 Sep 2009 17:02:39 +0900
From: Kiyoshi Ueda
To: David Strand, Mike Snitzer, Alasdair Kergon
Cc: device-mapper development
Subject: Re: [dm-devel] fragmented i/o with 2.6.31?
References: <448b15030909160834j2b127c83jab163e1860fc9aa1@mail.gmail.com> <448b15030909160922o84c2d6gc8ead8226dd8777a@mail.gmail.com>
In-Reply-To: <448b15030909160922o84c2d6gc8ead8226dd8777a@mail.gmail.com>

Hi David, Mike, Alasdair,

On 09/17/2009 01:22 AM +0900, David Strand wrote:
> On Wed, Sep 16, 2009 at 8:34 AM, David Strand wrote:
>> I am issuing 512 Kbyte reads through the device mapper device node to
>> a fibre channel disk. With 2.6.30, one read command for the entire 512
>> Kbyte length is placed on the wire. With 2.6.31 this is being broken
>> up into 5 smaller read commands placed on the wire, decreasing
>> performance.
>>
>> This is especially penalizing on some disks where we have prefetch
>> turned off via the scsi mode page. Is there any easy way (through
>> configuration or sysfs) to restore the single read per i/o behavior
>> that I used to get?
>
> I should note that I am using dm-mpath, and the i/o is fragmented on
> the wire when using the device mapper device node, but it is not
> fragmented when using one of the regular /dev/sd* device nodes for
> that device.

David,

Thank you for reporting this.
I found on my test machine that max_sectors is set to SAFE_MAX_SECTORS,
which limits the I/O size to a small value.
The attached patch fixes it.
I guess the patch (and increasing the read-ahead size in
/sys/block/dm-/queue/read_ahead_kb) will solve your fragmentation issue.
Please try it.

Mike, Alasdair,

I found that max_sectors and max_hw_sectors of a dm device are set to
smaller values than those of the underlying devices. E.g.:

    # cat /sys/block/sdj/queue/max_sectors_kb
    512
    # cat /sys/block/sdj/queue/max_hw_sectors_kb
    32767
    # echo "0 10 linear /dev/sdj 0" | dmsetup create test
    # cat /sys/block/dm-0/queue/max_sectors_kb
    127
    # cat /sys/block/dm-0/queue/max_hw_sectors_kb
    127

This prevents the I/O size of struct request from becoming big enough,
and causes undesired request fragmentation in request-based dm.

This is caused by the queue_limits stacking: in
dm_calculate_queue_limits(), the block layer's small default values are
included in the merging process of the targets' queue_limits, so the
underlying queue_limits are not propagated correctly.

I think initializing the default values of all max_* to 0 is an easy fix.
Do you think my patch is acceptable?
Any other idea to fix this problem?

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Cc: David Strand
Cc: Mike Snitzer
Cc: Alasdair G Kergon
---
 drivers/md/dm-table.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: 2.6.31/drivers/md/dm-table.c
===================================================================
--- 2.6.31.orig/drivers/md/dm-table.c
+++ 2.6.31/drivers/md/dm-table.c
@@ -992,9 +992,13 @@ int dm_calculate_queue_limits(struct dm_
 	unsigned i = 0;
 
 	blk_set_default_limits(limits);
+	limits->max_sectors = 0;
+	limits->max_hw_sectors = 0;
 
 	while (i < dm_table_get_num_targets(table)) {
 		blk_set_default_limits(&ti_limits);
+		ti_limits.max_sectors = 0;
+		ti_limits.max_hw_sectors = 0;
 
 		ti = dm_table_get_target(table, i++);

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel