From patchwork Wed Mar 18 08:39:26 2009 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kiyoshi Ueda X-Patchwork-Id: 12779 X-Patchwork-Delegate: agk@redhat.com Received: from hormel.redhat.com (hormel1.redhat.com [209.132.177.33]) by demeter.kernel.org (8.14.2/8.14.2) with ESMTP id n2I8dm8l022684 for ; Wed, 18 Mar 2009 08:39:49 GMT Received: from listman.util.phx.redhat.com (listman.util.phx.redhat.com [10.8.4.110]) by hormel.redhat.com (Postfix) with ESMTP id 9E274618B4B; Wed, 18 Mar 2009 04:39:49 -0400 (EDT) Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by listman.util.phx.redhat.com (8.13.1/8.13.1) with ESMTP id n2I8dk8Y003493 for ; Wed, 18 Mar 2009 04:39:47 -0400 Received: from mx3.redhat.com (mx3.redhat.com [172.16.48.32]) by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id n2I8dlBI014365; Wed, 18 Mar 2009 04:39:47 -0400 Received: from tyo201.gate.nec.co.jp (TYO201.gate.nec.co.jp [202.32.8.193]) by mx3.redhat.com (8.13.8/8.13.8) with ESMTP id n2I8dWZl024277; Wed, 18 Mar 2009 04:39:32 -0400 Received: from mailgate3.nec.co.jp ([10.7.69.161]) by tyo201.gate.nec.co.jp (8.13.8/8.13.4) with ESMTP id n2I8dRUU000463; Wed, 18 Mar 2009 17:39:27 +0900 (JST) Received: (from root@localhost) by mailgate3.nec.co.jp (8.11.7/3.7W-MAILGATE-NEC) id n2I8dRf05000; Wed, 18 Mar 2009 17:39:27 +0900 (JST) Received: from mailsv.linux.bs1.fc.nec.co.jp (mailsv.linux.bs1.fc.nec.co.jp [10.34.125.2]) by mailsv.nec.co.jp (8.13.8/8.13.4) with ESMTP id n2I8dQne016051; Wed, 18 Mar 2009 17:39:26 +0900 (JST) Received: from elcondor.linux.bs1.fc.nec.co.jp (elcondor.linux.bs1.fc.nec.co.jp [10.34.125.195]) by mailsv.linux.bs1.fc.nec.co.jp (Postfix) with ESMTP id D15B3E482A8; Wed, 18 Mar 2009 17:39:26 +0900 (JST) Message-ID: <49C0B33E.7050904@ct.jp.nec.com> Date: Wed, 18 Mar 2009 17:39:26 +0900 From: Kiyoshi Ueda User-Agent: Thunderbird 2.0.0.19 (X11/20090105) MIME-Version: 1.0 To: Alasdair Kergon References: <49C0B222.9010600@ct.jp.nec.com> In-Reply-To: <49C0B222.9010600@ct.jp.nec.com> X-RedHat-Spam-Score: 0 X-Scanned-By: MIMEDefang 2.58 on 172.16.52.254 X-Scanned-By: MIMEDefang 2.63 on 172.16.48.32 X-loop: dm-devel@redhat.com Cc: device-mapper development , stefan.bader@canonical.com Subject: [dm-devel] [PATCH 2/3] dm-mpath: add queue-length oriented dynamic load balancer X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.5 Precedence: junk Reply-To: device-mapper development List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com This patch adds a dynamic load balancer, dm-queue-length, which balances the number of in-flight I/Os. The code is based on the patch posted by Stefan Bader: https://www.redhat.com/archives/dm-devel/2005-October/msg00050.html Signed-off-by: Stefan Bader Signed-off-by: Kiyoshi Ueda Signed-off-by: Jun'ichi Nomura Cc: Vijayakumar Balasubramanian Cc: Alasdair G Kergon --- drivers/md/Kconfig | 9 + drivers/md/Makefile | 1 drivers/md/dm-queue-length.c | 257 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 267 insertions(+) -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel Index: 2.6.29-rc8/drivers/md/dm-queue-length.c =================================================================== --- /dev/null +++ 2.6.29-rc8/drivers/md/dm-queue-length.c @@ -0,0 +1,257 @@ +/* + * Copyright (C) 2004-2005 IBM Corp. All Rights Reserved. + * Copyright (C) 2006-2009 NEC Corporation. + * + * dm-queue-length.c + * + * Module Author: Stefan Bader, IBM + * Modified by: Kiyoshi Ueda, NEC + * + * This file is released under the GPL. + * + * queue-length path selector, which chooses a path with the least number of + * in-flight I/Os. + */ + +#include "dm.h" +#include "dm-path-selector.h" + +#include +#include +#include +#include +#include + +#define DM_MSG_PREFIX "multipath queue-length" +#define QL_MIN_IO 128 +#define QL_VERSION "0.1.0" + +struct selector { + struct list_head valid_paths; + struct list_head failed_paths; +}; + +struct path_info { + struct list_head list; + struct dm_path *path; + unsigned int repeat_count; + atomic_t qlen; +}; + +static struct selector *alloc_selector(void) +{ + struct selector *s = kmalloc(sizeof(*s), GFP_KERNEL); + + if (s) { + INIT_LIST_HEAD(&s->valid_paths); + INIT_LIST_HEAD(&s->failed_paths); + } + + return s; +} + +static int ql_create(struct path_selector *ps, unsigned argc, char **argv) +{ + struct selector *s = alloc_selector(); + + if (!s) + return -ENOMEM; + + ps->context = s; + return 0; +} + +static void ql_free_paths(struct list_head *paths) +{ + struct path_info *pi, *next; + + list_for_each_entry_safe(pi, next, paths, list) { + list_del(&pi->list); + kfree(pi); + } +} + +static void ql_destroy(struct path_selector *ps) +{ + struct selector *s = (struct selector *) ps->context; + + ql_free_paths(&s->valid_paths); + ql_free_paths(&s->failed_paths); + kfree(s); + ps->context = NULL; +} + +static int ql_status(struct path_selector *ps, struct dm_path *path, + status_type_t type, char *result, unsigned int maxlen) +{ + int sz = 0; + struct path_info *pi; + + /* When called with (path == NULL), return selector status/args. */ + if (!path) + DMEMIT("0 "); + else { + pi = path->pscontext; + + switch (type) { + case STATUSTYPE_INFO: + DMEMIT("%u ", atomic_read(&pi->qlen)); + break; + case STATUSTYPE_TABLE: + DMEMIT("%u ", pi->repeat_count); + break; + } + } + + return sz; +} + +static int ql_add_path(struct path_selector *ps, struct dm_path *path, + int argc, char **argv, char **error) +{ + struct selector *s = (struct selector *) ps->context; + struct path_info *pi; + unsigned int repeat_count = QL_MIN_IO; + + /* Parse the arguments */ + if (argc > 1) { + *error = "queue-length ps: incorrect number of arguments"; + return -EINVAL; + } + + /* First path argument is number of I/Os before switching path. */ + if ((argc == 1) && (sscanf(argv[0], "%u", &repeat_count) != 1)) { + *error = "queue-length ps: invalid repeat count"; + return -EINVAL; + } + + /* Allocate the path information structure */ + pi = kmalloc(sizeof(*pi), GFP_KERNEL); + if (!pi) { + *error = "queue-length ps: Error allocating path information"; + return -ENOMEM; + } + + pi->path = path; + pi->repeat_count = repeat_count; + atomic_set(&pi->qlen, 0); + + path->pscontext = pi; + + list_add_tail(&pi->list, &s->valid_paths); + + return 0; +} + +static void ql_fail_path(struct path_selector *ps, struct dm_path *path) +{ + struct selector *s = (struct selector *) ps->context; + struct path_info *pi = path->pscontext; + + list_move(&pi->list, &s->failed_paths); +} + +static int ql_reinstate_path(struct path_selector *ps, struct dm_path *path) +{ + struct selector *s = (struct selector *) ps->context; + struct path_info *pi = path->pscontext; + + list_move_tail(&pi->list, &s->valid_paths); + + return 0; +} + +static struct dm_path *ql_select_path(struct path_selector *ps, + unsigned *repeat_count, size_t nr_bytes) +{ + struct selector *s = (struct selector *) ps->context; + struct path_info *pi = NULL, *best = NULL; + + if (list_empty(&s->valid_paths)) + return NULL; + + /* Change preferred (first in list) path to evenly balance. */ + list_move_tail(s->valid_paths.next, &s->valid_paths); + + list_for_each_entry(pi, &s->valid_paths, list) { + if (!best || + (atomic_read(&pi->qlen) < atomic_read(&best->qlen))) + best = pi; + + if (!atomic_read(&best->qlen)) + break; + } + + if (!best) + return NULL; + + *repeat_count = best->repeat_count; + + return best->path; +} + +static int ql_start_io(struct path_selector *ps, struct dm_path *path, + size_t nr_bytes) +{ + struct path_info *pi = path->pscontext; + + atomic_inc(&pi->qlen); + + return 0; +} + +static int ql_end_io(struct path_selector *ps, struct dm_path *path, + size_t nr_bytes) +{ + struct path_info *pi = path->pscontext; + + atomic_dec(&pi->qlen); + + return 0; +} + +static struct path_selector_type ql_ps = { + .name = "queue-length", + .module = THIS_MODULE, + .table_args = 1, + .info_args = 1, + .create = ql_create, + .destroy = ql_destroy, + .status = ql_status, + .add_path = ql_add_path, + .fail_path = ql_fail_path, + .reinstate_path = ql_reinstate_path, + .select_path = ql_select_path, + .start_io = ql_start_io, + .end_io = ql_end_io, +}; + +static int __init dm_ql_init(void) +{ + int r = dm_register_path_selector(&ql_ps); + + if (r < 0) + DMERR("register failed %d", r); + + DMINFO("version " QL_VERSION " loaded"); + + return r; +} + +static void __exit dm_ql_exit(void) +{ + int r = dm_unregister_path_selector(&ql_ps); + + if (r < 0) + DMERR("unregister failed %d", r); +} + +module_init(dm_ql_init); +module_exit(dm_ql_exit); + +MODULE_AUTHOR("Stefan Bader "); +MODULE_DESCRIPTION( + "(C) Copyright IBM Corp. 2004,2005 All Rights Reserved.\n" + DM_NAME " path selector to balance the number of in-flight I/Os" +); +MODULE_LICENSE("GPL"); Index: 2.6.29-rc8/drivers/md/Makefile =================================================================== --- 2.6.29-rc8.orig/drivers/md/Makefile +++ 2.6.29-rc8/drivers/md/Makefile @@ -34,6 +34,7 @@ obj-$(CONFIG_BLK_DEV_DM) += dm-mod.o obj-$(CONFIG_DM_CRYPT) += dm-crypt.o obj-$(CONFIG_DM_DELAY) += dm-delay.o obj-$(CONFIG_DM_MULTIPATH) += dm-multipath.o dm-round-robin.o +obj-$(CONFIG_DM_MULTIPATH_QL) += dm-queue-length.o obj-$(CONFIG_DM_SNAPSHOT) += dm-snapshot.o obj-$(CONFIG_DM_MIRROR) += dm-mirror.o dm-log.o dm-region-hash.o obj-$(CONFIG_DM_ZERO) += dm-zero.o Index: 2.6.29-rc8/drivers/md/Kconfig =================================================================== --- 2.6.29-rc8.orig/drivers/md/Kconfig +++ 2.6.29-rc8/drivers/md/Kconfig @@ -274,6 +274,15 @@ config DM_MULTIPATH ---help--- Allow volume managers to support multipath hardware. +config DM_MULTIPATH_QL + tristate "I/O Path Selector based on the number of in-flight I/Os" + depends on DM_MULTIPATH + ---help--- + This path selector is a dynamic load balancer which selects + a path having the least number of in-flight I/Os. + + If unsure, say N. + config DM_DELAY tristate "I/O delaying target (EXPERIMENTAL)" depends on BLK_DEV_DM && EXPERIMENTAL