From: Chaitanya Huilgol
To: "ceph-devel@vger.kernel.org"
Subject: [PATCH] ceph: rbd option listing and tcp_nodelay support
Date: Wed, 21 Jan 2015 18:32:38 +0000
Message-ID: <9E914F5BD7F48A4782456CEB550A42280A75F959@SACMBXIP01.sdcorp.global.sandisk.com>
In-Reply-To: <9E914F5BD7F48A4782456CEB550A42280A75F93C@SACMBXIP01.sdcorp.global.sandisk.com>
X-Patchwork-Submitter: Chaitanya Huilgol
X-Patchwork-Id: 5679841
List-ID: ceph-devel@vger.kernel.org

From: Chaitanya Huilgol

ceph: rbd option listing and tcp_nodelay support

Option keys supported by the libceph and rbd modules are readable as a
comma-separated string via the read-only /sys/bus/rbd/options interface.
This allows a user app (the rbd cli) to check for supported option keys
before passing options to the kernel, and so remain compatible with older
kernels that do not support a particular feature.

Messenger-specific options are moved to the messenger layer.

A tcp_nodelay (default) / no_tcp_nodelay option is added for setting
TCP_NODELAY on messenger socket connections. It covers both rbd and cephfs.

Signed-off-by: Chaitanya Huilgol
---
 drivers/block/rbd.c            | 21 +++++++++++++++++
 fs/ceph/super.c                |  5 +++-
 include/linux/ceph/libceph.h   |  5 ++--
 include/linux/ceph/messenger.h | 26 +++++++++++++++++++--
 net/ceph/ceph_common.c         | 52 ++++++++++++++++++++++++++++++++++++++----
 net/ceph/messenger.c           | 33 ++++++++++++++++++++++-----
 6 files changed, 126 insertions(+), 16 deletions(-)

--
1.9.1

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index e818c2a..507fd16 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -423,6 +423,7 @@ static ssize_t rbd_add_single_major(struct bus_type *bus, const char *buf,
 				    size_t count);
 static ssize_t rbd_remove_single_major(struct bus_type *bus, const char *buf,
 				       size_t count);
+static ssize_t rbd_enumerate_options(struct bus_type *bus, char *buf);
 static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping);
 static void rbd_spec_put(struct rbd_spec *spec);

@@ -440,12 +441,14 @@ static BUS_ATTR(add, S_IWUSR, NULL, rbd_add);
 static BUS_ATTR(remove, S_IWUSR, NULL, rbd_remove);
 static BUS_ATTR(add_single_major, S_IWUSR, NULL, rbd_add_single_major);
 static BUS_ATTR(remove_single_major, S_IWUSR, NULL, rbd_remove_single_major);
+static BUS_ATTR(options, S_IRUSR, rbd_enumerate_options, NULL);

 static struct attribute *rbd_bus_attrs[] = {
 	&bus_attr_add.attr,
 	&bus_attr_remove.attr,
 	&bus_attr_add_single_major.attr,
 	&bus_attr_remove_single_major.attr,
+	&bus_attr_options.attr,
 	NULL,
 };

@@ -746,6 +749,12 @@ static match_table_t rbd_opts_tokens = {
 	{-1, NULL}
 };

+/*
+ * Supported options comma separated string. Readable by the rbd cli, so that
+ * an informed decision can be made on passing options to the kernel modules.
+ */
+static const char *rbd_supported_option_keys = "rw";
+
 struct rbd_options {
 	bool read_only;
 };
@@ -5569,6 +5578,18 @@ static ssize_t rbd_remove_single_major(struct bus_type *bus,
 	return do_rbd_remove(bus, buf, count);
 }

+static ssize_t rbd_enumerate_options(struct bus_type *bus,
+				     char *buf)
+{
+	ssize_t sz;
+	sz = snprintf(buf, PAGE_SIZE, "%s", rbd_supported_option_keys);
+	if ((sz + 1) < PAGE_SIZE) {
+		sz += snprintf (buf + sz, PAGE_SIZE - sz, ",%s",
+				ceph_get_supported_options());
+	}
+	sz += 1; /* '0' String Termination */
+	return sz;
+}
 /*
  * create control files in sysfs
  * /sys/bus/rbd/...
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 50f06cd..4632ae4 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -423,7 +423,10 @@ static int ceph_show_options(struct seq_file *m, struct dentry *root)
 		seq_printf(m, ",fsid=%pU", &opt->fsid);
 	if (opt->flags & CEPH_OPT_NOSHARE)
 		seq_puts(m, ",noshare");
-	if (opt->flags & CEPH_OPT_NOCRC)
+	if (ceph_test_msgr_opt(&opt->msgr_options,
+			       CEPH_MSGR_OPT_NO_TCP_NODELAY))
+		seq_puts(m, ",no_tcp_nodelay");
+	if (ceph_test_msgr_opt(&opt->msgr_options, CEPH_MSGR_OPT_NOCRC))
 		seq_puts(m, ",nocrc");

 	if (opt->name)
diff --git a/include/linux/ceph/libceph.h b/include/linux/ceph/libceph.h
index 8b11a79..9306a47 100644
--- a/include/linux/ceph/libceph.h
+++ b/include/linux/ceph/libceph.h
@@ -28,8 +28,7 @@
 #define CEPH_OPT_FSID             (1<<0)
 #define CEPH_OPT_NOSHARE          (1<<1) /* don't share client with other sbs */
 #define CEPH_OPT_MYIP             (1<<2) /* specified my ip */
-#define CEPH_OPT_NOCRC            (1<<3) /* no data crc on writes */
-#define CEPH_OPT_NOMSGAUTH        (1<<4) /* not require cephx message signature */
+#define CEPH_OPT_NOMSGAUTH        (1<<3) /* not require cephx message signature */

 #define CEPH_OPT_DEFAULT   (0)

@@ -42,6 +41,7 @@ struct ceph_options {
 	int flags;
 	struct ceph_fsid fsid;
 	struct ceph_entity_addr my_addr;
+	struct ceph_messenger_options msgr_options;
 	int mount_timeout;
 	int osd_idle_ttl;
 	int osd_keepalive_timeout;
@@ -190,6 +190,7 @@ extern struct ceph_options *ceph_parse_options(char *options,
 			      const char *dev_name, const char *dev_name_end,
 			      int (*parse_extra_token)(char *c, void *private),
 			      void *private);
+extern const char* ceph_get_supported_options(void);
 extern void ceph_destroy_options(struct ceph_options *opt);
 extern int ceph_compare_options(struct ceph_options *new_opt,
 				struct ceph_client *client);
diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h
index d9d396c..471f622 100644
--- a/include/linux/ceph/messenger.h
+++ b/include/linux/ceph/messenger.h
@@ -51,12 +51,34 @@ struct ceph_connection_operations {
 /* use format string %s%d */
 #define ENTITY_NAME(n) ceph_entity_type_name((n).type), le64_to_cpu((n).num)

+/*
+ * Messenger specific ceph options
+ */
+struct ceph_messenger_options {
+	u32 flags;
+};
+
+#define CEPH_MSGR_OPT_NOCRC		(1<<0) /* no data crc on writes */
+#define CEPH_MSGR_OPT_NO_TCP_NODELAY	(1<<1) /* No TCP_NODELAY on con sock */
+#define CEPH_MSGR_OPT_DEFAULT		(0)
+
+#define ceph_messenger_options_init(_msgr_opts) \
+	((_msgr_opts)->flags = CEPH_MSGR_OPT_DEFAULT)
+
+#define ceph_set_msgr_opt(_msgr_opts, _opt) \
+	((_msgr_opts)->flags |= _opt)
+#define ceph_clr_msgr_opt(_msgr_opts, _opt) \
+	((_msgr_opts)->flags &= ~(_opt))
+#define ceph_test_msgr_opt(_msgr_opts, _opt) \
+	(!!((_msgr_opts)->flags & (_opt)))
+
+
 struct ceph_messenger {
 	struct ceph_entity_inst inst;    /* my name+address */
 	struct ceph_entity_addr my_enc_addr;

 	atomic_t stopping;
-	bool nocrc;
+	struct ceph_messenger_options *options;

 	/*
 	 * the global_seq counts connections i (attempt to) initiate
@@ -264,7 +286,7 @@ extern void ceph_messenger_init(struct ceph_messenger *msgr,
 			struct ceph_entity_addr *myaddr,
 			u64 supported_features,
 			u64 required_features,
-			bool nocrc);
+			struct ceph_messenger_options *msgr_options);

 extern void ceph_con_init(struct ceph_connection *con, void *private,
 			const struct ceph_connection_operations *ops,
diff --git a/net/ceph/ceph_common.c b/net/ceph/ceph_common.c
index 5d5ab67..25f1515 100644
--- a/net/ceph/ceph_common.c
+++ b/net/ceph/ceph_common.c
@@ -239,6 +239,8 @@ enum {
 	Opt_nocrc,
 	Opt_cephx_require_signatures,
 	Opt_nocephx_require_signatures,
+	Opt_tcp_nodelay,
+	Opt_no_tcp_nodelay,
 };

 static match_table_t opt_tokens = {
@@ -259,8 +261,28 @@ static match_table_t opt_tokens = {
 	{Opt_nocrc, "nocrc"},
 	{Opt_cephx_require_signatures, "cephx_require_signatures"},
 	{Opt_nocephx_require_signatures, "nocephx_require_signatures"},
+	{Opt_tcp_nodelay, "tcp_nodelay"},
+	{Opt_no_tcp_nodelay, "no_tcp_nodelay"},
 	{-1, NULL}
 };
+/*
+ * Supported option keys. Readable by the rbd cli, so that an informed
+ * decision can be made on passing options to the kernel modules.
+ */
+static const char *libceph_supported_options_keys =
+					"osdtimeout,"
+					"osdkeepalive,"
+					"mount_timeout,"
+					"osd_idle_ttl,"
+					"fsid,"
+					"name,"
+					"secret,"
+					"key,"
+					"ip,"
+					"share,"
+					"crc,"
+					"cephx_require_signatures,"
+					"tcp_nodelay";

 void ceph_destroy_options(struct ceph_options *opt)
 {
@@ -320,8 +342,7 @@ out:
 	return err;
 }

-struct ceph_options *
-ceph_parse_options(char *options, const char *dev_name,
+struct ceph_options * ceph_parse_options(char *options, const char *dev_name,
 			const char *dev_name_end,
 			int (*parse_extra_token)(char *c, void *private),
 			void *private)
@@ -350,6 +371,7 @@ ceph_parse_options(char *options, const char *dev_name,
 	opt->osd_keepalive_timeout = CEPH_OSD_KEEPALIVE_DEFAULT;
 	opt->mount_timeout = CEPH_MOUNT_TIMEOUT_DEFAULT; /* seconds */
 	opt->osd_idle_ttl = CEPH_OSD_IDLE_TTL_DEFAULT;   /* seconds */
+	ceph_messenger_options_init(&opt->msgr_options);

 	/* get mon ip(s) */
 	/* ip1[:port1][,ip2[:port2]...] */
@@ -452,11 +474,14 @@ ceph_parse_options(char *options, const char *dev_name,
 			break;

 		case Opt_crc:
-			opt->flags &= ~CEPH_OPT_NOCRC;
+			ceph_clr_msgr_opt(&opt->msgr_options,
+					  CEPH_MSGR_OPT_NOCRC);
 			break;
 		case Opt_nocrc:
-			opt->flags |= CEPH_OPT_NOCRC;
+			ceph_set_msgr_opt(&opt->msgr_options,
+					  CEPH_MSGR_OPT_NOCRC);
 			break;
+
 		case Opt_cephx_require_signatures:
 			opt->flags &= ~CEPH_OPT_NOMSGAUTH;
 			break;
@@ -464,6 +489,15 @@ ceph_parse_options(char *options, const char *dev_name,
 			opt->flags |= CEPH_OPT_NOMSGAUTH;
 			break;

+		case Opt_tcp_nodelay:
+			ceph_clr_msgr_opt(&opt->msgr_options,
+					  CEPH_MSGR_OPT_NO_TCP_NODELAY);
+			break;
+		case Opt_no_tcp_nodelay:
+			ceph_set_msgr_opt(&opt->msgr_options,
+					  CEPH_MSGR_OPT_NO_TCP_NODELAY);
+			break;
+
 		default:
 			BUG_ON(token);
 		}
@@ -478,6 +512,14 @@ out:
 }
 EXPORT_SYMBOL(ceph_parse_options);

+
+const char* ceph_get_supported_options(void)
+{
+	return libceph_supported_options_keys;
+}
+EXPORT_SYMBOL(ceph_get_supported_options);
+
+
 u64 ceph_client_id(struct ceph_client *client)
 {
 	return client->monc.auth->global_id;
@@ -521,7 +563,7 @@ struct ceph_client *ceph_create_client(struct ceph_options *opt, void *private,
 	ceph_messenger_init(&client->msgr, myaddr,
 		client->supported_features,
 		client->required_features,
-		ceph_test_opt(client, NOCRC));
+		&opt->msgr_options);

 	/* subsystems */
 	err = ceph_monc_init(&client->monc, client);
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 33a2f20..9a056fe 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -469,6 +469,21 @@ static void set_sock_callbacks(struct socket *sock,
 /*
  * socket helpers
  */
+static void ceph_tcp_set_sock_options(struct ceph_connection *con)
+{
+	int rc;
+
+	if (!ceph_test_msgr_opt(con->msgr->options,
+				CEPH_MSGR_OPT_NO_TCP_NODELAY)) {
+		/* Not requested to disable TCP_NODELAY, set it by default */
+		int optval = 1;
+		rc = kernel_setsockopt(con->sock, IPPROTO_TCP, TCP_NODELAY,
+				       (char *)&optval, sizeof(optval));
+		if (rc != 0) {
+			pr_warn("Warn: CEPH_CON_OPT: TCP_NODELAY: Fails=%d\n", rc);
+		}
+	}
+}

 /*
  * initiate connection to a remote socket.
@@ -513,6 +528,9 @@ static int ceph_tcp_connect(struct ceph_connection *con)

 	sk_set_memalloc(sock->sk);
 	con->sock = sock;
+	/* process socket options if any */
+	ceph_tcp_set_sock_options(con);
+
 	return 0;
 }

@@ -749,7 +767,6 @@ void ceph_con_init(struct ceph_connection *con, void *private,
 }
 EXPORT_SYMBOL(ceph_con_init);

-
 /*
  * We maintain a global counter to order connection attempts.  Get
  * a unique seq greater than @gt.
@@ -1511,7 +1528,8 @@ static int write_partial_message_data(struct ceph_connection *con)
 {
 	struct ceph_msg *msg = con->out_msg;
 	struct ceph_msg_data_cursor *cursor = &msg->cursor;
-	bool do_datacrc = !con->msgr->nocrc;
+	bool do_datacrc = !ceph_test_msgr_opt(con->msgr->options,
+					      CEPH_MSGR_OPT_NOCRC);
 	u32 crc;

 	dout("%s %p msg %p\n", __func__, con, msg);
@@ -2212,7 +2230,8 @@ static int read_partial_msg_data(struct ceph_connection *con)
 {
 	struct ceph_msg *msg = con->in_msg;
 	struct ceph_msg_data_cursor *cursor = &msg->cursor;
-	const bool do_datacrc = !con->msgr->nocrc;
+	const bool do_datacrc = !ceph_test_msgr_opt(con->msgr->options,
+						    CEPH_MSGR_OPT_NOCRC);
 	struct page *page;
 	size_t page_offset;
 	size_t length;
@@ -2258,7 +2277,8 @@ static int read_partial_message(struct ceph_connection *con)
 	int end;
 	int ret;
 	unsigned int front_len, middle_len, data_len;
-	bool do_datacrc = !con->msgr->nocrc;
+	bool do_datacrc = !ceph_test_msgr_opt(con->msgr->options,
+					      CEPH_MSGR_OPT_NOCRC);
 	bool need_sign = (con->peer_features & CEPH_FEATURE_MSG_AUTH);
 	u64 seq;
 	u32 crc;
@@ -2922,7 +2942,7 @@ void ceph_messenger_init(struct ceph_messenger *msgr,
 			struct ceph_entity_addr *myaddr,
 			u64 supported_features,
 			u64 required_features,
-			bool nocrc)
+			struct ceph_messenger_options *msgr_options)
 {
 	msgr->supported_features = supported_features;
 	msgr->required_features = required_features;
@@ -2936,7 +2956,8 @@ void ceph_messenger_init(struct ceph_messenger *msgr,
 	msgr->inst.addr.type = 0;
 	get_random_bytes(&msgr->inst.addr.nonce, sizeof(msgr->inst.addr.nonce));
 	encode_my_addr(msgr);
-	msgr->nocrc = nocrc;
+	BUG_ON(msgr_options == NULL);
+	msgr->options = msgr_options;

 	atomic_set(&msgr->stopping, 0);
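
The commit message says a user app can read /sys/bus/rbd/options and check for a key before passing it to the kernel. A minimal userspace sketch of that check could look like the following. The helper names `option_key_supported` and `read_option_list` are hypothetical and are not part of this patch or of the rbd cli; the sketch only assumes the comma-separated format described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if `key` appears as a whole entry in the comma-separated
 * `list`, e.g. "rw,osdtimeout,...,tcp_nodelay". A trailing newline is
 * treated as a terminator. Hypothetical helper, for illustration only.
 */
bool option_key_supported(const char *list, const char *key)
{
	size_t key_len;
	const char *p = list;

	assert(list != NULL && key != NULL);
	key_len = strlen(key);
	if (key_len == 0)
		return false;

	while (*p != '\0') {
		/* Length of the current entry, up to the next separator. */
		size_t entry_len = strcspn(p, ",\n");

		if (entry_len == key_len && strncmp(p, key, key_len) == 0)
			return true;

		p += entry_len;
		if (*p != '\0')
			p++;	/* skip the ',' or '\n' */
	}
	return false;
}

/*
 * Read the option list from `path` into `buf` as a NUL-terminated string.
 * Returns 0 on success, or -1 if the file cannot be opened, which is what
 * a caller would expect on a kernel without this interface.
 */
int read_option_list(const char *path, char *buf, size_t buf_len)
{
	FILE *f;
	size_t n;

	assert(buf != NULL && buf_len > 0);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	n = fread(buf, 1, buf_len - 1, f);
	buf[n] = '\0';
	fclose(f);
	return 0;
}
```

A caller would pass "/sys/bus/rbd/options" to `read_option_list()` and, if that fails or `option_key_supported(buf, "tcp_nodelay")` returns false, leave the option out of the string it hands to the kernel. Note that the list in this patch names only the positive form of each key ("tcp_nodelay", "crc", "share"), so a check for "no_tcp_nodelay" would have to look up "tcp_nodelay" instead.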