From patchwork Fri Sep 26 08:39:13 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sreedhar Kodali
X-Patchwork-Id: 4979221
Date: Fri, 26 Sep 2014 14:09:13 +0530
From: Sreedhar Kodali
To: sean.hefty@intel.com
Cc: linux-rdma@vger.kernel.org, pradeeps@linux.vnet.ibm.com
Subject: [PATCH v6] rsockets: fine grained interception mechanism for rsocket preloading
Message-ID: <008e50f7c84829d33c8d71b9e192b890@imap.linux.ibm.com>
X-Sender: srkodali@linux.vnet.ibm.com
X-Mailing-List: linux-rdma@vger.kernel.org

Note: Minor modification to the latest change set from Sean.
Also attached the patch file for convenience.

commit 5ad8c6892e56e7e7c8cd3a1ac5702568a4a3847e
Author: Sreedhar Kodali
Date: Fri Sep 26 12:29:00 2014 +0530

By default, the R-Sockets preload library intercepts all stream and
datagram sockets belonging to the processes and threads of a launched
program. However, distributed application and database servers may
require fine-grained interception, so that only the processes listening
for remote connections on the RDMA transport are switched to RDMA while
the remaining processes continue to use TCP as before. This lets the
various server components keep communicating with each other locally.

A configuration-file-based mechanism is introduced to provide this
fine-grained interception. As part of preload initialization, the
configuration file is scanned and an in-memory record store is created
from all the entries found. When a request is made to intercept a
socket, its attributes are checked against the stored records to decide
whether to proceed with the rsocket switch-over.

Note: Right now, the fine-grained interception mechanism is enabled
only for newly created sockets. Going forward, this can be extended to
select connections based on the specified host/IP addresses and ports
as well.

"preload_config" is the name of the configuration file, which should
exist in the default configuration location of an installed rsocket
library (usually the full path to this configuration file is
/etc/rdma/rsocket/preload_config). The format of this configuration
file is described in the comment preceding scan_config() below.

Signed-off-by: Sreedhar Kodali
Reviewed-by: Pradeep Satyanarayana
Signed-off-by: Sean Hefty
---
I made several adjustments to the submitted patch. Please verify that
these work for you.

Changes from v5:
- Simplified input file format slightly
- Removed typedef
- Renamed entryp and removed unneeded variable
- Replaced token parsing with sscanf
- Added wildcard support for the program name
- Wildcard values now stored as 0
- Enhanced checks for domain, type, and protocol strings
- Fixed realloc error handling
- Simplified free_config implementation
- Added support for app passing in 0 for protocol

Additional minor change:
- Compare only first n chars of program name

diff --git a/src/preload.c b/src/preload.c
index fb2149b..1e62a06 100644
--- a/src/preload.c
+++ b/src/preload.c
@@ -50,6 +50,9 @@
 #include
 #include
 #include
+#include
+#include
+#include
 
 #include
 #include
@@ -122,6 +125,135 @@ struct fd_info {
 	atomic_t refcnt;
 };
 
+struct config_entry {
+	char *name;
+	int domain;
+	int type;
+	int protocol;
+};
+
+static struct config_entry *config;
+static int config_cnt;
+extern char *program_invocation_short_name;
+
+
+static void free_config(void)
+{
+	while (config_cnt)
+		free(config[--config_cnt].name);
+
+	free(config);
+}
+
+/*
+ * Config file format:
+ * # Starting '#' indicates comment
+ * # wild card values are supported using '*'
+ * # domain - *, INET, INET6, IB
+ * # type - *, STREAM, DGRAM
+ * # protocol - *, TCP, UDP
+ * program_name domain type protocol
+ */
+static void scan_config(void)
+{
+	struct config_entry *new_config;
+	FILE *fp;
+	char line[120], prog[64], dom[16], type[16], proto[16];
+
+	fp = fopen(RS_CONF_DIR "/preload_config", "r");
+	if (!fp)
+		return;
+
+	while (fgets(line, sizeof(line), fp)) {
+		if (line[0] == '#')
+			continue;
+
+		if (sscanf(line, "%64s%16s%16s%16s", prog, dom, type, proto) != 4)
+			continue;
+
+		new_config = realloc(config, (config_cnt + 1) *
+				     sizeof(struct config_entry));
+		if (!new_config)
+			break;
+
+		config = new_config;
+		memset(&config[config_cnt], 0, sizeof(struct config_entry));
+
+		if (!strcasecmp(dom, "INET") ||
+		    !strcasecmp(dom, "AF_INET") ||
+		    !strcasecmp(dom, "PF_INET")) {
+			config[config_cnt].domain = AF_INET;
+		} else if (!strcasecmp(dom, "INET6") ||
+			   !strcasecmp(dom, "AF_INET6") ||
+			   !strcasecmp(dom, "PF_INET6")) {
+			config[config_cnt].domain = AF_INET6;
+		} else if (!strcasecmp(dom, "IB") ||
+			   !strcasecmp(dom, "AF_IB") ||
+			   !strcasecmp(dom, "PF_IB")) {
+			config[config_cnt].domain = AF_IB;
+		} else if (strcmp(dom, "*")) {
+			continue;
+		}
+
+		if (!strcasecmp(type, "STREAM") ||
+		    !strcasecmp(type, "SOCK_STREAM")) {
+			config[config_cnt].type = SOCK_STREAM;
+		} else if (!strcasecmp(type, "DGRAM") ||
+			   !strcasecmp(type, "SOCK_DGRAM")) {
+			config[config_cnt].type = SOCK_DGRAM;
+		} else if (strcmp(type, "*")) {
+			continue;
+		}
+
+		if (!strcasecmp(proto, "TCP") ||
+		    !strcasecmp(proto, "IPPROTO_TCP")) {
+			config[config_cnt].protocol = IPPROTO_TCP;
+		} else if (!strcasecmp(proto, "UDP") ||
+			   !strcasecmp(proto, "IPPROTO_UDP")) {
+			config[config_cnt].protocol = IPPROTO_UDP;
+		} else if (strcmp(proto, "*")) {
+			continue;
+		}
+
+		if (strcmp(prog, "*")) {
+			if (!(config[config_cnt].name = strdup(prog)))
+				continue;
+		}
+
+		config_cnt++;
+	}
+
+	fclose(fp);
+	if (config_cnt)
+		atexit(free_config);
+}
+
+static int intercept_socket(int domain, int type, int protocol)
+{
+	int i;
+
+	if (!config_cnt)
+		return 1;
+
+	if (!protocol) {
+		if (type == SOCK_STREAM)
+			protocol = IPPROTO_TCP;
+		else if (type == SOCK_DGRAM)
+			protocol = IPPROTO_UDP;
+	}
+
+	for (i = 0; i < config_cnt; i++) {
+		if ((!config[i].name ||
+		     !strncasecmp(config[i].name, program_invocation_short_name, strlen(config[i].name))) &&
+		    (!config[i].domain || config[i].domain == domain) &&
+		    (!config[i].type || config[i].type == type) &&
+		    (!config[i].protocol || config[i].protocol == protocol))
+			return 1;
+	}
+
+	return 0;
+}
+
 static int fd_open(void)
 {
 	struct fd_info *fdi;
@@ -308,6 +440,7 @@ static void init_preload(void)
 	rs.fcntl = dlsym(RTLD_DEFAULT, "rfcntl");
 
 	getenv_options();
+	scan_config();
 	init = 1;
 out:
 	pthread_mutex_unlock(&mut);
@@ -404,10 +537,11 @@ int socket(int domain, int type, int protocol)
 	static __thread int recursive;
 	int index, ret;
 
-	if (recursive)
+	init_preload();
+
+	if (recursive || !intercept_socket(domain, type, protocol))
 		goto real;
 
-	init_preload();
 	index = fd_open();
 	if (index < 0)
 		return index;
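
For anyone who wants to experiment with the rule semantics outside the preload library, the parse and match steps of scan_config() and intercept_socket() can be approximated by the standalone sketch below. It is not the code from the patch. The names `struct rule`, `lookup()`, `parse_rule()` and `rule_matches()` are hypothetical, the program name is passed in as an argument instead of being read from program_invocation_short_name, only the short keywords are recognized, and the IB domain is left out so that the sketch builds without the RDMA headers. Because a `%Ns` conversion can store N characters plus a terminating NUL, the sketch uses field widths one less than the buffer sizes.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the patch's struct config_entry. The name is
 * stored inline; an empty name plays the role of the NULL wildcard. */
struct rule {
	char name[64];
	int domain;	/* 0 means wildcard, as in the patch */
	int type;
	int protocol;
};

/* Translate keyword s into a value: a -> va, b -> vb, "*" -> 0.
 * Returns 1 on success, 0 if the keyword is not recognized. */
static int lookup(const char *s, const char *a, int va,
		  const char *b, int vb, int *out)
{
	if (!strcasecmp(s, a))
		*out = va;
	else if (!strcasecmp(s, b))
		*out = vb;
	else if (!strcmp(s, "*"))
		*out = 0;
	else
		return 0;
	return 1;
}

/* Parse one "program_name domain type protocol" line.
 * Returns 1 if a rule was stored in r, 0 if the line was skipped. */
static int parse_rule(const char *line, struct rule *r)
{
	char prog[64], dom[16], type[16], proto[16];

	assert(line && r);
	if (line[0] == '#')
		return 0;

	/* Widths leave one byte in each buffer for the terminating NUL. */
	if (sscanf(line, "%63s%15s%15s%15s", prog, dom, type, proto) != 4)
		return 0;

	memset(r, 0, sizeof(*r));
	if (!lookup(dom, "INET", AF_INET, "INET6", AF_INET6, &r->domain) ||
	    !lookup(type, "STREAM", SOCK_STREAM, "DGRAM", SOCK_DGRAM, &r->type) ||
	    !lookup(proto, "TCP", IPPROTO_TCP, "UDP", IPPROTO_UDP, &r->protocol))
		return 0;

	if (strcmp(prog, "*"))
		strcpy(r->name, prog);	/* fits: prog holds at most 63 chars */
	return 1;
}

/* Returns 1 if a socket(domain, type, protocol) call made by program
 * prog matches rule r, 0 otherwise. */
static int rule_matches(const struct rule *r, const char *prog,
			int domain, int type, int protocol)
{
	/* A protocol of 0 is resolved from the socket type first. */
	if (!protocol) {
		if (type == SOCK_STREAM)
			protocol = IPPROTO_TCP;
		else if (type == SOCK_DGRAM)
			protocol = IPPROTO_UDP;
	}

	/* Only the first strlen(r->name) characters of prog are compared. */
	return (!r->name[0] ||
		!strncasecmp(r->name, prog, strlen(r->name))) &&
	       (!r->domain || r->domain == domain) &&
	       (!r->type || r->type == type) &&
	       (!r->protocol || r->protocol == protocol);
}
```

Under these assumptions, a preload_config line such as `myserver INET STREAM TCP` (a made-up program name) would match stream sockets created by `myserver` and, because of the prefix comparison, by a process named `myserver-worker`. Datagram sockets from the same programs would not match and would stay on the regular socket path.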