Message ID | 1435671955-9744-5-git-send-email-kaike.wan@intel.com (mailing list archive) |
---|---|
State | Superseded |
On Tue, Jun 30, 2015 at 09:45:55AM -0400, kaike.wan@intel.com wrote:

> +static inline int ib_nl_is_good_resolve_resp(const struct nlmsghdr *nlh)
> +{
> +	const struct nlattr *head, *curr;
> +	int len, rem;
> +
> +	if (nlh->nlmsg_flags & RDMA_NL_LS_F_ERR)
> +		return 0;
> +
> +	if (!(nlh->nlmsg_flags & RDMA_NL_LS_F_OK))
> +		return 0;
> +
> +	if (nlmsg_len(nlh) < nla_attr_size(sizeof(*rec)))
> +		return 0;

Um, why are you sending patches that are not even compile tested?

drivers/infiniband/core/sa_query.c: In function 'ib_nl_is_good_resolve_resp':
drivers/infiniband/core/sa_query.c:732:45: error: 'rec' undeclared (first use in this function)
  if (nlmsg_len(nlh) < nla_attr_size(sizeof(*rec)))

This routine should also be using nla_parse and friends, not a roll your own.

And you didn't address all the comments either..

Jason
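For illustration, a minimal sketch of the nla_find()-based check that comes up in the reply below. The RDMA_NL_LS_F_* flags and LS_NLA_TYPE_PATH_RECORD are taken from the patch; it assumes the reply attribute type matches after the standard nla_type() masking, so this is a sketch rather than the final code:

/* Sketch only: validate a resolve response with nla_find() instead of an
 * open-coded nla_for_each_attr() loop.
 */
static int ib_nl_is_good_resolve_resp(const struct nlmsghdr *nlh)
{
	const struct nlattr *attr;

	if (nlh->nlmsg_flags & RDMA_NL_LS_F_ERR)
		return 0;
	if (!(nlh->nlmsg_flags & RDMA_NL_LS_F_OK))
		return 0;

	/* A usable reply must carry at least one path record attribute. */
	attr = nla_find(nlmsg_data(nlh), nlmsg_len(nlh),
			LS_NLA_TYPE_PATH_RECORD);
	return attr && nla_len(attr) >= sizeof(struct ib_path_rec_data);
}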
> From: Jason Gunthorpe [mailto:jgunthorpe@obsidianresearch.com]
> Sent: Tuesday, June 30, 2015 6:25 PM
> To: Wan, Kaike
> Cc: linux-rdma@vger.kernel.org; Fleck, John; Weiny, Ira
> Subject: Re: [PATCH v7 4/4] IB/sa: Route SA pathrecord query through netlink
>
> On Tue, Jun 30, 2015 at 09:45:55AM -0400, kaike.wan@intel.com wrote:
>
> > +static inline int ib_nl_is_good_resolve_resp(const struct nlmsghdr *nlh)
> > +{
> > +	const struct nlattr *head, *curr;
> > +	int len, rem;
> > +
> > +	if (nlh->nlmsg_flags & RDMA_NL_LS_F_ERR)
> > +		return 0;
> > +
> > +	if (!(nlh->nlmsg_flags & RDMA_NL_LS_F_OK))
> > +		return 0;
> > +
> > +	if (nlmsg_len(nlh) < nla_attr_size(sizeof(*rec)))
> > +		return 0;
>
> Um, why are you sending patches that are not even compile tested?

My bad. I had a debugging patch on top of these patches and built them together to do the tests (rdma_cm, ipoib, srp), but forgot to re-compile after popping off the debugging patch.

> drivers/infiniband/core/sa_query.c: In function 'ib_nl_is_good_resolve_resp':
> drivers/infiniband/core/sa_query.c:732:45: error: 'rec' undeclared (first use
> in this function)
>   if (nlmsg_len(nlh) < nla_attr_size(sizeof(*rec)))
>
> This routine should also be using nla_parse and friends, not a roll your own.

Yes, I can use nla_find() for this purpose here.

> And you didn't address all the comments either..

I addressed your comments with questions in a previous reply, but did not hear back from you. See
http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg25614.html

Thanks,

Kaike

> Jason
On Tue, Jun 30, 2015 at 09:45:55AM -0400, kaike.wan@intel.com wrote:

I still think this is long and rambly, but that is mostly a style preference. I think you should look at it and remove some of the leftover stuff; we really don't need a rough-in for other responses at this point.

> +#define IB_SA_ENABLE_LOCAL_SERVICE	0x00000001
> +#define IB_SA_CANCEL			0x00000002
> +
> +#define IB_SA_LOCAL_SVC_ENABLED(query) \
> +	((query)->flags & IB_SA_ENABLE_LOCAL_SERVICE)
> +#define IB_SA_ENABLE_LOCAL_SVC(query) \
> +	((query)->flags |= IB_SA_ENABLE_LOCAL_SERVICE)
> +#define IB_SA_DISABLE_LOCAL_SVC(query) \
> +	((query)->flags &= ~IB_SA_ENABLE_LOCAL_SERVICE)
> +
> +#define IB_SA_QUERY_CANCELLED(query) \
> +	((query)->flags & IB_SA_CANCEL)
> +#define IB_SA_CANCEL_QUERY(query) \
> +	((query)->flags |= IB_SA_CANCEL)

This whole bit is really strange style - if you really want to keep this then at least use static inline functions, not macros.

> +static struct ib_nl_request_info *
> +ib_nl_alloc_request(struct ib_sa_query *query)
> +{
> +	struct ib_nl_request_info *rinfo;
> +
> +	rinfo = kzalloc(sizeof(*rinfo), GFP_ATOMIC);
> +	if (rinfo == NULL)
> +		return ERR_PTR(-ENOMEM);

There really seem to be a lot of kallocs going on for what is supposed to be a performance path.

I would probably work to fold this into the ib_sa_query allocation; it is just a few bytes, and we can waste that RAM if we are not using netlink.

> +	if ((info->comp_mask & IB_SA_PATH_REC_REVERSIBLE) &&
> +	    sa_rec->reversible != 0) {
> +		if ((info->comp_mask & IB_SA_PATH_REC_NUMB_PATH) &&
> +		    sa_rec->numb_path > 1)
> +			val8 = LS_NLA_PATH_USE_ALL;
> +		else
> +			val8 = LS_NLA_PATH_USE_GMP;

Drop the numb_path stuff, just set USE_GMP; it is confusing. I thought I mentioned this already.

> +	} else {
> +		val8 = LS_NLA_PATH_USE_UNIDIRECTIONAL;
> +	}
> +	nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_PATH_USE, sizeof(val8),
> +		&val8);

Non-optional attributes should probably go in a non-nested header, possibly along with the portGUID/portnum/whatever.

So the structure would be:

nlmsghdr
struct rdma_ls_resolve_path
{
	u64 port_guid; // whatever
	u32 path_use;
}
nlattr[IB_SA_PATH_REC_PKEY,...]*

This is the standard layout for netlink messages.

> +static int ib_nl_get_path_rec_attrs_len(struct ib_nl_attr_info *info)
> +{
> +	/*
> +	 * We need path use attribute no matter reversible or numb_path is
> +	 * set or not, as long as some other fields get set
> +	 */
> +	if (WARN_ON(len == 0))
> +		return len;

The comment is obsolete, and it shouldn't exit without reserving space for the mandatory fields.

> +static int ib_nl_send_request(struct ib_nl_request_info *rinfo)
> +{
> +	struct ib_nl_attr_info info;
> +	int opcode;
> +	struct ib_sa_mad *mad;
> +	unsigned long flags;
> +	unsigned long delay;
> +	int ret;
> +
> +	mad = rinfo->query->mad_buf->mad;
> +	switch (mad->mad_hdr.attr_id) {
> +	case cpu_to_be16(IB_SA_ATTR_PATH_REC):
> +		opcode = RDMA_NL_LS_OP_RESOLVE;
> +		mad = rinfo->query->mad_buf->mad;
> +		info.comp_mask = mad->sa_hdr.comp_mask;
> +		info.input = rinfo->query->mad_buf->context[1];
> +		rinfo->query->mad_buf->context[1] = NULL;
> +		info.len = ib_nl_get_path_rec_attrs_len(&info);
> +		info.set_attrs = ib_nl_set_path_rec_attrs;
> +		break;

So now we put a bunch of stuff in yet another structure and call through a function pointer. Rambly; I'd streamline that.
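As a sketch of the static-inline replacement suggested for the flag macros quoted earlier in this message: the flag bits and the ib_sa_query::flags field come from the patch, while the lower-case helper names are placeholders.

/* Sketch only: the flag macros rewritten as static inline functions. */
static inline bool ib_sa_query_local_svc_enabled(struct ib_sa_query *query)
{
	return query->flags & IB_SA_ENABLE_LOCAL_SERVICE;
}

static inline void ib_sa_query_enable_local_svc(struct ib_sa_query *query)
{
	query->flags |= IB_SA_ENABLE_LOCAL_SERVICE;
}

static inline void ib_sa_query_disable_local_svc(struct ib_sa_query *query)
{
	query->flags &= ~IB_SA_ENABLE_LOCAL_SERVICE;
}

static inline bool ib_sa_query_cancelled(struct ib_sa_query *query)
{
	return query->flags & IB_SA_CANCEL;
}

static inline void ib_sa_query_cancel(struct ib_sa_query *query)
{
	query->flags |= IB_SA_CANCEL;
}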
> +	struct ib_mad_send_wc mad_send_wc;
> +	struct ib_sa_mad *mad = NULL;
> +	const struct nlattr *head, *curr;
> +	struct ib_path_rec_data *rec;
> +	int len, rem;
> +
> +	if (query->callback) {
> +		head = (const struct nlattr *) nlmsg_data(nlh);
> +		len = nlmsg_len(nlh);
> +		nla_for_each_attr(curr, head, len, rem) {
> +			if (curr->nla_type == LS_NLA_TYPE_PATH_RECORD) {
> +				rec = nla_data(curr);
> +				if (rec->flags && IB_PATH_PRIMARY) {

This is still wrong.

I looked very closely, and it turns out the record you want to find depends on the path use that was asked for:

LS_NLA_PATH_USE_ALL:            IB_PATH_PRIMARY | IB_PATH_GMP | IB_PATH_BIDIRECTIONAL
LS_NLA_PATH_USE_GMP:            IB_PATH_PRIMARY | IB_PATH_GMP | IB_PATH_BIDIRECTIONAL
LS_NLA_PATH_USE_UNIDIRECTIONAL: IB_PATH_PRIMARY | IB_PATH_OUTBOUND

> +static inline int ib_nl_is_good_resolve_resp(const struct nlmsghdr *nlh)
> +{
> +	const struct nlattr *head, *curr;
> +	int len, rem;
> +
> +	if (nlh->nlmsg_flags & RDMA_NL_LS_F_ERR)
> +		return 0;
> +
> +	if (!(nlh->nlmsg_flags & RDMA_NL_LS_F_OK))
> +		return 0;
> +
> +	if (nlmsg_len(nlh) < nla_attr_size(sizeof(*rec)))
> +		return 0;
> +
> +	head = (const struct nlattr *) nlmsg_data(nlh);
> +	len = nlmsg_len(nlh);
> +	nla_for_each_attr(curr, head, len, rem) {
> +		if (curr->nla_type == LS_NLA_TYPE_PATH_RECORD)
> +			return 1;
> +	}

As discussed already, this needs to use nla_parse_nested, which should eliminate all of this. Do not do nla_for_each_attr here; just look for ERR.

> +static int ib_nl_handle_set_timeout(struct sk_buff *skb,
> +				    struct netlink_callback *cb)
> +{
> +	const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh;
> +	int timeout, delta, abs_delta;
> +	const struct nlattr *attr;
> +	struct rdma_nla_ls_timeout *to_attr;
> +	unsigned long flags;
> +	struct ib_nl_request_info *rinfo;
> +	long delay = 0;
> +
> +	if (nlmsg_len(nlh) < nla_attr_size(sizeof(*to_attr)))
> +		goto settimeout_out;

All this should be driven by nla_parse.

> +	attr = (const struct nlattr *) nlmsg_data(nlh);
> +	if (attr->nla_type != LS_NLA_TYPE_TIMEOUT ||
> +	    nla_len(attr) != sizeof(*to_attr))
> +		goto settimeout_out;

No nested attr, as discussed.

There is something weird here. IIRC, in netlink a SET should return back exactly the same message with the up-to-date values. (Probably should confirm; I'm not 100% on that.)

And I don't think this should be a dump; again, not 100% sure.

I didn't check the locking or a few other things, but I did test it out a bit with a custom cache userspace client, and would like to try again when the protocol is fixed up.

Thanks,
Jason
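To make the table above concrete, the reply handler could derive the required ib_path_rec_data flags from the requested path use. A sketch only; the masks are taken verbatim from the list above, and the helper name is a placeholder:

/* Sketch only: translate the requested LS_NLA_PATH_USE_* value into the
 * ib_path_rec_data.flags bits that must all be set on the record picked
 * out of the netlink reply, per the table above.
 */
static u32 ib_nl_required_path_flags(u8 path_use)
{
	switch (path_use) {
	case LS_NLA_PATH_USE_UNIDIRECTIONAL:
		return IB_PATH_PRIMARY | IB_PATH_OUTBOUND;
	case LS_NLA_PATH_USE_ALL:
	case LS_NLA_PATH_USE_GMP:
	default:
		return IB_PATH_PRIMARY | IB_PATH_GMP | IB_PATH_BIDIRECTIONAL;
	}
}

/* The caller would then accept a record only when all required bits are
 * present, e.g.:
 *
 *	if ((rec->flags & mask) == mask)
 *		... use rec->path_rec ...
 *
 * Note the bitwise '&' here: the patch's 'rec->flags && IB_PATH_PRIMARY'
 * is a logical AND and would match any record with a non-zero flags field.
 */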
> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgunthorpe@obsidianresearch.com]
> Sent: Friday, July 03, 2015 5:38 PM
> To: Wan, Kaike
> Cc: linux-rdma@vger.kernel.org; Fleck, John; Weiny, Ira
> Subject: Re: [PATCH v7 4/4] IB/sa: Route SA pathrecord query through netlink
>
> On Tue, Jun 30, 2015 at 09:45:55AM -0400, kaike.wan@intel.com wrote:
>
> I still think this is long and rambly, but that is mostly a style preference. I
> think you should look at it and remove some of the leftover stuff; we really
> don't need a rough-in for other responses at this point.

I will try to remove some of the placeholders to simplify the code.

> > +#define IB_SA_ENABLE_LOCAL_SERVICE	0x00000001
> > +#define IB_SA_CANCEL			0x00000002
> > +
> > +#define IB_SA_LOCAL_SVC_ENABLED(query) \
> > +	((query)->flags & IB_SA_ENABLE_LOCAL_SERVICE)
> > +#define IB_SA_ENABLE_LOCAL_SVC(query) \
> > +	((query)->flags |= IB_SA_ENABLE_LOCAL_SERVICE)
> > +#define IB_SA_DISABLE_LOCAL_SVC(query) \
> > +	((query)->flags &= ~IB_SA_ENABLE_LOCAL_SERVICE)
> > +
> > +#define IB_SA_QUERY_CANCELLED(query) \
> > +	((query)->flags & IB_SA_CANCEL)
> > +#define IB_SA_CANCEL_QUERY(query) \
> > +	((query)->flags |= IB_SA_CANCEL)
>
> This whole bit is really strange style - if you really want to keep this then at
> least use static inline functions, not macros.

Will be changed to use static inline functions.

> > +static struct ib_nl_request_info *
> > +ib_nl_alloc_request(struct ib_sa_query *query)
> > +{
> > +	struct ib_nl_request_info *rinfo;
> > +
> > +	rinfo = kzalloc(sizeof(*rinfo), GFP_ATOMIC);
> > +	if (rinfo == NULL)
> > +		return ERR_PTR(-ENOMEM);
>
> There really seem to be a lot of kallocs going on for what is supposed to be a
> performance path.
>
> I would probably work to fold this into the ib_sa_query allocation; it is just a
> few bytes, and we can waste that RAM if we are not using netlink.

Will do.

> > +	if ((info->comp_mask & IB_SA_PATH_REC_REVERSIBLE) &&
> > +	    sa_rec->reversible != 0) {
> > +		if ((info->comp_mask & IB_SA_PATH_REC_NUMB_PATH) &&
> > +		    sa_rec->numb_path > 1)
> > +			val8 = LS_NLA_PATH_USE_ALL;
> > +		else
> > +			val8 = LS_NLA_PATH_USE_GMP;
>
> Drop the numb_path stuff, just set USE_GMP; it is confusing. I thought I
> mentioned this already.

Done.

> > +	} else {
> > +		val8 = LS_NLA_PATH_USE_UNIDIRECTIONAL;
> > +	}
> > +	nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_PATH_USE, sizeof(val8),
> > +		&val8);
>
> Non-optional attributes should probably go in a non-nested header, possibly
> along with the portGUID/portnum/whatever.
>
> So the structure would be:
>
> nlmsghdr
> struct rdma_ls_resolve_path
> {
>	u64 port_guid; // whatever
>	u32 path_use;
> }
> nlattr[IB_SA_PATH_REC_PKEY,...]*
>
> This is the standard layout for netlink messages.

That would be the family header:

struct rdma_ls_resolve_header {
	u8 device_name[64];
	u8 port_num;
	u8 path_use;
};

> > +static int ib_nl_get_path_rec_attrs_len(struct ib_nl_attr_info *info)
> > +{
> > +	/*
> > +	 * We need path use attribute no matter reversible or numb_path is
> > +	 * set or not, as long as some other fields get set
> > +	 */
> > +	if (WARN_ON(len == 0))
> > +		return len;
>
> The comment is obsolete, and it shouldn't exit without reserving space for
> the mandatory fields.

This is a check to make sure that at least some of the mandatory comp_mask bits are set. If no bits are set (len == 0), the value 0 is returned and the caller will abort the send.
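For illustration, a sketch of how a request could then be laid out as nlmsghdr, family header, then flat attributes. The header fields are the ones proposed above; the helper name, the strlcpy-based copy, and the assumption that the skb was sized to include the header are all illustrative rather than taken from the patch:

/* Sketch only: put the family header right after the nlmsg header, before
 * the LS_NLA_TYPE_* attributes are appended with nla_put().
 */
static void ib_nl_put_resolve_header(struct sk_buff *skb,
				     struct ib_sa_port *port, u8 path_use)
{
	struct rdma_ls_resolve_header *header;

	/* Assumes nlmsg_new() reserved room for the header as well. */
	header = (struct rdma_ls_resolve_header *)
		skb_put(skb, NLMSG_ALIGN(sizeof(*header)));
	memset(header, 0, sizeof(*header));
	strlcpy(header->device_name, port->agent->device->name,
		sizeof(header->device_name));
	header->port_num = port->port_num;
	header->path_use = path_use;
}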
> > +static int ib_nl_send_request(struct ib_nl_request_info *rinfo)
> > +{
> > +	struct ib_nl_attr_info info;
> > +	int opcode;
> > +	struct ib_sa_mad *mad;
> > +	unsigned long flags;
> > +	unsigned long delay;
> > +	int ret;
> > +
> > +	mad = rinfo->query->mad_buf->mad;
> > +	switch (mad->mad_hdr.attr_id) {
> > +	case cpu_to_be16(IB_SA_ATTR_PATH_REC):
> > +		opcode = RDMA_NL_LS_OP_RESOLVE;
> > +		mad = rinfo->query->mad_buf->mad;
> > +		info.comp_mask = mad->sa_hdr.comp_mask;
> > +		info.input = rinfo->query->mad_buf->context[1];
> > +		rinfo->query->mad_buf->context[1] = NULL;
> > +		info.len = ib_nl_get_path_rec_attrs_len(&info);
> > +		info.set_attrs = ib_nl_set_path_rec_attrs;
> > +		break;
>
> So now we put a bunch of stuff in yet another structure and call through a
> function pointer. Rambly; I'd streamline that.

Done.

> > +	struct ib_mad_send_wc mad_send_wc;
> > +	struct ib_sa_mad *mad = NULL;
> > +	const struct nlattr *head, *curr;
> > +	struct ib_path_rec_data *rec;
> > +	int len, rem;
> > +
> > +	if (query->callback) {
> > +		head = (const struct nlattr *) nlmsg_data(nlh);
> > +		len = nlmsg_len(nlh);
> > +		nla_for_each_attr(curr, head, len, rem) {
> > +			if (curr->nla_type == LS_NLA_TYPE_PATH_RECORD) {
> > +				rec = nla_data(curr);
> > +				if (rec->flags && IB_PATH_PRIMARY) {
>
> This is still wrong.
>
> I looked very closely, and it turns out the record you want to find depends on
> the path use that was asked for.
>
> LS_NLA_PATH_USE_ALL:            IB_PATH_PRIMARY | IB_PATH_GMP | IB_PATH_BIDIRECTIONAL
> LS_NLA_PATH_USE_GMP:            IB_PATH_PRIMARY | IB_PATH_GMP | IB_PATH_BIDIRECTIONAL
> LS_NLA_PATH_USE_UNIDIRECTIONAL: IB_PATH_PRIMARY | IB_PATH_OUTBOUND

Good to know. This info will be used in the next revision to search for the desired pathrecord.

> > +static inline int ib_nl_is_good_resolve_resp(const struct nlmsghdr *nlh)
> > +{
> > +	const struct nlattr *head, *curr;
> > +	int len, rem;
> > +
> > +	if (nlh->nlmsg_flags & RDMA_NL_LS_F_ERR)
> > +		return 0;
> > +
> > +	if (!(nlh->nlmsg_flags & RDMA_NL_LS_F_OK))
> > +		return 0;
> > +
> > +	if (nlmsg_len(nlh) < nla_attr_size(sizeof(*rec)))
> > +		return 0;
> > +
> > +	head = (const struct nlattr *) nlmsg_data(nlh);
> > +	len = nlmsg_len(nlh);
> > +	nla_for_each_attr(curr, head, len, rem) {
> > +		if (curr->nla_type == LS_NLA_TYPE_PATH_RECORD)
> > +			return 1;
> > +	}
>
> As discussed already, this needs to use nla_parse_nested, which should
> eliminate all of this. Do not do nla_for_each_attr here; just look for ERR.

This function is removed since it now checks for ERR only.

> > +static int ib_nl_handle_set_timeout(struct sk_buff *skb,
> > +				    struct netlink_callback *cb)
> > +{
> > +	const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh;
> > +	int timeout, delta, abs_delta;
> > +	const struct nlattr *attr;
> > +	struct rdma_nla_ls_timeout *to_attr;
> > +	unsigned long flags;
> > +	struct ib_nl_request_info *rinfo;
> > +	long delay = 0;
> > +
> > +	if (nlmsg_len(nlh) < nla_attr_size(sizeof(*to_attr)))
> > +		goto settimeout_out;
>
> All this should be driven by nla_parse.

Changed.

> > +	attr = (const struct nlattr *) nlmsg_data(nlh);
> > +	if (attr->nla_type != LS_NLA_TYPE_TIMEOUT ||
> > +	    nla_len(attr) != sizeof(*to_attr))
> > +		goto settimeout_out;
>
> No nested attr, as discussed.

Changed.

> There is something weird here. IIRC, in netlink a SET should return back
> exactly the same message with the up-to-date values. (Probably should
> confirm; I'm not 100% on that.)

Not sure what you meant here. This is just a SET request from the user application.
Where does the "return back" come from?

> And I don't think this should be a dump; again, not 100% sure.

A special check is now done in ibnl_rcv_msg to bypass the dump mechanism for the SET_TIMEOUT request.

Kaike

> I didn't check the locking or a few other things, but I did test it out a bit with a
> custom cache userspace client, and would like to try again when the protocol is
> fixed up.
>
> Thanks,
> Jason
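For the timeout handler, a sketch of the nla_parse()-driven parsing agreed on above. LS_NLA_TYPE_TIMEOUT, struct rdma_nla_ls_timeout, and the min/max constants come from the patch; the policy table, the helper name, and the LS_NLA_TYPE_MAX enumerator are assumptions:

/* Sketch only: let nla_parse() do the length/type checking and pull the
 * timeout attribute out of the SET_TIMEOUT request.
 */
static const struct nla_policy ib_nl_timeout_policy[LS_NLA_TYPE_MAX] = {
	[LS_NLA_TYPE_TIMEOUT] = { .len = sizeof(struct rdma_nla_ls_timeout) },
};

static int ib_nl_parse_timeout(const struct nlmsghdr *nlh, int *timeout)
{
	struct nlattr *tb[LS_NLA_TYPE_MAX];
	struct rdma_nla_ls_timeout *to_attr;
	int ret;

	ret = nla_parse(tb, LS_NLA_TYPE_MAX - 1, nlmsg_data(nlh),
			nlmsg_len(nlh), ib_nl_timeout_policy);
	if (ret || !tb[LS_NLA_TYPE_TIMEOUT])
		return -EINVAL;

	to_attr = nla_data(tb[LS_NLA_TYPE_TIMEOUT]);
	/* Clamp into the supported range, as the patch does explicitly. */
	*timeout = clamp_t(int, to_attr->timeout,
			   IB_SA_LOCAL_SVC_TIMEOUT_MIN,
			   IB_SA_LOCAL_SVC_TIMEOUT_MAX);
	return 0;
}

With parsing split out like this, ib_nl_handle_set_timeout() itself would keep only the request-list walk and the delayed-work rescheduling.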
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index 59022fa..bbc8537 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -45,12 +45,21 @@ #include <uapi/linux/if_ether.h> #include <rdma/ib_pack.h> #include <rdma/ib_cache.h> +#include <rdma/rdma_netlink.h> +#include <net/netlink.h> +#include <uapi/rdma/ib_user_sa.h> +#include <rdma/ib_marshall.h> #include "sa.h" MODULE_AUTHOR("Roland Dreier"); MODULE_DESCRIPTION("InfiniBand subnet administration query support"); MODULE_LICENSE("Dual BSD/GPL"); +#define IB_SA_LOCAL_SVC_TIMEOUT_MIN 100 +#define IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT 2000 +#define IB_SA_LOCAL_SVC_TIMEOUT_MAX 200000 +static int sa_local_svc_timeout_ms = IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT; + struct ib_sa_sm_ah { struct ib_ah *ah; struct kref ref; @@ -80,8 +89,24 @@ struct ib_sa_query { struct ib_mad_send_buf *mad_buf; struct ib_sa_sm_ah *sm_ah; int id; + u32 flags; }; +#define IB_SA_ENABLE_LOCAL_SERVICE 0x00000001 +#define IB_SA_CANCEL 0x00000002 + +#define IB_SA_LOCAL_SVC_ENABLED(query) \ + ((query)->flags & IB_SA_ENABLE_LOCAL_SERVICE) +#define IB_SA_ENABLE_LOCAL_SVC(query) \ + ((query)->flags |= IB_SA_ENABLE_LOCAL_SERVICE) +#define IB_SA_DISABLE_LOCAL_SVC(query) \ + ((query)->flags &= ~IB_SA_ENABLE_LOCAL_SERVICE) + +#define IB_SA_QUERY_CANCELLED(query) \ + ((query)->flags & IB_SA_CANCEL) +#define IB_SA_CANCEL_QUERY(query) \ + ((query)->flags |= IB_SA_CANCEL) + struct ib_sa_service_query { void (*callback)(int, struct ib_sa_service_rec *, void *); void *context; @@ -106,6 +131,26 @@ struct ib_sa_mcmember_query { struct ib_sa_query sa_query; }; +struct ib_nl_request_info { + struct list_head list; + u32 seq; + unsigned long timeout; + struct ib_sa_query *query; +}; + +struct ib_nl_attr_info { + u16 len; /* Total attr len: header + payload + padding */ + ib_sa_comp_mask comp_mask; + void *input; + void (*set_attrs)(struct sk_buff *skb, struct ib_nl_attr_info *info); +}; + +static LIST_HEAD(ib_nl_request_list); +static DEFINE_SPINLOCK(ib_nl_request_lock); +static atomic_t ib_nl_sa_request_seq; +static struct workqueue_struct *ib_nl_wq; +static struct delayed_work ib_nl_timed_work; + static void ib_sa_add_one(struct ib_device *device); static void ib_sa_remove_one(struct ib_device *device); @@ -381,6 +426,443 @@ static const struct ib_field guidinfo_rec_table[] = { .size_bits = 512 }, }; +static int ib_nl_send_msg(int opcode, struct ib_nl_attr_info *attrs, u32 seq) +{ + struct sk_buff *skb = NULL; + struct nlmsghdr *nlh; + void *data; + int ret = 0; + + if (attrs->len <= 0) + return -EMSGSIZE; + + skb = nlmsg_new(attrs->len, GFP_KERNEL); + if (!skb) + return -ENOMEM; + + /* Put nlmsg header only for now */ + data = ibnl_put_msg(skb, &nlh, seq, 0, RDMA_NL_SA, + opcode, GFP_KERNEL); + if (!data) { + kfree_skb(skb); + return -EMSGSIZE; + } + + /* Add attributes */ + attrs->set_attrs(skb, attrs); + + /* Repair the nlmsg header length */ + nlmsg_end(skb, nlh); + + ret = ibnl_multicast(skb, nlh, RDMA_NL_GROUP_LS, GFP_KERNEL); + if (!ret) + ret = attrs->len; + else + ret = 0; + + return ret; +} + +static struct ib_nl_request_info * +ib_nl_alloc_request(struct ib_sa_query *query) +{ + struct ib_nl_request_info *rinfo; + + rinfo = kzalloc(sizeof(*rinfo), GFP_ATOMIC); + if (rinfo == NULL) + return ERR_PTR(-ENOMEM); + + INIT_LIST_HEAD(&rinfo->list); + rinfo->seq = (u32)atomic_inc_return(&ib_nl_sa_request_seq); + rinfo->query = query; + + return rinfo; +} + +static void ib_nl_set_path_rec_attrs(struct sk_buff *skb, + struct 
ib_nl_attr_info *info) +{ + struct ib_sa_path_rec *sa_rec = info->input; + u8 val8; + u16 val16; + u64 val64; + + if (info->comp_mask & IB_SA_PATH_REC_SERVICE_ID) { + val64 = be64_to_cpu(sa_rec->service_id); + nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_SERVICE_ID, + sizeof(val64), &val64); + } + if (info->comp_mask & IB_SA_PATH_REC_DGID) + nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_DGID, + sizeof(sa_rec->dgid), &sa_rec->dgid); + if (info->comp_mask & IB_SA_PATH_REC_SGID) + nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_SGID, + sizeof(sa_rec->sgid), &sa_rec->sgid); + if (info->comp_mask & IB_SA_PATH_REC_TRAFFIC_CLASS) + nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_TCLASS, + sizeof(sa_rec->traffic_class), &sa_rec->traffic_class); + + if ((info->comp_mask & IB_SA_PATH_REC_REVERSIBLE) && + sa_rec->reversible != 0) { + if ((info->comp_mask & IB_SA_PATH_REC_NUMB_PATH) && + sa_rec->numb_path > 1) + val8 = LS_NLA_PATH_USE_ALL; + else + val8 = LS_NLA_PATH_USE_GMP; + } else { + val8 = LS_NLA_PATH_USE_UNIDIRECTIONAL; + } + nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_PATH_USE, sizeof(val8), + &val8); + + if (info->comp_mask & IB_SA_PATH_REC_PKEY) { + val16 = be16_to_cpu(sa_rec->pkey); + nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_PKEY, + sizeof(val16), &val16); + } + if (info->comp_mask & IB_SA_PATH_REC_QOS_CLASS) { + val16 = be16_to_cpu(sa_rec->qos_class); + nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_QOS_CLASS, + sizeof(val16), &val16); + } +} + +static int ib_nl_get_path_rec_attrs_len(struct ib_nl_attr_info *info) +{ + int len = 0; + + if (info->comp_mask & IB_SA_PATH_REC_SERVICE_ID) + len += nla_total_size(sizeof(struct rdma_nla_ls_service_id)); + if (info->comp_mask & IB_SA_PATH_REC_DGID) + len += nla_total_size(sizeof(struct rdma_nla_ls_gid)); + if (info->comp_mask & IB_SA_PATH_REC_SGID) + len += nla_total_size(sizeof(struct rdma_nla_ls_gid)); + if (info->comp_mask & IB_SA_PATH_REC_TRAFFIC_CLASS) + len += nla_total_size(sizeof(struct rdma_nla_ls_tclass)); + if (info->comp_mask & IB_SA_PATH_REC_PKEY) + len += nla_total_size(sizeof(struct rdma_nla_ls_pkey)); + if (info->comp_mask & IB_SA_PATH_REC_QOS_CLASS) + len += nla_total_size(sizeof(struct rdma_nla_ls_qos_class)); + + /* + * We need path use attribute no matter reversible or numb_path is + * set or not, as long as some other fields get set + */ + if (WARN_ON(len == 0)) + return len; + len += nla_total_size(sizeof(struct rdma_nla_ls_path_use)); + + return len; +} + +static int ib_nl_send_request(struct ib_nl_request_info *rinfo) +{ + struct ib_nl_attr_info info; + int opcode; + struct ib_sa_mad *mad; + unsigned long flags; + unsigned long delay; + int ret; + + mad = rinfo->query->mad_buf->mad; + switch (mad->mad_hdr.attr_id) { + case cpu_to_be16(IB_SA_ATTR_PATH_REC): + opcode = RDMA_NL_LS_OP_RESOLVE; + mad = rinfo->query->mad_buf->mad; + info.comp_mask = mad->sa_hdr.comp_mask; + info.input = rinfo->query->mad_buf->context[1]; + rinfo->query->mad_buf->context[1] = NULL; + info.len = ib_nl_get_path_rec_attrs_len(&info); + info.set_attrs = ib_nl_set_path_rec_attrs; + break; + default: + return -EINVAL; + } + + spin_lock_irqsave(&ib_nl_request_lock, flags); + ret = ib_nl_send_msg(opcode, &info, rinfo->seq); + if (ret <= 0) { + ret = -EIO; + goto request_out; + } else { + ret = 0; + } + + delay = msecs_to_jiffies(sa_local_svc_timeout_ms); + rinfo->timeout = delay + jiffies; + list_add_tail(&rinfo->list, &ib_nl_request_list); + /* Start the timeout if this is the only request */ + if (ib_nl_request_list.next == 
&rinfo->list) + queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay); + +request_out: + spin_unlock_irqrestore(&ib_nl_request_lock, flags); + + return ret; +} + +static int ib_nl_make_request(struct ib_sa_query *query) +{ + struct ib_nl_request_info *rinfo; + int ret; + + rinfo = ib_nl_alloc_request(query); + if (IS_ERR(rinfo)) + return -ENOMEM; + + ret = ib_nl_send_request(rinfo); + if (ret) + kfree(rinfo); + + return ret; +} + +static int ib_nl_cancel_request(struct ib_sa_query *query) +{ + unsigned long flags; + struct ib_nl_request_info *rinfo; + int found = 0; + + spin_lock_irqsave(&ib_nl_request_lock, flags); + list_for_each_entry(rinfo, &ib_nl_request_list, list) { + /* Let the timeout to take care of the callback */ + if (query == rinfo->query) { + IB_SA_CANCEL_QUERY(query); + rinfo->timeout = jiffies; + list_move(&rinfo->list, &ib_nl_request_list); + found = 1; + mod_delayed_work(ib_nl_wq, &ib_nl_timed_work, 1); + break; + } + } + spin_unlock_irqrestore(&ib_nl_request_lock, flags); + + return found; +} + +static void send_handler(struct ib_mad_agent *agent, + struct ib_mad_send_wc *mad_send_wc); + +static void ib_nl_process_good_resolve_rsp(struct ib_sa_query *query, + const struct nlmsghdr *nlh) +{ + struct ib_mad_send_wc mad_send_wc; + struct ib_sa_mad *mad = NULL; + const struct nlattr *head, *curr; + struct ib_path_rec_data *rec; + int len, rem; + + if (query->callback) { + head = (const struct nlattr *) nlmsg_data(nlh); + len = nlmsg_len(nlh); + nla_for_each_attr(curr, head, len, rem) { + if (curr->nla_type == LS_NLA_TYPE_PATH_RECORD) { + rec = nla_data(curr); + if (rec->flags && IB_PATH_PRIMARY) { + mad = query->mad_buf->mad; + mad->mad_hdr.method |= + IB_MGMT_METHOD_RESP; + memcpy(mad->data, rec->path_rec, + sizeof(rec->path_rec)); + query->callback(query, 0, mad); + break; + } + } + } + } + + mad_send_wc.send_buf = query->mad_buf; + mad_send_wc.status = IB_WC_SUCCESS; + send_handler(query->mad_buf->mad_agent, &mad_send_wc); +} + +static void ib_nl_request_timeout(struct work_struct *work) +{ + unsigned long flags; + struct ib_nl_request_info *rinfo; + struct ib_sa_query *query; + unsigned long delay; + struct ib_mad_send_wc mad_send_wc; + int ret; + + spin_lock_irqsave(&ib_nl_request_lock, flags); + while (!list_empty(&ib_nl_request_list)) { + rinfo = list_entry(ib_nl_request_list.next, + struct ib_nl_request_info, list); + + if (time_after(rinfo->timeout, jiffies)) { + delay = rinfo->timeout - jiffies; + if ((long)delay <= 0) + delay = 1; + queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay); + break; + } + + list_del(&rinfo->list); + query = rinfo->query; + IB_SA_DISABLE_LOCAL_SVC(query); + /* Hold the lock to protect against query cancellation */ + if (IB_SA_QUERY_CANCELLED(query)) + ret = -1; + else + ret = ib_post_send_mad(query->mad_buf, NULL); + if (ret) { + mad_send_wc.send_buf = query->mad_buf; + mad_send_wc.status = IB_WC_WR_FLUSH_ERR; + spin_unlock_irqrestore(&ib_nl_request_lock, flags); + send_handler(query->port->agent, &mad_send_wc); + spin_lock_irqsave(&ib_nl_request_lock, flags); + } + kfree(rinfo); + } + spin_unlock_irqrestore(&ib_nl_request_lock, flags); +} + +static inline int ib_nl_is_good_resolve_resp(const struct nlmsghdr *nlh) +{ + const struct nlattr *head, *curr; + int len, rem; + + if (nlh->nlmsg_flags & RDMA_NL_LS_F_ERR) + return 0; + + if (!(nlh->nlmsg_flags & RDMA_NL_LS_F_OK)) + return 0; + + if (nlmsg_len(nlh) < nla_attr_size(sizeof(*rec))) + return 0; + + head = (const struct nlattr *) nlmsg_data(nlh); + len = nlmsg_len(nlh); + 
nla_for_each_attr(curr, head, len, rem) { + if (curr->nla_type == LS_NLA_TYPE_PATH_RECORD) + return 1; + } + + return 0; +} + +static int ib_nl_handle_set_timeout(struct sk_buff *skb, + struct netlink_callback *cb) +{ + const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh; + int timeout, delta, abs_delta; + const struct nlattr *attr; + struct rdma_nla_ls_timeout *to_attr; + unsigned long flags; + struct ib_nl_request_info *rinfo; + long delay = 0; + + if (nlmsg_len(nlh) < nla_attr_size(sizeof(*to_attr))) + goto settimeout_out; + + attr = (const struct nlattr *) nlmsg_data(nlh); + if (attr->nla_type != LS_NLA_TYPE_TIMEOUT || + nla_len(attr) != sizeof(*to_attr)) + goto settimeout_out; + + to_attr = (struct rdma_nla_ls_timeout *) nla_data(attr); + timeout = (int) to_attr->timeout; + if (timeout < IB_SA_LOCAL_SVC_TIMEOUT_MIN) + timeout = IB_SA_LOCAL_SVC_TIMEOUT_MIN; + if (timeout > IB_SA_LOCAL_SVC_TIMEOUT_MAX) + timeout = IB_SA_LOCAL_SVC_TIMEOUT_MAX; + + delta = timeout - sa_local_svc_timeout_ms; + if (delta < 0) + abs_delta = -delta; + else + abs_delta = delta; + + if (delta != 0) { + spin_lock_irqsave(&ib_nl_request_lock, flags); + sa_local_svc_timeout_ms = timeout; + list_for_each_entry(rinfo, &ib_nl_request_list, list) { + if (delta < 0 && abs_delta > rinfo->timeout) + rinfo->timeout = 0; + else + rinfo->timeout += delta; + + /* Get the new delay from the first entry */ + if (!delay) { + delay = rinfo->timeout - jiffies; + if (delay <= 0) + delay = 1; + } + } + if (delay) + mod_delayed_work(ib_nl_wq, &ib_nl_timed_work, + (unsigned long)delay); + spin_unlock_irqrestore(&ib_nl_request_lock, flags); + } + +settimeout_out: + return skb->len; +} + +static int ib_nl_handle_resolve_resp(struct sk_buff *skb, + struct netlink_callback *cb) +{ + const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh; + unsigned long flags; + struct ib_nl_request_info *rinfo; + struct ib_sa_query *query; + struct ib_mad_send_buf *send_buf; + struct ib_mad_send_wc mad_send_wc; + int found = 0; + int ret; + + spin_lock_irqsave(&ib_nl_request_lock, flags); + list_for_each_entry(rinfo, &ib_nl_request_list, list) { + /* + * If the query is cancelled, let the timeout routine + * take care of it. 
+ */ + if (nlh->nlmsg_seq == rinfo->seq) { + found = !IB_SA_QUERY_CANCELLED(rinfo->query); + if (found) + list_del(&rinfo->list); + break; + } + } + + if (!found) { + spin_unlock_irqrestore(&ib_nl_request_lock, flags); + goto resp_out; + } + + query = rinfo->query; + send_buf = query->mad_buf; + + if (!ib_nl_is_good_resolve_resp(nlh)) { + /* if the result is a failure, send out the packet via IB */ + IB_SA_DISABLE_LOCAL_SVC(query); + ret = ib_post_send_mad(query->mad_buf, NULL); + spin_unlock_irqrestore(&ib_nl_request_lock, flags); + if (ret) { + mad_send_wc.send_buf = send_buf; + mad_send_wc.status = IB_WC_GENERAL_ERR; + send_handler(query->port->agent, &mad_send_wc); + } + } else { + spin_unlock_irqrestore(&ib_nl_request_lock, flags); + ib_nl_process_good_resolve_rsp(query, nlh); + } + + kfree(rinfo); +resp_out: + return skb->len; +} + +static struct ibnl_client_cbs ib_sa_cb_table[] = { + [RDMA_NL_LS_OP_RESOLVE] = { + .dump = ib_nl_handle_resolve_resp, + .module = THIS_MODULE }, + [RDMA_NL_LS_OP_SET_TIMEOUT] = { + .dump = ib_nl_handle_set_timeout, + .module = THIS_MODULE }, +}; + static void free_sm_ah(struct kref *kref) { struct ib_sa_sm_ah *sm_ah = container_of(kref, struct ib_sa_sm_ah, ref); @@ -502,7 +984,13 @@ void ib_sa_cancel_query(int id, struct ib_sa_query *query) mad_buf = query->mad_buf; spin_unlock_irqrestore(&idr_lock, flags); - ib_cancel_mad(agent, mad_buf); + /* + * If the query is still on the netlink request list, schedule + * it to be cancelled by the timeout routine. Otherwise, it has been + * sent to the MAD layer and has to be cancelled from there. + */ + if (!ib_nl_cancel_request(query)) + ib_cancel_mad(agent, mad_buf); } EXPORT_SYMBOL(ib_sa_cancel_query); @@ -638,6 +1126,14 @@ static int send_mad(struct ib_sa_query *query, int timeout_ms, gfp_t gfp_mask) query->mad_buf->context[0] = query; query->id = id; + if (IB_SA_LOCAL_SVC_ENABLED(query)) { + if (!ibnl_chk_listeners(RDMA_NL_GROUP_LS)) { + if (!ib_nl_make_request(query)) + return id; + } + IB_SA_DISABLE_LOCAL_SVC(query); + } + ret = ib_post_send_mad(query->mad_buf, NULL); if (ret) { spin_lock_irqsave(&idr_lock, flags); @@ -766,6 +1262,9 @@ int ib_sa_path_rec_get(struct ib_sa_client *client, *sa_query = &query->sa_query; + IB_SA_ENABLE_LOCAL_SVC(&query->sa_query); + query->sa_query.mad_buf->context[1] = rec; + ret = send_mad(&query->sa_query, timeout_ms, gfp_mask); if (ret < 0) goto err2; @@ -1254,6 +1753,8 @@ static int __init ib_sa_init(void) get_random_bytes(&tid, sizeof tid); + atomic_set(&ib_nl_sa_request_seq, 0); + ret = ib_register_client(&sa_client); if (ret) { printk(KERN_ERR "Couldn't register ib_sa client\n"); @@ -1266,7 +1767,25 @@ static int __init ib_sa_init(void) goto err2; } + ib_nl_wq = create_singlethread_workqueue("ib_nl_sa_wq"); + if (!ib_nl_wq) { + ret = -ENOMEM; + goto err3; + } + + if (ibnl_add_client(RDMA_NL_SA, RDMA_NL_LS_NUM_OPS, + ib_sa_cb_table)) { + pr_err("Failed to add netlink callback\n"); + ret = -EINVAL; + goto err4; + } + INIT_DELAYED_WORK(&ib_nl_timed_work, ib_nl_request_timeout); + return 0; +err4: + destroy_workqueue(ib_nl_wq); +err3: + mcast_cleanup(); err2: ib_unregister_client(&sa_client); err1: @@ -1275,6 +1794,10 @@ err1: static void __exit ib_sa_cleanup(void) { + ibnl_remove_client(RDMA_NL_SA); + cancel_delayed_work(&ib_nl_timed_work); + flush_workqueue(ib_nl_wq); + destroy_workqueue(ib_nl_wq); mcast_cleanup(); ib_unregister_client(&sa_client); idr_destroy(&query_idr);