Message ID: 1515152434-13105-2-git-send-email-devesh.sharma@broadcom.com (mailing list archive)
State: Changes Requested
Delegated to: Doug Ledford
On Fri, 2018-01-05 at 06:40 -0500, Devesh Sharma wrote:
> From: Selvin Xavier <selvin.xavier@broadcom.com>
>
> Currently, fifty percent of the total available resources
> are reserved for PF and remaining are equally divided among
> active VFs.
>
> +/*
> + * Percentage of resources of each type reserved for PF.
> + * Remaining resources are divided equally among VFs.
> + * [0, 100]
> + */
> +#define BNXT_RE_PCT_RSVD_FOR_PF 50

This is a separate comment from the patch review itself.  But, are you
sure this is a good idea?  And especially are you sure that it should be
a compile time constant and not a runtime parameter?

I ask because it seems to me that usage of this stuff falls into one of
two categories:

1) All bare metal usage
2) SRIOV usage (in which case the bare metal OS does relatively little,
   the SRIOV using clients do most of the work)

I guess I'm finding it hard to imagine a scenario where, when you do
have SRIOV VFs, that you don't want the majority of all resources being
used there.

I might suggest that you simply don't split resources at all.  Maybe do
something like filesystems do.  Let anyone at all take a resource until
you hit 95% utilization, then only root can write to the filesystem.  In
this case it would be: let both PFs and VFs use resources at will up
until you hit the 95% utilization threshold and then restrict resource
use to the PF.  That would make much more sense to me.
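A minimal sketch of the threshold scheme suggested above, assuming a hypothetical per-device usage counter that neither the firmware nor the PF driver maintains today (as the reply below notes); the helper name and the 95% constant are illustrative, not part of the patch:

```c
#include <stdbool.h>

/* Hypothetical admission check for a 95%-utilization threshold scheme.
 * "used" and "total" would have to come from a global resource tracker
 * that does not exist in the current driver or firmware.
 */
#define BNXT_RE_UTIL_THRESHOLD_PCT 95U

static bool bnxt_re_may_alloc(unsigned int used, unsigned int total, bool is_pf)
{
	/* Below the threshold, PF and VFs allocate freely. */
	if (used * 100U < total * BNXT_RE_UTIL_THRESHOLD_PCT)
		return true;
	/* Above it, only the PF may consume the remaining resources. */
	return is_pf;
}
```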
On Tue, Jan 9, 2018 at 3:42 AM, Doug Ledford <dledford@redhat.com> wrote:
> On Fri, 2018-01-05 at 06:40 -0500, Devesh Sharma wrote:
>> From: Selvin Xavier <selvin.xavier@broadcom.com>
>>
>> Currently, fifty percent of the total available resources
>> are reserved for PF and remaining are equally divided among
>> active VFs.
>>
>> +/*
>> + * Percentage of resources of each type reserved for PF.
>> + * Remaining resources are divided equally among VFs.
>> + * [0, 100]
>> + */
>> +#define BNXT_RE_PCT_RSVD_FOR_PF 50
>
> This is a separate comment from the patch review itself. But, are you
> sure this is a good idea? And especially are you sure that it should be
> a compile time constant and not a runtime parameter?
>

Keeping a compile-time constant is indeed not a good idea, and I completely
understand that if we had a knob there it would have been much more
flexible. For this submission we wanted to avoid a module parameter or a
configfs interface, so as a workaround this hard-coded compile-time
constant is used. Eventually, a more flexible scheme will be supplied to
change this.

> I ask because it seems to me that usage of this stuff falls into one of
> two categories:
>
> 1) All bare metal usage
> 2) SRIOV usage (in which case the bare metal OS does relatively little,
>    the SRIOV using clients do most of the work)
>
> I guess I'm finding it hard to imagine a scenario where, when you do
> have SRIOV VFs, that you don't want the majority of all resources being
> used there.
>
> I might suggest that you simply don't split resources at all. Maybe do
> something like filesystems do. Let anyone at all take a resource until
> you hit 95% utilization, then only root can write to the filesystem. In
> this case it would be: let both PFs and VFs use resources at will up
> until you hit the 95% utilization threshold and then restrict resource
> use to the PF. That would make much more sense to me.

This is indeed an excellent suggestion to optimize the resource
utilization between PFs and VFs; however, I have a couple of facts to put
forward:

- If I have understood it correctly, this would require an independent
  entity which keeps track of what percentage of resources has been
  utilized at any given point in time by all the functions (the PF and its
  VFs). Currently, we do not have such an implementation in firmware, and
  the PF driver cannot track resource utilization across functions.

- In the current implementation, hard-coding 50% does not hard-limit the
  PF to 50%; it can still over-subscribe up to the max limit even when the
  maximum number of VFs is active.

- With the equal distribution of the remaining resources among VFs, we are
  trying to provide minimum guaranteed resources to the maximum possible
  number of VFs on a given PF. We want to avoid the case where the number
  of usable VFs depends on the resources already consumed by active VFs.

> --
> Doug Ledford <dledford@redhat.com>
>     GPG KeyID: B826A3330E572FDD
>     Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
On Tue, 2018-01-09 at 19:37 +0530, Devesh Sharma wrote:
> On Tue, Jan 9, 2018 at 3:42 AM, Doug Ledford <dledford@redhat.com> wrote:
> > On Fri, 2018-01-05 at 06:40 -0500, Devesh Sharma wrote:
> > > From: Selvin Xavier <selvin.xavier@broadcom.com>
> > >
> > > Currently, fifty percent of the total available resources
> > > are reserved for PF and remaining are equally divided among
> > > active VFs.
> > >
> > > +/*
> > > + * Percentage of resources of each type reserved for PF.
> > > + * Remaining resources are divided equally among VFs.
> > > + * [0, 100]
> > > + */
> > > +#define BNXT_RE_PCT_RSVD_FOR_PF 50
> >
> > This is a separate comment from the patch review itself. But, are you
> > sure this is a good idea? And especially are you sure that it should be
> > a compile time constant and not a runtime parameter?
> >
>
> Keeping a compile-time constant is indeed not a good idea, and I completely
> understand that if we had a knob there it would have been much more
> flexible. For this submission we wanted to avoid a module parameter or a
> configfs interface, so as a workaround this hard-coded compile-time
> constant is used. Eventually, a more flexible scheme will be supplied to
> change this.

Ok.

> > I ask because it seems to me that usage of this stuff falls into one of
> > two categories:
> >
> > 1) All bare metal usage
> > 2) SRIOV usage (in which case the bare metal OS does relatively little,
> >    the SRIOV using clients do most of the work)
> >
> > I guess I'm finding it hard to imagine a scenario where, when you do
> > have SRIOV VFs, that you don't want the majority of all resources being
> > used there.
> >
> > I might suggest that you simply don't split resources at all. Maybe do
> > something like filesystems do. Let anyone at all take a resource until
> > you hit 95% utilization, then only root can write to the filesystem. In
> > this case it would be: let both PFs and VFs use resources at will up
> > until you hit the 95% utilization threshold and then restrict resource
> > use to the PF. That would make much more sense to me.
>
> This is indeed an excellent suggestion to optimize the resource
> utilization between PFs and VFs; however, I have a couple of facts to put
> forward:
>
> - If I have understood it correctly, this would require an independent
>   entity which keeps track of what percentage of resources has been
>   utilized at any given point in time by all the functions (the PF and its
>   VFs). Currently, we do not have such an implementation in firmware, and
>   the PF driver cannot track resource utilization across functions.

Fair enough.

> - In the current implementation, hard-coding 50% does not hard-limit the
>   PF to 50%; it can still over-subscribe up to the max limit even when the
>   maximum number of VFs is active.

OK, but that means a PF can starve VFs rather easily I take it?

> - With the equal distribution of the remaining resources among VFs, we are
>   trying to provide minimum guaranteed resources to the maximum possible
>   number of VFs on a given PF. We want to avoid the case where the number
>   of usable VFs depends on the resources already consumed by active VFs.

And this then is the opposite of the PF in that VFs aren't *really*
guaranteed this minimum amount, since the PF can starve the VFs out, but
it at least guarantees other VFs don't starve any specific VF out.

That's fine if that's how you want things set up for now.  I think I
would work on a firmware update to implement the resource tracker as the
long term solution ;-).
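To make the arithmetic being discussed concrete, here is the per-VF share the patch computes in bnxt_re_set_resource_limits(), rewritten as a standalone sketch; the helper name and the example figures are illustrative, not from the thread:

```c
#include <stdio.h>

/* Mirrors the patch's formula:
 * vf_pct = 100 - BNXT_RE_PCT_RSVD_FOR_PF,
 * share  = total * vf_pct / (100 * num_vfs).
 */
#define BNXT_RE_PCT_RSVD_FOR_PF 50

static unsigned int per_vf_share(unsigned int total, unsigned int num_vfs)
{
	unsigned int vf_pct = 100 - BNXT_RE_PCT_RSVD_FOR_PF;

	return total * vf_pct / (100 * num_vfs);
}

int main(void)
{
	/* Example: 64K QPs with 8 active VFs -> 50% notionally stays with
	 * the PF, the rest splits evenly: 4096 QPs advertised per VF.
	 */
	printf("%u\n", per_vf_share(64 * 1024, 8));
	return 0;
}
```

Note that, as clarified above, this split only caps what each VF is guaranteed; the PF itself is not hard-limited to its 50% share.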
On Tue, Jan 9, 2018 at 8:36 PM, Doug Ledford <dledford@redhat.com> wrote:
> On Tue, 2018-01-09 at 19:37 +0530, Devesh Sharma wrote:
>> On Tue, Jan 9, 2018 at 3:42 AM, Doug Ledford <dledford@redhat.com> wrote:
>> > On Fri, 2018-01-05 at 06:40 -0500, Devesh Sharma wrote:
>> > > From: Selvin Xavier <selvin.xavier@broadcom.com>
>> > >
>> > > Currently, fifty percent of the total available resources
>> > > are reserved for PF and remaining are equally divided among
>> > > active VFs.
>> > >
>> > > +/*
>> > > + * Percentage of resources of each type reserved for PF.
>> > > + * Remaining resources are divided equally among VFs.
>> > > + * [0, 100]
>> > > + */
>> > > +#define BNXT_RE_PCT_RSVD_FOR_PF 50
>> >
>> > This is a separate comment from the patch review itself. But, are you
>> > sure this is a good idea? And especially are you sure that it should be
>> > a compile time constant and not a runtime parameter?
>> >
>>
>> Keeping a compile-time constant is indeed not a good idea, and I completely
>> understand that if we had a knob there it would have been much more
>> flexible. For this submission we wanted to avoid a module parameter or a
>> configfs interface, so as a workaround this hard-coded compile-time
>> constant is used. Eventually, a more flexible scheme will be supplied to
>> change this.
>
> Ok.
>
>> > I ask because it seems to me that usage of this stuff falls into one of
>> > two categories:
>> >
>> > 1) All bare metal usage
>> > 2) SRIOV usage (in which case the bare metal OS does relatively little,
>> >    the SRIOV using clients do most of the work)
>> >
>> > I guess I'm finding it hard to imagine a scenario where, when you do
>> > have SRIOV VFs, that you don't want the majority of all resources being
>> > used there.
>> >
>> > I might suggest that you simply don't split resources at all. Maybe do
>> > something like filesystems do. Let anyone at all take a resource until
>> > you hit 95% utilization, then only root can write to the filesystem. In
>> > this case it would be: let both PFs and VFs use resources at will up
>> > until you hit the 95% utilization threshold and then restrict resource
>> > use to the PF. That would make much more sense to me.
>>
>> This is indeed an excellent suggestion to optimize the resource
>> utilization between PFs and VFs; however, I have a couple of facts to put
>> forward:
>>
>> - If I have understood it correctly, this would require an independent
>>   entity which keeps track of what percentage of resources has been
>>   utilized at any given point in time by all the functions (the PF and its
>>   VFs). Currently, we do not have such an implementation in firmware, and
>>   the PF driver cannot track resource utilization across functions.
>
> Fair enough.
>
>> - In the current implementation, hard-coding 50% does not hard-limit the
>>   PF to 50%; it can still over-subscribe up to the max limit even when the
>>   maximum number of VFs is active.
>
> OK, but that means a PF can starve VFs rather easily I take it?

Yeah, it could; however, for now this 50% means 64K resources are there
for the PF, so I am somewhat less worried. For some deployments 64K may
not be sufficient, and VFs could starve in such deployments.

>> - With the equal distribution of the remaining resources among VFs, we are
>>   trying to provide minimum guaranteed resources to the maximum possible
>>   number of VFs on a given PF. We want to avoid the case where the number
>>   of usable VFs depends on the resources already consumed by active VFs.
>
> And this then is the opposite of the PF in that VFs aren't *really*
> guaranteed this minimum amount, since the PF can starve the VFs out, but
> it at least guarantees other VFs don't starve any specific VF out.

Yes, true; I should rather re-phrase the previous bullet. It prevents VF
starvation.

> That's fine if that's how you want things set up for now.  I think I
> would work on a firmware update to implement the resource tracker as the
> long term solution ;-).

I will take this feedback to the concerned people in our f/w team and see
whether it is possible to implement such an entity.

> --
> Doug Ledford <dledford@redhat.com>
>     GPG KeyID: B826A3330E572FDD
>     Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
diff --git a/drivers/infiniband/hw/bnxt_re/bnxt_re.h b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
index ecbac91..b604277 100644
--- a/drivers/infiniband/hw/bnxt_re/bnxt_re.h
+++ b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
@@ -57,6 +57,19 @@
 #define BNXT_RE_MAX_MRW_COUNT		(64 * 1024)
 #define BNXT_RE_MAX_SRQC_COUNT		(64 * 1024)
 #define BNXT_RE_MAX_CQ_COUNT		(64 * 1024)
+#define BNXT_RE_MAX_MRW_COUNT_64K	(64 * 1024)
+#define BNXT_RE_MAX_MRW_COUNT_256K	(256 * 1024)
+
+/* Number of MRs to reserve for PF, leaving remainder for VFs */
+#define BNXT_RE_RESVD_MR_FOR_PF		(32 * 1024)
+#define BNXT_RE_MAX_GID_PER_VF		128
+
+/*
+ * Percentage of resources of each type reserved for PF.
+ * Remaining resources are divided equally among VFs.
+ * [0, 100]
+ */
+#define BNXT_RE_PCT_RSVD_FOR_PF		50
 
 #define BNXT_RE_UD_QP_HW_STALL		0x400000
 
@@ -145,6 +158,8 @@ struct bnxt_re_dev {
 	struct bnxt_re_ah		*sqp_ah;
 	struct bnxt_re_sqp_entries sqp_tbl[1024];
 	atomic_t nq_alloc_cnt;
+	u32 is_virtfn;
+	u32 num_vfs;
 };
 
 #define to_bnxt_re_dev(ptr, member)	\
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 2032db7..6dd2fe1 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -2295,10 +2295,14 @@ int bnxt_re_post_recv(struct ib_qp *ib_qp, struct ib_recv_wr *wr,
 /* Completion Queues */
 int bnxt_re_destroy_cq(struct ib_cq *ib_cq)
 {
-	struct bnxt_re_cq *cq = container_of(ib_cq, struct bnxt_re_cq, ib_cq);
-	struct bnxt_re_dev *rdev = cq->rdev;
 	int rc;
-	struct bnxt_qplib_nq *nq = cq->qplib_cq.nq;
+	struct bnxt_re_cq *cq;
+	struct bnxt_qplib_nq *nq;
+	struct bnxt_re_dev *rdev;
+
+	cq = container_of(ib_cq, struct bnxt_re_cq, ib_cq);
+	rdev = cq->rdev;
+	nq = cq->qplib_cq.nq;
 
 	rc = bnxt_qplib_destroy_cq(&rdev->qplib_res, &cq->qplib_cq);
 	if (rc) {
@@ -2308,12 +2312,11 @@ int bnxt_re_destroy_cq(struct ib_cq *ib_cq)
 	if (!IS_ERR_OR_NULL(cq->umem))
 		ib_umem_release(cq->umem);
 
-	if (cq) {
-		kfree(cq->cql);
-		kfree(cq);
-	}
 	atomic_dec(&rdev->cq_count);
 	nq->budget--;
+	kfree(cq->cql);
+	kfree(cq);
+
 	return 0;
 }
 
diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index aafc19a..d825df0 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -80,6 +80,79 @@ static struct workqueue_struct *bnxt_re_wq;
 
 static void bnxt_re_ib_unreg(struct bnxt_re_dev *rdev, bool lock_wait);
 
+/* SR-IOV helper functions */
+
+static void bnxt_re_get_sriov_func_type(struct bnxt_re_dev *rdev)
+{
+	struct bnxt *bp;
+
+	bp = netdev_priv(rdev->en_dev->net);
+	if (BNXT_VF(bp))
+		rdev->is_virtfn = 1;
+}
+
+/* Set the maximum number of each resource that the driver actually wants
+ * to allocate. This may be up to the maximum number the firmware has
+ * reserved for the function. The driver may choose to allocate fewer
+ * resources than the firmware maximum.
+ */
+static void bnxt_re_set_resource_limits(struct bnxt_re_dev *rdev)
+{
+	u32 vf_qps = 0, vf_srqs = 0, vf_cqs = 0, vf_mrws = 0, vf_gids = 0;
+	u32 i;
+	u32 vf_pct;
+	u32 num_vfs;
+	struct bnxt_qplib_dev_attr *dev_attr = &rdev->dev_attr;
+
+	rdev->qplib_ctx.qpc_count = min_t(u32, BNXT_RE_MAX_QPC_COUNT,
+					  dev_attr->max_qp);
+
+	rdev->qplib_ctx.mrw_count = BNXT_RE_MAX_MRW_COUNT_256K;
+	/* Use max_mr from fw since max_mrw does not get set */
+	rdev->qplib_ctx.mrw_count = min_t(u32, rdev->qplib_ctx.mrw_count,
+					  dev_attr->max_mr);
+	rdev->qplib_ctx.srqc_count = min_t(u32, BNXT_RE_MAX_SRQC_COUNT,
+					   dev_attr->max_srq);
+	rdev->qplib_ctx.cq_count = min_t(u32, BNXT_RE_MAX_CQ_COUNT,
+					 dev_attr->max_cq);
+
+	for (i = 0; i < MAX_TQM_ALLOC_REQ; i++)
+		rdev->qplib_ctx.tqm_count[i] =
+			rdev->dev_attr.tqm_alloc_reqs[i];
+
+	if (rdev->num_vfs) {
+		/*
+		 * Reserve a set of resources for the PF. Divide the remaining
+		 * resources among the VFs
+		 */
+		vf_pct = 100 - BNXT_RE_PCT_RSVD_FOR_PF;
+		num_vfs = 100 * rdev->num_vfs;
+		vf_qps = (rdev->qplib_ctx.qpc_count * vf_pct) / num_vfs;
+		vf_srqs = (rdev->qplib_ctx.srqc_count * vf_pct) / num_vfs;
+		vf_cqs = (rdev->qplib_ctx.cq_count * vf_pct) / num_vfs;
+		/*
+		 * The driver allows many more MRs than other resources. If the
+		 * firmware does also, then reserve a fixed amount for the PF
+		 * and divide the rest among VFs. VFs may use many MRs for NFS
+		 * mounts, ISER, NVME applications, etc. If the firmware
+		 * severely restricts the number of MRs, then let PF have
+		 * half and divide the rest among VFs, as for the other
+		 * resource types.
+		 */
+		if (rdev->qplib_ctx.mrw_count < BNXT_RE_MAX_MRW_COUNT_64K)
+			vf_mrws = rdev->qplib_ctx.mrw_count * vf_pct / num_vfs;
+		else
+			vf_mrws = (rdev->qplib_ctx.mrw_count -
+				   BNXT_RE_RESVD_MR_FOR_PF) / rdev->num_vfs;
+		vf_gids = BNXT_RE_MAX_GID_PER_VF;
+	}
+	rdev->qplib_ctx.vf_res.max_mrw_per_vf = vf_mrws;
+	rdev->qplib_ctx.vf_res.max_gid_per_vf = vf_gids;
+	rdev->qplib_ctx.vf_res.max_qp_per_vf = vf_qps;
+	rdev->qplib_ctx.vf_res.max_srq_per_vf = vf_srqs;
+	rdev->qplib_ctx.vf_res.max_cq_per_vf = vf_cqs;
+}
+
 /* for handling bnxt_en callbacks later */
 static void bnxt_re_stop(void *p)
 {
@@ -91,6 +164,15 @@ static void bnxt_re_start(void *p)
 
 static void bnxt_re_sriov_config(void *p, int num_vfs)
 {
+	struct bnxt_re_dev *rdev = p;
+
+	if (!rdev)
+		return;
+
+	rdev->num_vfs = num_vfs;
+	bnxt_re_set_resource_limits(rdev);
+	bnxt_qplib_set_func_resources(&rdev->qplib_res, &rdev->rcfw,
+				      &rdev->qplib_ctx);
 }
 
 static void bnxt_re_shutdown(void *p)
@@ -734,7 +816,8 @@ static int bnxt_re_alloc_res(struct bnxt_re_dev *rdev)
 
 	/* Configure and allocate resources for qplib */
 	rdev->qplib_res.rcfw = &rdev->rcfw;
-	rc = bnxt_qplib_get_dev_attr(&rdev->rcfw, &rdev->dev_attr);
+	rc = bnxt_qplib_get_dev_attr(&rdev->rcfw, &rdev->dev_attr,
+				     rdev->is_virtfn);
 	if (rc)
 		goto fail;
 
@@ -1035,19 +1118,6 @@ static void bnxt_re_ib_unreg(struct bnxt_re_dev *rdev, bool lock_wait)
 	}
 }
 
-static void bnxt_re_set_resource_limits(struct bnxt_re_dev *rdev)
-{
-	u32 i;
-
-	rdev->qplib_ctx.qpc_count = BNXT_RE_MAX_QPC_COUNT;
-	rdev->qplib_ctx.mrw_count = BNXT_RE_MAX_MRW_COUNT;
-	rdev->qplib_ctx.srqc_count = BNXT_RE_MAX_SRQC_COUNT;
-	rdev->qplib_ctx.cq_count = BNXT_RE_MAX_CQ_COUNT;
-	for (i = 0; i < MAX_TQM_ALLOC_REQ; i++)
-		rdev->qplib_ctx.tqm_count[i] =
-			rdev->dev_attr.tqm_alloc_reqs[i];
-}
-
 /* worker thread for polling periodic events. Now used for QoS programming*/
 static void bnxt_re_worker(struct work_struct *work)
 {
@@ -1070,6 +1140,9 @@ static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
 	}
 	set_bit(BNXT_RE_FLAG_NETDEV_REGISTERED, &rdev->flags);
 
+	/* Check whether VF or PF */
+	bnxt_re_get_sriov_func_type(rdev);
+
 	rc = bnxt_re_request_msix(rdev);
 	if (rc) {
 		pr_err("Failed to get MSI-X vectors: %#x\n", rc);
@@ -1101,16 +1174,18 @@
 	    (rdev->en_dev->pdev, &rdev->rcfw,
 	     rdev->msix_entries[BNXT_RE_AEQ_IDX].vector,
 	     rdev->msix_entries[BNXT_RE_AEQ_IDX].db_offset,
-	     0, &bnxt_re_aeq_handler);
+	     rdev->is_virtfn, &bnxt_re_aeq_handler);
 	if (rc) {
 		pr_err("Failed to enable RCFW channel: %#x\n", rc);
 		goto free_ring;
 	}
 
-	rc = bnxt_qplib_get_dev_attr(&rdev->rcfw, &rdev->dev_attr);
+	rc = bnxt_qplib_get_dev_attr(&rdev->rcfw, &rdev->dev_attr,
+				     rdev->is_virtfn);
 	if (rc)
 		goto disable_rcfw;
-	bnxt_re_set_resource_limits(rdev);
+	if (!rdev->is_virtfn)
+		bnxt_re_set_resource_limits(rdev);
 
 	rc = bnxt_qplib_alloc_ctx(rdev->en_dev->pdev, &rdev->qplib_ctx, 0);
 	if (rc) {
@@ -1125,7 +1200,8 @@ static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
 		goto free_ctx;
 	}
 
-	rc = bnxt_qplib_init_rcfw(&rdev->rcfw, &rdev->qplib_ctx, 0);
+	rc = bnxt_qplib_init_rcfw(&rdev->rcfw, &rdev->qplib_ctx,
+				  rdev->is_virtfn);
 	if (rc) {
 		pr_err("Failed to initialize RCFW: %#x\n", rc);
 		goto free_sctx;
@@ -1144,13 +1220,15 @@ static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
 		goto fail;
 	}
 
-	rc = bnxt_re_setup_qos(rdev);
-	if (rc)
-		pr_info("RoCE priority not yet configured\n");
+	if (!rdev->is_virtfn) {
+		rc = bnxt_re_setup_qos(rdev);
+		if (rc)
+			pr_info("RoCE priority not yet configured\n");
 
-	INIT_DELAYED_WORK(&rdev->worker, bnxt_re_worker);
-	set_bit(BNXT_RE_FLAG_QOS_WORK_REG, &rdev->flags);
-	schedule_delayed_work(&rdev->worker, msecs_to_jiffies(30000));
+		INIT_DELAYED_WORK(&rdev->worker, bnxt_re_worker);
+		set_bit(BNXT_RE_FLAG_QOS_WORK_REG, &rdev->flags);
+		schedule_delayed_work(&rdev->worker, msecs_to_jiffies(30000));
+	}
 
 	/* Register ib dev */
 	rc = bnxt_re_register_ib(rdev);
@@ -1400,7 +1478,7 @@ static int __init bnxt_re_mod_init(void)
 
 static void __exit bnxt_re_mod_exit(void)
 {
-	struct bnxt_re_dev *rdev;
+	struct bnxt_re_dev *rdev, *next;
 	LIST_HEAD(to_be_deleted);
 
 	mutex_lock(&bnxt_re_dev_lock);
@@ -1408,8 +1486,11 @@ static void __exit bnxt_re_mod_exit(void)
 	if (!list_empty(&bnxt_re_dev_list))
 		list_splice_init(&bnxt_re_dev_list, &to_be_deleted);
 	mutex_unlock(&bnxt_re_dev_lock);
-
-	list_for_each_entry(rdev, &to_be_deleted, list) {
+	/*
+	 * Cleanup the devices in reverse order so that the VF device
+	 * cleanup is done before PF cleanup
+	 */
+	list_for_each_entry_safe_reverse(rdev, next, &to_be_deleted, list) {
 		dev_info(rdev_to_dev(rdev), "Unregistering Device");
 		bnxt_re_dev_stop(rdev);
 		bnxt_re_ib_unreg(rdev, true);
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.c b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
index 9543ce5..f5450b7 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_sp.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
@@ -65,7 +65,7 @@ static bool bnxt_qplib_is_atomic_cap(struct bnxt_qplib_rcfw *rcfw)
 }
 
 int bnxt_qplib_get_dev_attr(struct bnxt_qplib_rcfw *rcfw,
-			    struct bnxt_qplib_dev_attr *attr)
+			    struct bnxt_qplib_dev_attr *attr, bool vf)
 {
 	struct cmdq_query_func req;
 	struct creq_query_func_resp resp;
@@ -95,7 +95,8 @@ int bnxt_qplib_get_dev_attr(struct bnxt_qplib_rcfw *rcfw,
 	/* Extract the context from the side buffer */
 	attr->max_qp = le32_to_cpu(sb->max_qp);
 	/* max_qp value reported by FW for PF doesn't include the QP1 for PF */
-	attr->max_qp += 1;
+	if (!vf)
+		attr->max_qp += 1;
 	attr->max_qp_rd_atom =
 		sb->max_qp_rd_atom > BNXT_QPLIB_MAX_OUT_RD_ATOM ?
 		BNXT_QPLIB_MAX_OUT_RD_ATOM : sb->max_qp_rd_atom;
@@ -150,6 +151,38 @@ int bnxt_qplib_get_dev_attr(struct bnxt_qplib_rcfw *rcfw,
 	return rc;
 }
 
+int bnxt_qplib_set_func_resources(struct bnxt_qplib_res *res,
+				  struct bnxt_qplib_rcfw *rcfw,
+				  struct bnxt_qplib_ctx *ctx)
+{
+	struct cmdq_set_func_resources req;
+	struct creq_set_func_resources_resp resp;
+	u16 cmd_flags = 0;
+	int rc = 0;
+
+	RCFW_CMD_PREP(req, SET_FUNC_RESOURCES, cmd_flags);
+
+	req.number_of_qp = cpu_to_le32(ctx->qpc_count);
+	req.number_of_mrw = cpu_to_le32(ctx->mrw_count);
+	req.number_of_srq = cpu_to_le32(ctx->srqc_count);
+	req.number_of_cq = cpu_to_le32(ctx->cq_count);
+
+	req.max_qp_per_vf = cpu_to_le32(ctx->vf_res.max_qp_per_vf);
+	req.max_mrw_per_vf = cpu_to_le32(ctx->vf_res.max_mrw_per_vf);
+	req.max_srq_per_vf = cpu_to_le32(ctx->vf_res.max_srq_per_vf);
+	req.max_cq_per_vf = cpu_to_le32(ctx->vf_res.max_cq_per_vf);
+	req.max_gid_per_vf = cpu_to_le32(ctx->vf_res.max_gid_per_vf);
+
+	rc = bnxt_qplib_rcfw_send_message(rcfw, (void *)&req,
+					  (void *)&resp,
+					  NULL, 0);
+	if (rc) {
+		dev_err(&res->pdev->dev,
+			"QPLIB: Failed to set function resources");
+	}
+	return rc;
+}
+
 /* SGID */
 int bnxt_qplib_get_sgid(struct bnxt_qplib_res *res,
			struct bnxt_qplib_sgid_tbl *sgid_tbl, int index,
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.h b/drivers/infiniband/hw/bnxt_re/qplib_sp.h
index 1132258..51bb784 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_sp.h
+++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.h
@@ -147,7 +147,10 @@ int bnxt_qplib_add_pkey(struct bnxt_qplib_res *res,
			struct bnxt_qplib_pkey_tbl *pkey_tbl, u16 *pkey,
			bool update);
 int bnxt_qplib_get_dev_attr(struct bnxt_qplib_rcfw *rcfw,
-			    struct bnxt_qplib_dev_attr *attr);
+			    struct bnxt_qplib_dev_attr *attr, bool vf);
+int bnxt_qplib_set_func_resources(struct bnxt_qplib_res *res,
+				  struct bnxt_qplib_rcfw *rcfw,
+				  struct bnxt_qplib_ctx *ctx);
 int bnxt_qplib_create_ah(struct bnxt_qplib_res *res, struct bnxt_qplib_ah *ah);
 int bnxt_qplib_destroy_ah(struct bnxt_qplib_res *res, struct bnxt_qplib_ah *ah);
 int bnxt_qplib_alloc_mrw(struct bnxt_qplib_res *res,
diff --git a/drivers/infiniband/hw/bnxt_re/roce_hsi.h b/drivers/infiniband/hw/bnxt_re/roce_hsi.h
index c3cba606..f429fdb 100644
--- a/drivers/infiniband/hw/bnxt_re/roce_hsi.h
+++ b/drivers/infiniband/hw/bnxt_re/roce_hsi.h
@@ -1799,6 +1799,16 @@ struct cmdq_set_func_resources {
	u8 resp_size;
	u8 reserved8;
	__le64 resp_addr;
+	__le32 number_of_qp;
+	__le32 number_of_mrw;
+	__le32 number_of_srq;
+	__le32 number_of_cq;
+	__le32 max_qp_per_vf;
+	__le32 max_mrw_per_vf;
+	__le32 max_srq_per_vf;
+	__le32 max_cq_per_vf;
+	__le32 max_gid_per_vf;
+	__le32 stat_ctx_id;
 };
 
 /* Read hardware resource context command (24 bytes) */
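For the two-branch MR split in bnxt_re_set_resource_limits() above, here is a quick worked example; the helper name and the input figures are illustrative only, the constants mirror BNXT_RE_MAX_MRW_COUNT_64K, BNXT_RE_RESVD_MR_FOR_PF and BNXT_RE_PCT_RSVD_FOR_PF from the patch:

```c
/* Mirrors the vf_mrws logic from the patch, using plain C types. */
static unsigned int vf_mrws_example(unsigned int mrw_count, unsigned int num_vfs)
{
	if (mrw_count < 64 * 1024)		/* BNXT_RE_MAX_MRW_COUNT_64K */
		return mrw_count * 50 / (100 * num_vfs);
	return (mrw_count - 32 * 1024) / num_vfs;	/* BNXT_RE_RESVD_MR_FOR_PF */
}

/* vf_mrws_example(256 * 1024, 4) == 57344: 32K MRs are held back for the
 * PF and the remaining 224K are split across 4 VFs.
 * vf_mrws_example(32 * 1024, 4) == 4096: the firmware is MR-constrained,
 * so the PF keeps 50% and the rest is divided, as for QPs/CQs/SRQs.
 */
```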