
SQ overflow seen running isert traffic with high block sizes

Message ID 1516269522.24576.274.camel@haakon3.daterainc.com (mailing list archive)
State Not Applicable

Commit Message

Nicholas A. Bellinger Jan. 18, 2018, 9:58 a.m. UTC
Hi Shiraz, Michal & Co,

Thanks for the feedback.  Comments below.

On Mon, 2018-01-15 at 09:22 -0600, Shiraz Saleem wrote:
> On Mon, Jan 15, 2018 at 03:12:36AM -0700, Kalderon, Michal wrote:
> > > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> > > owner@vger.kernel.org] On Behalf Of Nicholas A. Bellinger
> > > Sent: Monday, January 15, 2018 6:57 AM
> > > To: Shiraz Saleem <shiraz.saleem@intel.com>
> > > Cc: Amrani, Ram <Ram.Amrani@cavium.com>; Sagi Grimberg
> > > <sagi@grimberg.me>; linux-rdma@vger.kernel.org; Elior, Ariel
> > > <Ariel.Elior@cavium.com>; target-devel <target-devel@vger.kernel.org>;
> > > Potnuri Bharat Teja <bharat@chelsio.com>
> > > Subject: Re: SQ overflow seen running isert traffic with high block sizes
> > > 
> > > Hi Shiraz, Ram, Ariel, & Potnuri,
> > > 
> > > Following up on this old thread, as it relates to Potnuri's recent fix for a iser-
> > > target queue-full memory leak:
> > > 
> > > https://www.spinics.net/lists/target-devel/msg16282.html
> > > 
> > > Just curious how frequent this happens in practice with sustained large block
> > > workloads, as it appears to effect at least three different iwarp RNICS (i40iw,
> > > qedr and iw_cxgb4)..?
> > > 
> > > Is there anything else from an iser-target consumer level that should be
> > > changed for iwarp to avoid repeated ib_post_send() failures..?
> > > 
> > Would like to mention, that although we are an iWARP RNIC as well, we've hit this
> > Issue when running RoCE. It's not iWARP related. 
> > This is easily reproduced within seconds with IO size of 5121K
> > Using 5 Targets with 2 Ram Disk each and 5 targets with FileIO Disks each.
> > 
> > IO Command used:
> > maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1
> > 
> > thanks,
> > Michal
> 
> Its seen with block size >= 2M on a single target 1 RAM disk config. And similar to Michals report;
> rather quickly, in a matter of seconds.
> 
> fio --rw=read --bs=2048k --numjobs=1 --iodepth=128 --runtime=30 --size=20g --loops=1 --ioengine=libaio 
> --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --exitall --filename=/dev/sdb --name=sdb 
> 

A couple of thoughts.

First, would it be helpful to limit maximum payload size per I/O for
consumers based on number of iser-target sq hw sges..?

That is, if the rdma_rw_ctx_post() -> ib_post_send() failures are related to
the maximum payload size per I/O being too large, there is an existing
target_core_fabric_ops mechanism for enforcing such a limit using SCSI
residuals, originally utilized by qla2xxx here:

target/qla2xxx: Honor max_data_sg_nents I/O transfer limit
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8f9b565482c537821588444e09ff732c7d65ed6e

Note this patch will also return a smaller Block Limits VPD (0x86)
MAXIMUM TRANSFER LENGTH based on max_data_sg_nents * PAGE_SIZE, which
means modern SCSI initiators honoring MAXIMUM TRANSFER LENGTH will
automatically limit the maximum outgoing payload transfer length and
avoid the SCSI residual logic.

As-is, iser-target doesn't propagate a max_data_sg_nents limit into
iscsi-target, but you can try testing with a smaller value to see if
it's useful.  E.g.:

diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configf
index 0ebc481..d8a4cc5 100644
--- a/drivers/target/iscsi/iscsi_target_configfs.c
+++ b/drivers/target/iscsi/iscsi_target_configfs.c
@@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd)
        .module                         = THIS_MODULE,
        .name                           = "iscsi",
        .node_acl_size                  = sizeof(struct iscsi_node_acl),
+       .max_data_sg_nents              = 32, /* 32 * PAGE_SIZE = MAXIMUM TRANSFER LENGTH */
        .get_fabric_name                = iscsi_get_fabric_name,
        .tpg_get_wwn                    = lio_tpg_get_endpoint_wwn,
        .tpg_get_tag                    = lio_tpg_get_tag,

Second, if the failures are not SCSI transfer length specific, another
option would be to limit the total command sequence number depth (CmdSN)
per session.

This is controlled at runtime by default_cmdsn_depth TPG attribute:

/sys/kernel/config/target/iscsi/$TARGET_IQN/$TPG/attrib/default_cmdsn_depth

and on per initiator context with cmdsn_depth NodeACL attribute:

/sys/kernel/config/target/iscsi/$TARGET_IQN/$TPG/acls/$ACL_IQN/cmdsn_depth

Note these default to 64, and can be changed at build time via
include/target/iscsi/iscsi_target_core.h:TA_DEFAULT_CMDSN_DEPTH.

That said, Sagi, any further comments as to what else iser-target should
be doing to avoid repeated queue-fulls with limited hw sges..?


Comments

Potnuri Bharat Teja Jan. 18, 2018, 5:53 p.m. UTC | #1
Hi Nicholas, 
thanks for the suggestions. Comments below.

On Thursday, January 01/18/18, 2018 at 15:28:42 +0530, Nicholas A. Bellinger wrote:
> Hi Shiraz, Michal & Co,
> 
> Thanks for the feedback.  Comments below.
> 
> On Mon, 2018-01-15 at 09:22 -0600, Shiraz Saleem wrote:
> > On Mon, Jan 15, 2018 at 03:12:36AM -0700, Kalderon, Michal wrote:
> > > > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> > > > owner@vger.kernel.org] On Behalf Of Nicholas A. Bellinger
> > > > Sent: Monday, January 15, 2018 6:57 AM
> > > > To: Shiraz Saleem <shiraz.saleem@intel.com>
> > > > Cc: Amrani, Ram <Ram.Amrani@cavium.com>; Sagi Grimberg
> > > > <sagi@grimberg.me>; linux-rdma@vger.kernel.org; Elior, Ariel
> > > > <Ariel.Elior@cavium.com>; target-devel <target-devel@vger.kernel.org>;
> > > > Potnuri Bharat Teja <bharat@chelsio.com>
> > > > Subject: Re: SQ overflow seen running isert traffic with high block sizes
> > > > 
> > > > Hi Shiraz, Ram, Ariel, & Potnuri,
> > > > 
> > > > Following up on this old thread, as it relates to Potnuri's recent fix for a iser-
> > > > target queue-full memory leak:
> > > > 
> > > > https://www.spinics.net/lists/target-devel/msg16282.html
> > > > 
> > > > Just curious how frequent this happens in practice with sustained large block
> > > > workloads, as it appears to effect at least three different iwarp RNICS (i40iw,
> > > > qedr and iw_cxgb4)..?
> > > > 
> > > > Is there anything else from an iser-target consumer level that should be
> > > > changed for iwarp to avoid repeated ib_post_send() failures..?
> > > > 
> > > Would like to mention, that although we are an iWARP RNIC as well, we've hit this
> > > Issue when running RoCE. It's not iWARP related. 
> > > This is easily reproduced within seconds with IO size of 5121K
> > > Using 5 Targets with 2 Ram Disk each and 5 targets with FileIO Disks each.
> > > 
> > > IO Command used:
> > > maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1
> > > 
> > > thanks,
> > > Michal
> > 
> > Its seen with block size >= 2M on a single target 1 RAM disk config. And similar to Michals report;
> > rather quickly, in a matter of seconds.
> > 
> > fio --rw=read --bs=2048k --numjobs=1 --iodepth=128 --runtime=30 --size=20g --loops=1 --ioengine=libaio 
> > --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --exitall --filename=/dev/sdb --name=sdb 
> > 
> 
> A couple of thoughts.
> 
> First, would it be helpful to limit maximum payload size per I/O for
> consumers based on number of iser-target sq hw sges..?
Yes, I think the HW num sge needs to be propagated to the iSCSI target.
> 
> That is, if rdma_rw_ctx_post() -> ib_post_send() failures are related to
> maximum payload size per I/O being too large there is an existing

Yes, they are IO-size specific. I observed SQ overflow with fio for IO sizes above
256k, and only for READ tests, with Chelsio (iw_cxgb4) adapters.
> target_core_fabric_ops mechanism for limiting using SCSI residuals,
> originally utilized by qla2xxx here:
> 
> target/qla2xxx: Honor max_data_sg_nents I/O transfer limit
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8f9b565482c537821588444e09ff732c7d65ed6e
> 
> Note this patch also will return a smaller Block Limits VPD (0x86)
> MAXIMUM TRANSFER LENGTH based on max_data_sg_nents * PAGE_SIZE, which
> means for modern SCSI initiators honoring MAXIMUM TRANSFER LENGTH will
> automatically limit maximum outgoing payload transfer length, and avoid
> SCSI residual logic.
> 
> As-is, iser-target doesn't a propagate max_data_sg_ents limit into
> iscsi-target, but you can try testing with a smaller value to see if
> it's useful.  Eg:
> 
> diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configf
> index 0ebc481..d8a4cc5 100644
> --- a/drivers/target/iscsi/iscsi_target_configfs.c
> +++ b/drivers/target/iscsi/iscsi_target_configfs.c
> @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd)
>         .module                         = THIS_MODULE,
>         .name                           = "iscsi",
>         .node_acl_size                  = sizeof(struct iscsi_node_acl),
> +       .max_data_sg_nents              = 32, /* 32 * PAGE_SIZE = MAXIMUM TRANSFER LENGTH */
>         .get_fabric_name                = iscsi_get_fabric_name,
>         .tpg_get_wwn                    = lio_tpg_get_endpoint_wwn,
>         .tpg_get_tag                    = lio_tpg_get_tag,
> 
With the above change, SQ overflow isn't observed. I started off with max_data_sg_nents = 16.
> Second, if the failures are not SCSI transfer length specific, another
> option would be to limit the total command sequence number depth (CmdSN)
> per session.
> 
> This is controlled at runtime by default_cmdsn_depth TPG attribute:
> 
> /sys/kernel/config/target/iscsi/$TARGET_IQN/$TPG/attrib/default_cmdsn_depth
> 
> and on per initiator context with cmdsn_depth NodeACL attribute:
> 
> /sys/kernel/config/target/iscsi/$TARGET_IQN/$TPG/acls/$ACL_IQN/cmdsn_depth
> 
> Note these default to 64, and can be changed at build time via
> include/target/iscsi/iscsi_target_core.h:TA_DEFAULT_CMDSN_DEPTH.
> 
> That said, Sagi, any further comments as what else iser-target should be
> doing to avoid repeated queue-fulls with limited hw sges..?
> 
Kalderon, Michal Jan. 19, 2018, 7:33 p.m. UTC | #2

Saleem, Shiraz Jan. 22, 2018, 5:49 p.m. UTC | #3
> Subject: Re: SQ overflow seen running isert traffic with high block sizes
> 
> First, would it be helpful to limit maximum payload size per I/O for consumers
> based on number of iser-target sq hw sges..?
> 
Assuming data is not able to be fast registered as if virtually contiguous;
artificially limiting the data size might not be the best solution.

But max SGEs does need to be exposed higher. Somewhere in the stack,
there might need to be multiple WRs submitted or data copied.

> diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configf
> index 0ebc481..d8a4cc5 100644
> --- a/drivers/target/iscsi/iscsi_target_configfs.c
> +++ b/drivers/target/iscsi/iscsi_target_configfs.c
> @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd)
>         .module                         = THIS_MODULE,
>         .name                           = "iscsi",
>         .node_acl_size                  = sizeof(struct iscsi_node_acl),
> +       .max_data_sg_nents              = 32, /* 32 * PAGE_SIZE = MAXIMUM TRANSFER LENGTH */
>         .get_fabric_name                = iscsi_get_fabric_name,
>         .tpg_get_wwn                    = lio_tpg_get_endpoint_wwn,
>         .tpg_get_tag                    = lio_tpg_get_tag,
> 

BTW, this is helping the SQ overflow issue.
Nicholas A. Bellinger Jan. 24, 2018, 7:25 a.m. UTC | #4
Hi Potnuri & Co,

On Thu, 2018-01-18 at 23:23 +0530, Potnuri Bharat Teja wrote:
> Hi Nicholas, 
> thanks for the suggestions. Comments below.
> 
> On Thursday, January 01/18/18, 2018 at 15:28:42 +0530, Nicholas A. Bellinger wrote:
> > Hi Shiraz, Michal & Co,
> > 
> > Thanks for the feedback.  Comments below.
> > 
> > On Mon, 2018-01-15 at 09:22 -0600, Shiraz Saleem wrote:
> > > On Mon, Jan 15, 2018 at 03:12:36AM -0700, Kalderon, Michal wrote:
> > > > > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> > > > > owner@vger.kernel.org] On Behalf Of Nicholas A. Bellinger
> > > > > Sent: Monday, January 15, 2018 6:57 AM
> > > > > To: Shiraz Saleem <shiraz.saleem@intel.com>
> > > > > Cc: Amrani, Ram <Ram.Amrani@cavium.com>; Sagi Grimberg
> > > > > <sagi@grimberg.me>; linux-rdma@vger.kernel.org; Elior, Ariel
> > > > > <Ariel.Elior@cavium.com>; target-devel <target-devel@vger.kernel.org>;
> > > > > Potnuri Bharat Teja <bharat@chelsio.com>
> > > > > Subject: Re: SQ overflow seen running isert traffic with high block sizes
> > > > > 
> > > > > Hi Shiraz, Ram, Ariel, & Potnuri,
> > > > > 
> > > > > Following up on this old thread, as it relates to Potnuri's recent fix for a iser-
> > > > > target queue-full memory leak:
> > > > > 
> > > > > https://www.spinics.net/lists/target-devel/msg16282.html
> > > > > 
> > > > > Just curious how frequent this happens in practice with sustained large block
> > > > > workloads, as it appears to effect at least three different iwarp RNICS (i40iw,
> > > > > qedr and iw_cxgb4)..?
> > > > > 
> > > > > Is there anything else from an iser-target consumer level that should be
> > > > > changed for iwarp to avoid repeated ib_post_send() failures..?
> > > > > 
> > > > Would like to mention, that although we are an iWARP RNIC as well, we've hit this
> > > > Issue when running RoCE. It's not iWARP related. 
> > > > This is easily reproduced within seconds with IO size of 5121K
> > > > Using 5 Targets with 2 Ram Disk each and 5 targets with FileIO Disks each.
> > > > 
> > > > IO Command used:
> > > > maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1
> > > > 
> > > > thanks,
> > > > Michal
> > > 
> > > Its seen with block size >= 2M on a single target 1 RAM disk config. And similar to Michals report;
> > > rather quickly, in a matter of seconds.
> > > 
> > > fio --rw=read --bs=2048k --numjobs=1 --iodepth=128 --runtime=30 --size=20g --loops=1 --ioengine=libaio 
> > > --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --exitall --filename=/dev/sdb --name=sdb 
> > > 
> > 
> > A couple of thoughts.
> > 
> > First, would it be helpful to limit maximum payload size per I/O for
> > consumers based on number of iser-target sq hw sges..?
> yes, I think HW num sge needs to be propagated to iscsi target.
> > 
> > That is, if rdma_rw_ctx_post() -> ib_post_send() failures are related to
> > maximum payload size per I/O being too large there is an existing
> 
> Yes they are IO size specific, I observed SQ overflow with fio for IO sizes above
> 256k and for READ tests only with chelsio(iw_cxgb4) adapters.

Thanks for confirming.

> > target_core_fabric_ops mechanism for limiting using SCSI residuals,
> > originally utilized by qla2xxx here:
> > 
> > target/qla2xxx: Honor max_data_sg_nents I/O transfer limit
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8f9b565482c537821588444e09ff732c7d65ed6e
> > 
> > Note this patch also will return a smaller Block Limits VPD (0x86)
> > MAXIMUM TRANSFER LENGTH based on max_data_sg_nents * PAGE_SIZE, which
> > means for modern SCSI initiators honoring MAXIMUM TRANSFER LENGTH will
> > automatically limit maximum outgoing payload transfer length, and avoid
> > SCSI residual logic.
> > 
> > As-is, iser-target doesn't a propagate max_data_sg_ents limit into
> > iscsi-target, but you can try testing with a smaller value to see if
> > it's useful.  Eg:
> > 
> > diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configf
> > index 0ebc481..d8a4cc5 100644
> > --- a/drivers/target/iscsi/iscsi_target_configfs.c
> > +++ b/drivers/target/iscsi/iscsi_target_configfs.c
> > @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd)
> >         .module                         = THIS_MODULE,
> >         .name                           = "iscsi",
> >         .node_acl_size                  = sizeof(struct iscsi_node_acl),
> > +       .max_data_sg_nents              = 32, /* 32 * PAGE_SIZE = MAXIMUM TRANSFER LENGTH */
> >         .get_fabric_name                = iscsi_get_fabric_name,
> >         .tpg_get_wwn                    = lio_tpg_get_endpoint_wwn,
> >         .tpg_get_tag                    = lio_tpg_get_tag,
> > 
> With above change, SQ overflow isn't observed. I started of with max_data_sg_nents = 16.

OK, so max_data_sg_nents=32 (MAXIMUM TRANSFER LENGTH = 128K with 4k pages)
avoids SQ overflow with iw_cxgb4.

What is iw_cxgb4 reporting to isert_create_cq():attr.cap.max_send_sge..?

Nicholas A. Bellinger Jan. 24, 2018, 7:55 a.m. UTC | #5
Hi Michal & Co,

On Fri, 2018-01-19 at 19:33 +0000, Kalderon, Michal wrote:
> ________________________________________
> From: Nicholas A. Bellinger <nab@linux-iscsi.org>
> Sent: Thursday, January 18, 2018 11:58 AM
> 
> > Hi Shiraz, Michal & Co,
> Hi Nicholas, 
> 
> > Thanks for the feedback.  Comments below.
> 
> > A couple of thoughts.
> 
> > First, would it be helpful to limit maximum payload size per I/O for
> > consumers based on number of iser-target sq hw sges..?
> 
> I don't think you need to limit the maximum payload, but instead 
> initialize the max_wr to be based on the number of supported SGEs
> Instead of what is there today:
> #define ISERT_QP_MAX_REQ_DTOS   (ISCSI_DEF_XMIT_CMDS_MAX +    \
>                                 ISERT_MAX_TX_MISC_PDUS  + \
>                                 ISERT_MAX_RX_MISC_PDUS)
> Add the maximum number of WQEs per command, 
> The calculation of number of WQEs per command needs to be something like
> "MAX_TRANSFER_SIZE/(numSges*PAGE_SIZE)".
> 

Makes sense, MAX_TRANSFER_SIZE would be defined globally by iser-target,
right..?
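
To make that sizing concrete, here is a rough sketch (hypothetical helper
names and an assumed global transfer cap, not the current isert code) of
deriving the per-command WR fan-out from the device SGE limit and folding
it into the SQ depth:

#include <linux/kernel.h>
#include <rdma/ib_verbs.h>
#include <scsi/iscsi_proto.h>		/* ISCSI_DEF_XMIT_CMDS_MAX */
#include "ib_isert.h"			/* ISERT_MAX_TX/RX_MISC_PDUS */

/* assumed global cap discussed above, not an existing isert constant */
#define ISERT_MAX_TRANSFER_SIZE		(1U << 20)

static u32 isert_wrs_per_cmd(struct ib_device *ib_dev)
{
	u32 bytes_per_wr = ib_dev->attrs.max_sge * PAGE_SIZE;

	/* e.g. 4 SGEs x 4K pages -> 64 RDMA WRs for a 1M transfer */
	return DIV_ROUND_UP(ISERT_MAX_TRANSFER_SIZE, bytes_per_wr);
}

static u32 isert_qp_max_req_dtos(struct ib_device *ib_dev)
{
	/* one SEND per command plus its worst-case RDMA WR fan-out */
	return ISCSI_DEF_XMIT_CMDS_MAX * (1 + isert_wrs_per_cmd(ib_dev)) +
	       ISERT_MAX_TX_MISC_PDUS + ISERT_MAX_RX_MISC_PDUS;
}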

Btw, I'm not sure how this affects usage of ISER_MAX_TX_CQ_LEN +
ISER_MAX_CQ_LEN, which currently depend on ISERT_QP_MAX_REQ_DTOS..

Sagi, what are your thoughts wrt changing attr.cap.max_send_wr at
runtime vs. exposing a smaller max_data_sg_nents=32 for ib_devices with
limited attr.cap.max_send_sge..?

> For some devices like ours, breaking the IO into multiple WRs according to supported
> number of SGEs doesn't necessarily means performance penalty.
> 

AFAICT adding max_data_sg_nents for iser-target is a safe enough
work-around to include for stable, assuming we agree on what the
max_send_sg cut-off is for setting max_data_sg_nents=32 usage from a
larger default.  I don't have a strong preference either way, as long as
it can be picked up for 4.x stable.

Sagi, WDYT..?

Nicholas A. Bellinger Jan. 24, 2018, 8:01 a.m. UTC | #6
Hi Shiraz & Co,

Thanks for the feedback.

On Mon, 2018-01-22 at 17:49 +0000, Saleem, Shiraz wrote:
> > Subject: Re: SQ overflow seen running isert traffic with high block sizes
> > 
> > 
> > First, would it be helpful to limit maximum payload size per I/O for consumers
> > based on number of iser-target sq hw sges..?
> > 
> Assuming data is not able to be fast registered as if virtually contiguous;
> artificially limiting the data size might not be the best solution.
> 
> But max SGEs does need to be exposed higher. Somewhere in the stack,
> there might need to be multiple WRs submitted or data copied.
> 

Sagi..?

> > diff --git a/drivers/target/iscsi/iscsi_target_configfs.c
> > b/drivers/target/iscsi/iscsi_target_configf
> > index 0ebc481..d8a4cc5 100644
> > --- a/drivers/target/iscsi/iscsi_target_configfs.c
> > +++ b/drivers/target/iscsi/iscsi_target_configfs.c
> > @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd)
> >         .module                         = THIS_MODULE,
> >         .name                           = "iscsi",
> >         .node_acl_size                  = sizeof(struct iscsi_node_acl),
> > +       .max_data_sg_nents              = 32, /* 32 * PAGE_SIZE = MAXIMUM
> > TRANSFER LENGTH */
> >         .get_fabric_name                = iscsi_get_fabric_name,
> >         .tpg_get_wwn                    = lio_tpg_get_endpoint_wwn,
> >         .tpg_get_tag                    = lio_tpg_get_tag,
> > 
> 
> BTW, this is helping the SQ overflow issue.

Thanks for confirming as a possible work-around.

For reference, what is i40iw's max_send_sg reporting..?

Is max_data_sg_nents=32 + 4k pages = 128K the largest MAX TRANSFER
LENGTH to avoid consistent SQ overflow as-is with i40iw..?

Kalderon, Michal Jan. 24, 2018, 8:09 a.m. UTC | #7
> From: Nicholas A. Bellinger [mailto:nab@linux-iscsi.org]
> Sent: Wednesday, January 24, 2018 9:56 AM
> 
> Hi Michal & Co,
> 
> On Fri, 2018-01-19 at 19:33 +0000, Kalderon, Michal wrote:
> > ________________________________________
> > From: Nicholas A. Bellinger <nab@linux-iscsi.org>
> > Sent: Thursday, January 18, 2018 11:58 AM
> >
> > > Hi Shiraz, Michal & Co,
> > Hi Nicholas,
> >
> > > Thanks for the feedback.  Comments below.
> >
> > > A couple of thoughts.
> >
> > > First, would it be helpful to limit maximum payload size per I/O for
> > > consumers based on number of iser-target sq hw sges..?
> >
> > I don't think you need to limit the maximum payload, but instead
> > initialize the max_wr to be based on the number of supported SGEs
> > Instead of what is there today:
> > #define ISERT_QP_MAX_REQ_DTOS   (ISCSI_DEF_XMIT_CMDS_MAX +    \
> >                                 ISERT_MAX_TX_MISC_PDUS  + \
> >                                 ISERT_MAX_RX_MISC_PDUS)
> > Add the maximum number of WQEs per command,
> > The calculation of number of WQEs per command needs to be something like
> > "MAX_TRANSFER_SIZE/(numSges*PAGE_SIZE)".
> >
> 
> Makes sense, MAX_TRANSFER_SIZE would be defined globally by iser-target,
> right..?

Globally, or perhaps configurable via sysfs?

> 
> Btw, I'm not sure how this affects usage of ISER_MAX_TX_CQ_LEN +
> ISER_MAX_CQ_LEN, which currently depend on ISERT_QP_MAX_REQ_DTOS..

I think it can remain dependent on MAX_REQ_DTOS.

> 
> Sagi, what are your thoughts wrt changing attr.cap.max_send_wr at runtime
> vs. exposing a smaller max_data_sg_nents=32 for ib_devices with limited
> attr.cap.max_send_sge..?

For our device, defining max_data_sg_nents didn't help in some scenarios;
the frequency of the issue increases with the number of LUNs we try to
run over.

> 
> > For some devices like ours, breaking the IO into multiple WRs
> > according to supported number of SGEs doesn't necessarily means
> > performance penalty.
> >
> 
> AFAICT adding max_data_sg_nents for iser-target is a safe enough work-around
> to include for stable, assuming we agree on what the max_send_sg cut-off is
> for setting max_data_sg_nents=32 usage from a larger default.  I don't have
> a strong preference either way, as long as it can be picked up for 4.x stable.
> 
> Sagi, WDYT..?
Potnuri Bharat Teja Jan. 24, 2018, 12:21 p.m. UTC | #8
On Wednesday, January 01/24/18, 2018 at 12:55:17 +0530, Nicholas A. Bellinger wrote:
> Hi Potnuri & Co,
> 
> On Thu, 2018-01-18 at 23:23 +0530, Potnuri Bharat Teja wrote:
> > Hi Nicholas, 
> > thanks for the suggestions. Comments below.
> > 
> > On Thursday, January 01/18/18, 2018 at 15:28:42 +0530, Nicholas A. Bellinger wrote:
> > > Hi Shiraz, Michal & Co,
> > > 
> > > Thanks for the feedback.  Comments below.
> > > 
> > > On Mon, 2018-01-15 at 09:22 -0600, Shiraz Saleem wrote:
> > > > On Mon, Jan 15, 2018 at 03:12:36AM -0700, Kalderon, Michal wrote:
> > > > > > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> > > > > > owner@vger.kernel.org] On Behalf Of Nicholas A. Bellinger
> > > > > > Sent: Monday, January 15, 2018 6:57 AM
> > > > > > To: Shiraz Saleem <shiraz.saleem@intel.com>
> > > > > > Cc: Amrani, Ram <Ram.Amrani@cavium.com>; Sagi Grimberg
> > > > > > <sagi@grimberg.me>; linux-rdma@vger.kernel.org; Elior, Ariel
> > > > > > <Ariel.Elior@cavium.com>; target-devel <target-devel@vger.kernel.org>;
> > > > > > Potnuri Bharat Teja <bharat@chelsio.com>
> > > > > > Subject: Re: SQ overflow seen running isert traffic with high block sizes
> > > > > > 
> > > > > > Hi Shiraz, Ram, Ariel, & Potnuri,
> > > > > > 
> > > > > > Following up on this old thread, as it relates to Potnuri's recent fix for a iser-
> > > > > > target queue-full memory leak:
> > > > > > 
> > > > > > https://www.spinics.net/lists/target-devel/msg16282.html
> > > > > > 
> > > > > > Just curious how frequent this happens in practice with sustained large block
> > > > > > workloads, as it appears to effect at least three different iwarp RNICS (i40iw,
> > > > > > qedr and iw_cxgb4)..?
> > > > > > 
> > > > > > Is there anything else from an iser-target consumer level that should be
> > > > > > changed for iwarp to avoid repeated ib_post_send() failures..?
> > > > > > 
> > > > > Would like to mention, that although we are an iWARP RNIC as well, we've hit this
> > > > > Issue when running RoCE. It's not iWARP related. 
> > > > > This is easily reproduced within seconds with IO size of 5121K
> > > > > Using 5 Targets with 2 Ram Disk each and 5 targets with FileIO Disks each.
> > > > > 
> > > > > IO Command used:
> > > > > maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1
> > > > > 
> > > > > thanks,
> > > > > Michal
> > > > 
> > > > Its seen with block size >= 2M on a single target 1 RAM disk config. And similar to Michals report;
> > > > rather quickly, in a matter of seconds.
> > > > 
> > > > fio --rw=read --bs=2048k --numjobs=1 --iodepth=128 --runtime=30 --size=20g --loops=1 --ioengine=libaio 
> > > > --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --exitall --filename=/dev/sdb --name=sdb 
> > > > 
> > > 
> > > A couple of thoughts.
> > > 
> > > First, would it be helpful to limit maximum payload size per I/O for
> > > consumers based on number of iser-target sq hw sges..?
> > yes, I think HW num sge needs to be propagated to iscsi target.
> > > 
> > > That is, if rdma_rw_ctx_post() -> ib_post_send() failures are related to
> > > maximum payload size per I/O being too large there is an existing
> > 
> > Yes they are IO size specific, I observed SQ overflow with fio for IO sizes above
> > 256k and for READ tests only with chelsio(iw_cxgb4) adapters.
> 
> Thanks for confirming.
> 
> > > target_core_fabric_ops mechanism for limiting using SCSI residuals,
> > > originally utilized by qla2xxx here:
> > > 
> > > target/qla2xxx: Honor max_data_sg_nents I/O transfer limit
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8f9b565482c537821588444e09ff732c7d65ed6e
> > > 
> > > Note this patch also will return a smaller Block Limits VPD (0x86)
> > > MAXIMUM TRANSFER LENGTH based on max_data_sg_nents * PAGE_SIZE, which
> > > means for modern SCSI initiators honoring MAXIMUM TRANSFER LENGTH will
> > > automatically limit maximum outgoing payload transfer length, and avoid
> > > SCSI residual logic.
> > > 
> > > As-is, iser-target doesn't a propagate max_data_sg_ents limit into
> > > iscsi-target, but you can try testing with a smaller value to see if
> > > it's useful.  Eg:
> > > 
> > > diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configf
> > > index 0ebc481..d8a4cc5 100644
> > > --- a/drivers/target/iscsi/iscsi_target_configfs.c
> > > +++ b/drivers/target/iscsi/iscsi_target_configfs.c
> > > @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd)
> > >         .module                         = THIS_MODULE,
> > >         .name                           = "iscsi",
> > >         .node_acl_size                  = sizeof(struct iscsi_node_acl),
> > > +       .max_data_sg_nents              = 32, /* 32 * PAGE_SIZE = MAXIMUM TRANSFER LENGTH */
> > >         .get_fabric_name                = iscsi_get_fabric_name,
> > >         .tpg_get_wwn                    = lio_tpg_get_endpoint_wwn,
> > >         .tpg_get_tag                    = lio_tpg_get_tag,
> > > 
> > With above change, SQ overflow isn't observed. I started of with max_data_sg_nents = 16.
> 
> OK, so max_data_sg_nents=32 (MAXIMUM TRANSFER SIZE=128K with 4k pages)
> avoids SQ overflow with iw_cxgb4.
> 
> What is iw_cxgb4 reporting to isert_create_cq():attr.cap.max_send_sge..?
max_send_sge is 4 for iw_cxgb4
 
Steve Wise Jan. 24, 2018, 4:03 p.m. UTC | #9
Hey all,

> 
> Hi Potnuri & Co,
> 
> On Thu, 2018-01-18 at 23:23 +0530, Potnuri Bharat Teja wrote:
> > Hi Nicholas,
> > thanks for the suggestions. Comments below.
> >
> > On Thursday, January 01/18/18, 2018 at 15:28:42 +0530, Nicholas A. Bellinger
> wrote:
> > > Hi Shiraz, Michal & Co,
> > >
> > > Thanks for the feedback.  Comments below.
> > >
> > > On Mon, 2018-01-15 at 09:22 -0600, Shiraz Saleem wrote:
> > > > On Mon, Jan 15, 2018 at 03:12:36AM -0700, Kalderon, Michal wrote:
> > > > > > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> > > > > > owner@vger.kernel.org] On Behalf Of Nicholas A. Bellinger
> > > > > > Sent: Monday, January 15, 2018 6:57 AM
> > > > > > To: Shiraz Saleem <shiraz.saleem@intel.com>
> > > > > > Cc: Amrani, Ram <Ram.Amrani@cavium.com>; Sagi Grimberg
> > > > > > <sagi@grimberg.me>; linux-rdma@vger.kernel.org; Elior, Ariel
> > > > > > <Ariel.Elior@cavium.com>; target-devel <target-
> devel@vger.kernel.org>;
> > > > > > Potnuri Bharat Teja <bharat@chelsio.com>
> > > > > > Subject: Re: SQ overflow seen running isert traffic with high block sizes
> > > > > >
> > > > > > Hi Shiraz, Ram, Ariel, & Potnuri,
> > > > > >
> > > > > > Following up on this old thread, as it relates to Potnuri's recent fix for a
> iser-
> > > > > > target queue-full memory leak:
> > > > > >
> > > > > > https://www.spinics.net/lists/target-devel/msg16282.html
> > > > > >
> > > > > > Just curious how frequent this happens in practice with sustained large
> block
> > > > > > workloads, as it appears to effect at least three different iwarp RNICS
> (i40iw,
> > > > > > qedr and iw_cxgb4)..?
> > > > > >
> > > > > > Is there anything else from an iser-target consumer level that should be
> > > > > > changed for iwarp to avoid repeated ib_post_send() failures..?
> > > > > >
> > > > > Would like to mention, that although we are an iWARP RNIC as well,
> we've hit this
> > > > > Issue when running RoCE. It's not iWARP related.
> > > > > This is easily reproduced within seconds with IO size of 5121K
> > > > > Using 5 Targets with 2 Ram Disk each and 5 targets with FileIO Disks
> each.
> > > > >
> > > > > IO Command used:
> > > > > maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1
> > > > >
> > > > > thanks,
> > > > > Michal
> > > >
> > > > Its seen with block size >= 2M on a single target 1 RAM disk config. And
> similar to Michals report;
> > > > rather quickly, in a matter of seconds.
> > > >
> > > > fio --rw=read --bs=2048k --numjobs=1 --iodepth=128 --runtime=30 --
> size=20g --loops=1 --ioengine=libaio
> > > > --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --exitall --
> filename=/dev/sdb --name=sdb
> > > >
> > >
> > > A couple of thoughts.
> > >
> > > First, would it be helpful to limit maximum payload size per I/O for
> > > consumers based on number of iser-target sq hw sges..?
> > yes, I think HW num sge needs to be propagated to iscsi target.
> > >
> > > That is, if rdma_rw_ctx_post() -> ib_post_send() failures are related to
> > > maximum payload size per I/O being too large there is an existing
> >
> > Yes they are IO size specific, I observed SQ overflow with fio for IO sizes above
> > 256k and for READ tests only with chelsio(iw_cxgb4) adapters.
> 
> Thanks for confirming.
> 
> > > target_core_fabric_ops mechanism for limiting using SCSI residuals,
> > > originally utilized by qla2xxx here:
> > >
> > > target/qla2xxx: Honor max_data_sg_nents I/O transfer limit
> > >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8
> f9b565482c537821588444e09ff732c7d65ed6e
> > >
> > > Note this patch also will return a smaller Block Limits VPD (0x86)
> > > MAXIMUM TRANSFER LENGTH based on max_data_sg_nents * PAGE_SIZE,
> which
> > > means for modern SCSI initiators honoring MAXIMUM TRANSFER LENGTH
> will
> > > automatically limit maximum outgoing payload transfer length, and avoid
> > > SCSI residual logic.
> > >
> > > As-is, iser-target doesn't a propagate max_data_sg_ents limit into
> > > iscsi-target, but you can try testing with a smaller value to see if
> > > it's useful.  Eg:
> > >
> > > diff --git a/drivers/target/iscsi/iscsi_target_configfs.c
> b/drivers/target/iscsi/iscsi_target_configf
> > > index 0ebc481..d8a4cc5 100644
> > > --- a/drivers/target/iscsi/iscsi_target_configfs.c
> > > +++ b/drivers/target/iscsi/iscsi_target_configfs.c
> > > @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd
> *se_cmd)
> > >         .module                         = THIS_MODULE,
> > >         .name                           = "iscsi",
> > >         .node_acl_size                  = sizeof(struct iscsi_node_acl),
> > > +       .max_data_sg_nents              = 32, /* 32 * PAGE_SIZE = MAXIMUM
> TRANSFER LENGTH */
> > >         .get_fabric_name                = iscsi_get_fabric_name,
> > >         .tpg_get_wwn                    = lio_tpg_get_endpoint_wwn,
> > >         .tpg_get_tag                    = lio_tpg_get_tag,
> > >
> > With above change, SQ overflow isn't observed. I started of with
> max_data_sg_nents = 16.
> 
> OK, so max_data_sg_nents=32 (MAXIMUM TRANSFER SIZE=128K with 4k pages)
> avoids SQ overflow with iw_cxgb4.
> 
> What is iw_cxgb4 reporting to isert_create_cq():attr.cap.max_send_sge..?

The ib_device attributes only advertise max_sge, which applies to both send and recv queues.  Because the iw_cxgb4 RQ max sge is 4, iw_cxgb4 currently advertises 4 for max_sge, even though the SQ can handle more.  So attr.cap.max_send_sge ends up being 4.

I have a todo on my list to extend the rdma/core to have max_send_sge and max_recv_sge attributes.  If it helps, I can bump up the priority of this.  Perhaps Bharat could do the work.
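
A minimal sketch of the consumer side of that split (the send-specific
field and config switch here are assumptions; today only the combined
attrs.max_sge exists):

#include <rdma/ib_verbs.h>

/*
 * Sketch only: pick a per-WR SGE budget for the SQ once the core exposes
 * a send-specific limit, instead of the combined attrs.max_sge that
 * iw_cxgb4 caps at its RQ limit of 4.
 */
static u32 isert_sq_sge_limit(struct ib_device *ib_dev)
{
#ifdef HAVE_SEPARATE_SEND_RECV_SGE	/* hypothetical switch */
	return ib_dev->attrs.max_send_sge;	/* proposed SQ-only limit */
#else
	return ib_dev->attrs.max_sge;		/* combined limit available today */
#endif
}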

Steve.

Saleem, Shiraz Jan. 26, 2018, 6:52 p.m. UTC | #10
On Wed, Jan 24, 2018 at 01:01:58AM -0700, Nicholas A. Bellinger wrote:
> Hi Shiraz & Co,
> 
> Thanks for the feedback.
> 
> On Mon, 2018-01-22 at 17:49 +0000, Saleem, Shiraz wrote:
> > > Subject: Re: SQ overflow seen running isert traffic with high block sizes
> > > 
> > > 
> > > First, would it be helpful to limit maximum payload size per I/O for consumers
> > > based on number of iser-target sq hw sges..?
> > > 
> > Assuming data is not able to be fast registered as if virtually contiguous;
> > artificially limiting the data size might not be the best solution.
> > 
> > But max SGEs does need to be exposed higher. Somewhere in the stack,
> > there might need to be multiple WRs submitted or data copied.
> > 
> 
> Sagi..?
> 

> 
> For reference, what is i40iw's max_send_sg reporting..?

3

> 
> Is max_data_sg_nents=32 + 4k pages = 128K the largest MAX TRANSFER
> LENGTH to avoid consistent SQ overflow as-is with i40iw..?

For the configuration I am testing, max_data_sg_nents=32 & 64 worked,
and the SQ overflow issue reproduced at 128.

Sagi Grimberg Jan. 29, 2018, 7:17 p.m. UTC | #11
>>> First, would it be helpful to limit maximum payload size per I/O for
>>> consumers based on number of iser-target sq hw sges..?
>>
>> I don't think you need to limit the maximum payload, but instead
>> initialize the max_wr to be based on the number of supported SGEs
>> Instead of what is there today:
>> #define ISERT_QP_MAX_REQ_DTOS   (ISCSI_DEF_XMIT_CMDS_MAX +    \
>>                                  ISERT_MAX_TX_MISC_PDUS  + \
>>                                  ISERT_MAX_RX_MISC_PDUS)
>> Add the maximum number of WQEs per command,
>> The calculation of number of WQEs per command needs to be something like
>> "MAX_TRANSFER_SIZE/(numSges*PAGE_SIZE)".
>>
> 
> Makes sense, MAX_TRANSFER_SIZE would be defined globally by iser-target,
> right..?
> 
> Btw, I'm not sure how this affects usage of ISER_MAX_TX_CQ_LEN +
> ISER_MAX_CQ_LEN, which currently depend on ISERT_QP_MAX_REQ_DTOS..
> 
> Sagi, what are your thoughts wrt changing attr.cap.max_send_wr at
> runtime vs. exposing a smaller max_data_sg_nents=32 for ib_devices with
> limited attr.cap.max_send_sge..?

Sorry for the late reply,

Can we go back and understand why we need to limit the isert transfer
size? I would suggest that we handle queue-full scenarios instead
of limiting the transferred payload size.

From the trace Shiraz sent, it looks like:
a) we are too chatty when failing to post a wr on a queue-pair
(something that can happen by design), and
b) isert escalates to terminating the connection, which means we
screwed up handling it.

Shiraz, can you explain these messages:
[17066.397206] i40iw i40iw_process_aeq ae_id = 0x503 bool qp=1 qp_id = 3
[17066.397247] i40iw i40iw_process_aeq ae_id = 0x501 bool qp=1 qp_id = 3

Who is initiating the connection teardown? the initiator or the target?
(looks like the initiator gave up on iscsi ping timeout expiration)

Nic,
Currently, what I see is that the queue-full handler simply schedules qf
work to re-issue the I/O again. The issue is that the only way a new
send queue entry becomes available again is for isert to process one or
more send completions. If anything, this work is interfering with the
isert_send_done handler.

Would it be possible for some transports to schedule qf_work_queue
themselves? I guess it would also hold if iscsit were to use
non-blocking sockets and continue at .write_space()?

Something like transport_process_wait_list() that would be triggered
from the transport completion handler (or from a centralized place like
target_sess_put_cmd or something...)?
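
A rough sketch of that direction (target_process_wait_list() is a
hypothetical per-session hook, not an existing target-core interface;
isert types as in ib_isert.h):

#include <rdma/ib_verbs.h>
#include "ib_isert.h"				/* struct isert_conn */

void target_process_wait_list(struct se_session *se_sess);	/* hypothetical */

/*
 * Sketch only: let the fabric driver re-drive queue-full commands from its
 * send-completion path, once an SQ slot is known to be free again.
 */
static void isert_send_done_sketch(struct ib_cq *cq, struct ib_wc *wc)
{
	struct isert_conn *isert_conn = wc->qp->qp_context;

	/* ... normal isert send-completion handling frees one SQ slot ... */

	/* re-drive commands parked on queue-full, now that space exists */
	target_process_wait_list(isert_conn->conn->sess->se_sess);
}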

Also, I see that this wait list is singular across the se_device. Maybe
it would be a better idea to have it per se_session, as it maps to an
iscsi connection (or srp channel for that matter)? For large I/O sizes
this should happen quite a lot, so it's a bit of a shame that we will
need to compete over the list_empty check...

If we prefer to make this go away by limiting the transfer size then it's
fine I guess, but maybe we can do better? (Although it can take some
extra work...)

>> For some devices like ours, breaking the IO into multiple WRs according to supported
>> number of SGEs doesn't necessarily means performance penalty.
>>
> 
> AFAICT adding max_data_sg_nents for iser-target is a safe enough
> work-around to include for stable, assuming we agree on what the
> max_send_sg cut-off is for setting max_data_sg_nents=32 usage from a
> larger default.  I don't have a strong preference either way, as long as
> it can be picked up for 4.x stable.
>
> Sagi, WDYT..?

I think it's an easier fix for sure. What I don't know is whether this
introduces a regression for devices that can handle more sges on large
I/O sizes. I very much doubt it will though...
Sagi Grimberg Jan. 29, 2018, 7:20 p.m. UTC | #12
>> Sagi, what are your thoughts wrt changing attr.cap.max_send_wr at runtime
>> vs. exposing a smaller max_data_sg_nents=32 for ib_devices with limited
>> attr.cap.max_send_sge..?
> For our device defining max_data_sg_nents didn't help on some scenarios,
> It seems that Frequency of the issue occurring increases with number of luns we
> Try to run over.

Maybe this is related to the queue-full strategy the target core takes
by simply scheduling another attempt unconditionally without any
hard guarantees that the next attempt will succeed?

This flow is per se_device, which might be a hint as to why it's happening
more with a larger number of luns? Maybe the isert completion handler
context (which is also a workqueue) struggles to find cpu quota?
Sagi Grimberg Jan. 29, 2018, 7:36 p.m. UTC | #13
Hi,

>>> First, would it be helpful to limit maximum payload size per I/O for consumers
>>> based on number of iser-target sq hw sges..?
>>>
>> Assuming data is not able to be fast registered as if virtually contiguous;
>> artificially limiting the data size might not be the best solution.
>>
>> But max SGEs does need to be exposed higher. Somewhere in the stack,
>> there might need to be multiple WRs submitted or data copied.
>>
> 
> Sagi..?

I tend to agree that if the adapter supports just a handful of sges it's
counter-productive to expose an infinite data transfer size. On the other
hand, I think we should be able to chunk more with memory registrations
(although the rdma rw code never even allocates them for non-iwarp devices).

We have an API to check this in the RDMA core (thanks to Chuck),
introduced in:
commit 0062818298662d0d05061949d12880146b5ebd65
Author: Chuck Lever <chuck.lever@oracle.com>
Date:   Mon Aug 28 15:06:14 2017 -0400

     rdma core: Add rdma_rw_mr_payload()

     The amount of payload per MR depends on device capabilities and
     the memory registration mode in use. The new rdma_rw API hides both,
     making it difficult for ULPs to determine how large their transport
     send queues need to be.

     Expose the MR payload information via a new API.

     Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
     Acked-by: Doug Ledford <dledford@redhat.com>
     Signed-off-by: J. Bruce Fields <bfields@redhat.com>


So the easy way out would be to use that and plug it into
max_data_sg_nents. Regardless, the queue-full logic today yields a TX attack
on the transport.
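
For reference, a sketch of how that information could feed SQ sizing; the
helper name and signature here assume the rdma_rw_mr_factor() form in the
rdma_rw code (the quoted commit message calls it rdma_rw_mr_payload()),
and ISERT_MAX_PAGES_PER_CMD is an assumed cap, not an isert constant:

#include <rdma/ib_verbs.h>
#include <rdma/rw.h>

/* assumed cap, e.g. 1M transfers with 4K pages */
#define ISERT_MAX_PAGES_PER_CMD		256

static u32 isert_rw_ctxs_per_cmd(struct ib_device *ib_dev, u8 port_num)
{
	/* worst-case rdma_rw contexts (MR/WR chains) a single command needs */
	return rdma_rw_mr_factor(ib_dev, port_num, ISERT_MAX_PAGES_PER_CMD);
}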

>>> diff --git a/drivers/target/iscsi/iscsi_target_configfs.c
>>> b/drivers/target/iscsi/iscsi_target_configf
>>> index 0ebc481..d8a4cc5 100644
>>> --- a/drivers/target/iscsi/iscsi_target_configfs.c
>>> +++ b/drivers/target/iscsi/iscsi_target_configfs.c
>>> @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd)
>>>          .module                         = THIS_MODULE,
>>>          .name                           = "iscsi",
>>>          .node_acl_size                  = sizeof(struct iscsi_node_acl),
>>> +       .max_data_sg_nents              = 32, /* 32 * PAGE_SIZE = MAXIMUM
>>> TRANSFER LENGTH */
>>>          .get_fabric_name                = iscsi_get_fabric_name,
>>>          .tpg_get_wwn                    = lio_tpg_get_endpoint_wwn,
>>>          .tpg_get_tag                    = lio_tpg_get_tag,
>>>
>>
>> BTW, this is helping the SQ overflow issue.
> 
> Thanks for confirming as a possible work-around.
> 
> For reference, what is i40iw's max_send_sg reporting..?
> 
> Is max_data_sg_nents=32 + 4k pages = 128K the largest MAX TRANSFER
> LENGTH to avoid consistent SQ overflow as-is with i40iw..?

I vaguely recall that this is the maximum mr length for i40e (and cxgb4
if I'm not mistaken).
Saleem, Shiraz Jan. 30, 2018, 4:30 p.m. UTC | #14
On Mon, Jan 29, 2018 at 09:17:02PM +0200, Sagi Grimberg wrote:
> 
> > > > First, would it be helpful to limit maximum payload size per I/O for
> > > > consumers based on number of iser-target sq hw sges..?
> > > 
> > > I don't think you need to limit the maximum payload, but instead
> > > initialize the max_wr to be based on the number of supported SGEs
> > > Instead of what is there today:
> > > #define ISERT_QP_MAX_REQ_DTOS   (ISCSI_DEF_XMIT_CMDS_MAX +    \
> > >                                  ISERT_MAX_TX_MISC_PDUS  + \
> > >                                  ISERT_MAX_RX_MISC_PDUS)
> > > Add the maximum number of WQEs per command,
> > > The calculation of number of WQEs per command needs to be something like
> > > "MAX_TRANSFER_SIZE/(numSges*PAGE_SIZE)".
> > > 
> > 
> > Makes sense, MAX_TRANSFER_SIZE would be defined globally by iser-target,
> > right..?
> > 
> > Btw, I'm not sure how this effects usage of ISER_MAX_TX_CQ_LEN +
> > ISER_MAX_CQ_LEN, which currently depend on ISERT_QP_MAX_REQ_DTOS..
> > 
> > Sagi, what are your thoughts wrt changing attr.cap.max_send_wr at
> > runtime vs. exposing a smaller max_data_sg_nents=32 for ib_devices with
> > limited attr.cap.max_send_sge..?
> 
> Sorry for the late reply,
> 
> Can we go back and understand why do we need to limit isert transfer
> size? I would suggest that we handle queue-full scenarios instead
> of limiting the transfered payload size.
> 
> From the trace Shiraz sent, it looks that:
> a) we are too chatty when failing to post a wr on a queue-pair
> (something that can happen by design), and
> b) isert escalates to terminating the connection which means we
> screwed up handling it.
> 
> Shiraz, can you explain these messages:
> [17066.397206] i40iw i40iw_process_aeq ae_id = 0x503 bool qp=1 qp_id = 3
> [17066.397247] i40iw i40iw_process_aeq ae_id = 0x501 bool qp=1 qp_id = 3

This is some device-specific Asynchronous Event logging I turned on.
It indicates the QP received a FIN while in RTS and was eventually moved
to the CLOSED state.

 
> Who is initiating the connection teardown? the initiator or the target?
> (looks like the initiator gave up on iscsi ping timeout expiration)
> 

Initiator


Patch

diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configf
index 0ebc481..d8a4cc5 100644
--- a/drivers/target/iscsi/iscsi_target_configfs.c
+++ b/drivers/target/iscsi/iscsi_target_configfs.c
@@ -1553,6 +1553,7 @@  static void lio_release_cmd(struct se_cmd *se_cmd)
        .module                         = THIS_MODULE,
        .name                           = "iscsi",
        .node_acl_size                  = sizeof(struct iscsi_node_acl),
+       .max_data_sg_nents              = 32, /* 32 * PAGE_SIZE = MAXIMUM TRANSFER LENGTH */
        .get_fabric_name                = iscsi_get_fabric_name,
        .tpg_get_wwn                    = lio_tpg_get_endpoint_wwn,
        .tpg_get_tag                    = lio_tpg_get_tag,