[PATCHv2,for-next,1/3] RDMA/rtrs-clt: Print more info when an error happens

Message ID: 20210406123639.202899-2-gi-oh.kim@ionos.com
State: Accepted
Delegated to: Jason Gunthorpe
Series: Improve debugging messages

Commit Message

Gioh Kim April 6, 2021, 12:36 p.m. UTC
From: Gioh Kim <gi-oh.kim@cloud.ionos.com>

The client currently prints only the error value, which is not enough
for debugging.

1. When the client receives an error from the server:
the client prints not only the error value but also
more information about the server connection.

2. When the client fails to send IO:
the client gets an error from the RDMA layer. It also
prints more information about the server connection.

Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt.c | 33 ++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

Comments

Leon Romanovsky April 6, 2021, 12:41 p.m. UTC | #1
On Tue, Apr 06, 2021 at 02:36:37PM +0200, Gioh Kim wrote:
> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> index 5062328ac577..a534b2b09e13 100644
> --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> @@ -437,6 +437,11 @@ static void complete_rdma_req(struct rtrs_clt_io_req *req, int errno,
>  	req->in_use = false;
>  	req->con = NULL;
>  
> +	if (unlikely(errno)) {

I'm sorry, but all your patches are full of this likely/unlikely cargo
cult. Can you please provide supporting performance data or delete all
likely/unlikely in all rtrs code?

Thanks
Gioh Kim April 9, 2021, 9:57 a.m. UTC | #2
On Tue, Apr 6, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Tue, Apr 06, 2021 at 02:36:37PM +0200, Gioh Kim wrote:
> > +     if (unlikely(errno)) {
>
> I'm sorry, but all your patches are full of these likely/unlikely cargo
> cult. Can you please provide supportive performance data or delete all
> likely/unlikely in all rtrs code?
>
> Thanks

Hi Leon,

Let me check with my colleagues whether there is any data about it.
I will get back to you soon.
Jinpu Wang April 12, 2021, 12:22 p.m. UTC | #3
On Tue, Apr 6, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Tue, Apr 06, 2021 at 02:36:37PM +0200, Gioh Kim wrote:
> > +     if (unlikely(errno)) {
>
> I'm sorry, but all your patches are full of these likely/unlikely cargo
> cult. Can you please provide supportive performance data or delete all
> likely/unlikely in all rtrs code?

Hi Leon,

All the likely/unlikely annotations on the non-fast path were removed,
as you suggested in the past.
This one is on the IO path. My understanding is that on the fast path
the likely/unlikely macros let the compiler optimize the code for
better branch prediction.

We will run some benchmarks to see if it makes a difference.

Thanks
>
> Thanks
Leon Romanovsky April 12, 2021, 12:41 p.m. UTC | #4
On Mon, Apr 12, 2021 at 02:22:51PM +0200, Jinpu Wang wrote:
> On Tue, Apr 6, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Tue, Apr 06, 2021 at 02:36:37PM +0200, Gioh Kim wrote:
> > > +     if (unlikely(errno)) {
> >
> > I'm sorry, but all your patches are full of these likely/unlikely cargo
> > cult. Can you please provide supportive performance data or delete all
> > likely/unlikely in all rtrs code?
> 
> Hi Leon,
> 
> All the likely/unlikely from the non-fast path was removed as you
> suggested in the past.
> This one is on IO path, my understanding is for the fast path, with
> likely/unlikely macro,
> the compiler will optimize the code for better branch prediction.

In theory, yes. In practice, gcc 10 generated the same assembly code when I
placed likely() and then replaced it with unlikely().

> 
> We will run some benchmarks to see if it makes a difference.
> 
> Thanks
> >
> > Thanks
Jinpu Wang April 12, 2021, 12:53 p.m. UTC | #5
On Mon, Apr 12, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Mon, Apr 12, 2021 at 02:22:51PM +0200, Jinpu Wang wrote:
> > On Tue, Apr 6, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Tue, Apr 06, 2021 at 02:36:37PM +0200, Gioh Kim wrote:
> > > > +     if (unlikely(errno)) {
> > >
> > > I'm sorry, but all your patches are full of these likely/unlikely cargo
> > > cult. Can you please provide supportive performance data or delete all
> > > likely/unlikely in all rtrs code?
> >
> > Hi Leon,
> >
> > All the likely/unlikely from the non-fast path was removed as you
> > suggested in the past.
> > This one is on IO path, my understanding is for the fast path, with
> > likely/unlikely macro,
> > the compiler will optimize the code for better branch prediction.
>
> In theory yes, in practice. gcc 10 generated same assembly code when I
> placed likely() and replaced it with unlikely() later.
That's a surprise to me.

Just checked: the minimum gcc requirement is 4.9 [1]. Debian Buster uses
gcc 8.3, and the upcoming Bullseye will use gcc 10.2.

[1]: https://www.kernel.org/doc/html/latest/process/changes.html
>
> >
> > We will run some benchmarks to see if it makes a difference.
> >
> > Thanks
> > >
> > > Thanks
Gioh Kim April 12, 2021, 2 p.m. UTC | #6
On Mon, Apr 12, 2021 at 2:54 PM Jinpu Wang <jinpu.wang@ionos.com> wrote:
>
> On Mon, Apr 12, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Mon, Apr 12, 2021 at 02:22:51PM +0200, Jinpu Wang wrote:
> > > On Tue, Apr 6, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > On Tue, Apr 06, 2021 at 02:36:37PM +0200, Gioh Kim wrote:
> > > > > +     if (unlikely(errno)) {
> > > >
> > > > I'm sorry, but all your patches are full of these likely/unlikely cargo
> > > > cult. Can you please provide supportive performance data or delete all
> > > > likely/unlikely in all rtrs code?
> > >
> > > Hi Leon,
> > >
> > > All the likely/unlikely from the non-fast path was removed as you
> > > suggested in the past.
> > > This one is on IO path, my understanding is for the fast path, with
> > > likely/unlikely macro,
> > > the compiler will optimize the code for better branch prediction.
> >
> > In theory yes, in practice. gcc 10 generated same assembly code when I
> > placed likely() and replaced it with unlikely() later.

Even though gcc 10 generated the same assembly code,
there is no guarantee that gcc 11 or gcc 12 will.

I am reviewing the rtrs source files and have found some unnecessary
likely/unlikely annotations.
But I think likely/unlikely is necessary in extreme cases.
I will discuss this with my colleagues and inform you of the result.
Leon Romanovsky April 12, 2021, 5:34 p.m. UTC | #7
On Mon, Apr 12, 2021 at 04:00:55PM +0200, Gioh Kim wrote:
> On Mon, Apr 12, 2021 at 2:54 PM Jinpu Wang <jinpu.wang@ionos.com> wrote:
> >
> > On Mon, Apr 12, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Mon, Apr 12, 2021 at 02:22:51PM +0200, Jinpu Wang wrote:
> > > > On Tue, Apr 6, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > > >
> > > > > On Tue, Apr 06, 2021 at 02:36:37PM +0200, Gioh Kim wrote:
> > > > > > +     if (unlikely(errno)) {
> > > > >
> > > > > I'm sorry, but all your patches are full of these likely/unlikely cargo
> > > > > cult. Can you please provide supportive performance data or delete all
> > > > > likely/unlikely in all rtrs code?
> > > >
> > > > Hi Leon,
> > > >
> > > > All the likely/unlikely from the non-fast path was removed as you
> > > > suggested in the past.
> > > > This one is on IO path, my understanding is for the fast path, with
> > > > likely/unlikely macro,
> > > > the compiler will optimize the code for better branch prediction.
> > >
> > > In theory yes, in practice. gcc 10 generated same assembly code when I
> > > placed likely() and replaced it with unlikely() later.
> 
> Even-thought gcc 10 generated the same assembly code,
> there is no guarantee for gcc 11 or gcc 12.
> 
> I am reviewing rtrs source file and have found some unnecessary likely/unlikely.
> But I think likely/unlikely are necessary for extreme cases.
> I will have a discussion with my colleagues and inform you of the result.

Please come back with performance data.

Thanks
Haakon Bugge April 13, 2021, 5:31 a.m. UTC | #8
> On 12 Apr 2021, at 19:34, Leon Romanovsky <leon@kernel.org> wrote:
> 
> On Mon, Apr 12, 2021 at 04:00:55PM +0200, Gioh Kim wrote:
>> On Mon, Apr 12, 2021 at 2:54 PM Jinpu Wang <jinpu.wang@ionos.com> wrote:
>>> 
>>> On Mon, Apr 12, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
>>>> 
>>>> On Mon, Apr 12, 2021 at 02:22:51PM +0200, Jinpu Wang wrote:
>>>>> On Tue, Apr 6, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
>>>>>> 
>>>>>> On Tue, Apr 06, 2021 at 02:36:37PM +0200, Gioh Kim wrote:
>>>>>>> +     if (unlikely(errno)) {
>>>>>> 
>>>>>> I'm sorry, but all your patches are full of these likely/unlikely cargo
>>>>>> cult. Can you please provide supportive performance data or delete all
>>>>>> likely/unlikely in all rtrs code?
>>>>> 
>>>>> Hi Leon,
>>>>> 
>>>>> All the likely/unlikely from the non-fast path was removed as you
>>>>> suggested in the past.
>>>>> This one is on IO path, my understanding is for the fast path, with
>>>>> likely/unlikely macro,
>>>>> the compiler will optimize the code for better branch prediction.
>>>> 
>>>> In theory yes, in practice. gcc 10 generated same assembly code when I
>>>> placed likely() and replaced it with unlikely() later.
>> 
>> Even-thought gcc 10 generated the same assembly code,
>> there is no guarantee for gcc 11 or gcc 12.
>> 
>> I am reviewing rtrs source file and have found some unnecessary likely/unlikely.
>> But I think likely/unlikely are necessary for extreme cases.
>> I will have a discussion with my colleagues and inform you of the result.
> 
> Please come with performance data.

I think the best way to gather performance data is not to remove the likely/unlikely, but to swap their definitions. That means less coding and a more pronounced difference, if any.


Thxs, Håkon
Leon Romanovsky April 13, 2021, 6:43 a.m. UTC | #9
On Tue, Apr 13, 2021 at 05:31:24AM +0000, Haakon Bugge wrote:
> 
> 
> > On 12 Apr 2021, at 19:34, Leon Romanovsky <leon@kernel.org> wrote:
> > 
> > On Mon, Apr 12, 2021 at 04:00:55PM +0200, Gioh Kim wrote:
> >> On Mon, Apr 12, 2021 at 2:54 PM Jinpu Wang <jinpu.wang@ionos.com> wrote:
> >>> 
> >>> On Mon, Apr 12, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
> >>>> 
> >>>> On Mon, Apr 12, 2021 at 02:22:51PM +0200, Jinpu Wang wrote:
> >>>>> On Tue, Apr 6, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
> >>>>>> 
> >>>>>> On Tue, Apr 06, 2021 at 02:36:37PM +0200, Gioh Kim wrote:
> >>>>>>> +     if (unlikely(errno)) {
> >>>>>> 
> >>>>>> I'm sorry, but all your patches are full of these likely/unlikely cargo
> >>>>>> cult. Can you please provide supportive performance data or delete all
> >>>>>> likely/unlikely in all rtrs code?
> >>>>> 
> >>>>> Hi Leon,
> >>>>> 
> >>>>> All the likely/unlikely from the non-fast path was removed as you
> >>>>> suggested in the past.
> >>>>> This one is on IO path, my understanding is for the fast path, with
> >>>>> likely/unlikely macro,
> >>>>> the compiler will optimize the code for better branch prediction.
> >>>> 
> >>>> In theory yes, in practice. gcc 10 generated same assembly code when I
> >>>> placed likely() and replaced it with unlikely() later.
> >> 
> >> Even-thought gcc 10 generated the same assembly code,
> >> there is no guarantee for gcc 11 or gcc 12.
> >> 
> >> I am reviewing rtrs source file and have found some unnecessary likely/unlikely.
> >> But I think likely/unlikely are necessary for extreme cases.
> >> I will have a discussion with my colleagues and inform you of the result.
> > 
> > Please come with performance data.
> 
> I think the best way to gather performance data is not remove the likely/unlikely, but swap their definitions. Less coding and more pronounced difference - if any.

In theory, it will double the gain/loss, which is a nice way to see whether
likely/unlikely changes anything.

Thanks

> 
> 
> Thxs, Håkon
>
Gioh Kim April 13, 2021, 1:11 p.m. UTC | #10
On Tue, Apr 13, 2021 at 8:43 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Tue, Apr 13, 2021 at 05:31:24AM +0000, Haakon Bugge wrote:
> >
> >
> > > On 12 Apr 2021, at 19:34, Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Mon, Apr 12, 2021 at 04:00:55PM +0200, Gioh Kim wrote:
> > >> On Mon, Apr 12, 2021 at 2:54 PM Jinpu Wang <jinpu.wang@ionos.com> wrote:
> > >>>
> > >>> On Mon, Apr 12, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >>>>
> > >>>> On Mon, Apr 12, 2021 at 02:22:51PM +0200, Jinpu Wang wrote:
> > >>>>> On Tue, Apr 6, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >>>>>>
> > >>>>>> On Tue, Apr 06, 2021 at 02:36:37PM +0200, Gioh Kim wrote:
> > >>>>>>> +     if (unlikely(errno)) {
> > >>>>>>
> > >>>>>> I'm sorry, but all your patches are full of these likely/unlikely cargo
> > >>>>>> cult. Can you please provide supportive performance data or delete all
> > >>>>>> likely/unlikely in all rtrs code?
> > >>>>>
> > >>>>> Hi Leon,
> > >>>>>
> > >>>>> All the likely/unlikely from the non-fast path was removed as you
> > >>>>> suggested in the past.
> > >>>>> This one is on IO path, my understanding is for the fast path, with
> > >>>>> likely/unlikely macro,
> > >>>>> the compiler will optimize the code for better branch prediction.
> > >>>>
> > >>>> In theory yes, in practice. gcc 10 generated same assembly code when I
> > >>>> placed likely() and replaced it with unlikely() later.
> > >>
> > >> Even-thought gcc 10 generated the same assembly code,
> > >> there is no guarantee for gcc 11 or gcc 12.
> > >>
> > >> I am reviewing rtrs source file and have found some unnecessary likely/unlikely.
> > >> But I think likely/unlikely are necessary for extreme cases.
> > >> I will have a discussion with my colleagues and inform you of the result.
> > >
> > > Please come with performance data.
> >
> > I think the best way to gather performance data is not remove the likely/unlikely, but swap their definitions. Less coding and more pronounced difference - if any.
>
> In theory, it will multiply by 2 gain/loss, which is nice to see if
> likely/ulikely change something.
>
> Thanks
>
> >
> >
> > Thxs, Håkon
> >

Hi,

In summary, there is no performance gap before/after swapping the
likely/unlikely macros.
So I will send a patch to remove all likely/unlikely macros.

I guess that is because:
- The performance of rnbd/rtrs depends on the network and block layer.
- The network and block layer are not fast enough to be affected by
likely/unlikely.

I ran a fio read test with 32 rnbd devices and 64/128 processes on a 64-core server.
fio produced practically the same results before and after the swap.
Thanks to Håkon for the test idea.

Test environment:
- Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
- 376G memory
- kernel version: 5.4.86
- gcc version: gcc (Debian 8.3.0-6) 8.3.0
- Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]

Test result:
- before swapping:
  32-dev/64-proc: IOPS=829k, BW=3239MiB/s
  32-dev/128-proc: IOPS=816k, BW=3187MiB/s
- after swapping:
  32-dev/64-proc: IOPS=829k, BW=3238MiB/s
  32-dev/128-proc: IOPS=817k, BW=3191MiB/s
(128-proc is worse than 64-proc, but that is a separate issue)

Attached files:
- 0001-swap-likely-and-unlikely.patch: a patch file swapping likely
and unlikely to show how I tested
- after_swap.txt: raw data after swapping
- current.txt: raw data before swapping

For your information, I also ran the performance test on two 8-core desktop
machines that are directly linked by Infiniband cables without a switch.
I got the same result with them: no performance difference.
141 root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# git reset --hard HEAD~2
HEAD is now at 99c7c2f RDMA/rtrs-clt: destroy sysfs after removing session from active list
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# make clean
make[1]: Entering directory '/usr/src/linux-5.4.86-pserver'
make[1]: Leaving directory '/usr/src/linux-5.4.86-pserver'
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# make
make[1]: Entering directory '/usr/src/linux-5.4.86-pserver'
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-clt.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-clt-sysfs.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-common.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-client.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-srv.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-srv-dev.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-srv-sysfs.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-server.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-core.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-clt.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-clt-stats.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-clt-sysfs.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-client.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-srv.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-srv-stats.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-srv-sysfs.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-server.o
  AR      /tmp/ddd/gkim/ibnbd2/built-in.a
  Building modules, stage 2.
  MODPOST 5 modules
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-client.mod.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-client.ko
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-server.mod.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-server.ko
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-client.mod.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-client.ko
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-core.mod.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-core.ko
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-server.mod.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-server.ko
make[1]: Leaving directory '/usr/src/linux-5.4.86-pserver'
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# sudo rmmod rnbd-client
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# sudo rmmod rtrs-client
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# sudo rmmod rtrs-core
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# sudo insmod rtrs/rtrs-core.ko
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# sudo insmod rtrs/rtrs-client.ko
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# sudo insmod rnbd/rnbd-client.ko




root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# bash go_32dev.sh
fio start   : Di 13. Apr 10:38:09 UTC 2021
kernel info : Linux ps401a-914 5.4.86-pserver #5.4.86-3~deb10 SMP Fri Mar 5 12:29:36 UTC 2021 x86_64 GNU/Linux
fio version : fio-3.12
gcc: gcc (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Start fio test
fiotest: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
fio-3.12
Starting 64 processes
Jobs: 64 (f=2048): [r(64)][0.1%][r=3233MiB/s][r=828k IOPS][eta 01d:21h:06m:50s]
fiotest: (groupid=0, jobs=64): err= 0: pid=23219: Tue Apr 13 10:41:11 2021
  read: IOPS=829k, BW=3239MiB/s (3396MB/s)(569GiB/180011msec)
    slat (usec): min=82, max=156993, avg=1278.01, stdev=1683.01
    clat (nsec): min=1096, max=30634k, avg=8538323.42, stdev=2411851.29
     lat (usec): min=317, max=162011, avg=9816.39, stdev=2702.04
    clat percentiles (usec):
     |  1.00th=[ 3949],  5.00th=[ 5211], 10.00th=[ 5800], 20.00th=[ 6587],
     | 30.00th=[ 7111], 40.00th=[ 7701], 50.00th=[ 8225], 60.00th=[ 8848],
     | 70.00th=[ 9503], 80.00th=[10421], 90.00th=[11731], 95.00th=[12911],
     | 99.00th=[15270], 99.50th=[16319], 99.90th=[18482], 99.95th=[19530],
     | 99.99th=[21890]
   bw (  KiB/s): min=36864, max=166912, per=1.56%, avg=51773.23, stdev=3159.85, samples=22980
   iops        : min= 9216, max=41728, avg=12943.26, stdev=789.95, samples=22980
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.01%, 250=0.04%, 500=0.04%, 750=0.02%, 1000=0.02%
  lat (msec)   : 2=0.06%, 4=0.90%, 10=74.65%, 20=24.23%, 50=0.03%
  cpu          : usr=0.97%, sys=4.95%, ctx=82650519, majf=0, minf=3555539
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
     submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=100.0%
     complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=100.0%
     issued rwts: total=149252805,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: bw=3239MiB/s (3396MB/s), 3239MiB/s-3239MiB/s (3396MB/s-3396MB/s), io=569GiB (611GB), run=180011-180011msec

Disk stats (read/write):
  rnbd0: ios=4663183/0, merge=0/0, ticks=10123475/0, in_queue=314090, util=99.62%
  rnbd1: ios=4663180/0, merge=0/0, ticks=10445268/0, in_queue=334450, util=99.65%
  rnbd2: ios=4663181/0, merge=0/0, ticks=10631138/0, in_queue=346090, util=99.65%
  rnbd3: ios=4663179/0, merge=0/0, ticks=10791046/0, in_queue=350590, util=99.67%
  rnbd4: ios=4663175/0, merge=0/0, ticks=10888045/0, in_queue=361960, util=99.68%
  rnbd5: ios=4663183/0, merge=0/0, ticks=10994697/0, in_queue=369540, util=99.70%
  rnbd6: ios=4663182/0, merge=0/0, ticks=11010726/0, in_queue=365730, util=99.71%
  rnbd7: ios=4663182/0, merge=0/0, ticks=11041009/0, in_queue=373870, util=99.71%
  rnbd8: ios=4663185/0, merge=0/0, ticks=11050961/0, in_queue=375140, util=99.74%
  rnbd9: ios=4663185/0, merge=0/0, ticks=11063691/0, in_queue=373870, util=99.75%
  rnbd10: ios=4663184/0, merge=0/0, ticks=11099119/0, in_queue=375340, util=99.76%
  rnbd11: ios=4663187/0, merge=0/0, ticks=11112331/0, in_queue=382300, util=99.78%
  rnbd12: ios=4663187/0, merge=1/0, ticks=11078851/0, in_queue=376050, util=99.79%
  rnbd13: ios=4663188/0, merge=0/0, ticks=11087422/0, in_queue=376040, util=99.80%
  rnbd14: ios=4663190/0, merge=0/0, ticks=11070282/0, in_queue=378500, util=99.80%
  rnbd15: ios=4663190/0, merge=0/0, ticks=9417418/0, in_queue=271060, util=99.81%
  rnbd16: ios=4663191/0, merge=0/0, ticks=10588441/0, in_queue=348800, util=99.84%
  rnbd17: ios=4663193/0, merge=0/0, ticks=10740263/0, in_queue=364650, util=99.84%
  rnbd18: ios=4663195/0, merge=0/0, ticks=10813752/0, in_queue=371990, util=99.86%
  rnbd19: ios=4663193/0, merge=0/0, ticks=10878352/0, in_queue=375050, util=99.87%
  rnbd20: ios=4663193/0, merge=0/0, ticks=10845686/0, in_queue=371010, util=99.88%
  rnbd21: ios=4663195/0, merge=0/0, ticks=10854889/0, in_queue=373940, util=99.90%
  rnbd22: ios=4663197/0, merge=0/0, ticks=10936251/0, in_queue=378890, util=99.90%
  rnbd23: ios=4663195/0, merge=0/0, ticks=11000989/0, in_queue=380360, util=99.92%
  rnbd24: ios=4663200/0, merge=0/0, ticks=11056302/0, in_queue=389300, util=99.93%
  rnbd25: ios=4663199/0, merge=0/0, ticks=11099625/0, in_queue=396820, util=99.93%
  rnbd26: ios=4663197/0, merge=0/0, ticks=11091101/0, in_queue=391310, util=99.95%
  rnbd27: ios=4663201/0, merge=0/0, ticks=11108242/0, in_queue=396440, util=99.96%
  rnbd28: ios=4663203/0, merge=0/0, ticks=11222083/0, in_queue=405730, util=100.00%
  rnbd29: ios=4663205/0, merge=0/0, ticks=11251353/0, in_queue=412810, util=99.99%
  rnbd30: ios=4663201/0, merge=0/0, ticks=11238249/0, in_queue=413260, util=100.00%
  rnbd31: ios=4663218/0, merge=0/0, ticks=11267469/0, in_queue=414620, util=100.00%




root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# bash go_32dev.sh
fio start   : Di 13. Apr 10:58:33 UTC 2021
kernel info : Linux ps401a-914 5.4.86-pserver #5.4.86-3~deb10 SMP Fri Mar 5 12:29:36 UTC 2021 x86_64 GNU/Linux
fio version : fio-3.12
gcc: gcc (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Start fio test
fiotest: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
fio-3.12
Starting 64 processes
Jobs: 64 (f=2048): [r(64)][100.0%][r=3227MiB/s][r=826k IOPS][eta 00m:00s]
fiotest: (groupid=0, jobs=64): err= 0: pid=25682: Tue Apr 13 11:01:35 2021
  read: IOPS=826k, BW=3228MiB/s (3385MB/s)(568GiB/180012msec)
    slat (usec): min=170, max=131883, avg=1286.52, stdev=1678.50
    clat (nsec): min=1357, max=31141k, avg=8561491.28, stdev=2417740.18
     lat (usec): min=502, max=133262, avg=9848.07, stdev=2710.62
    clat percentiles (usec):
     |  1.00th=[ 3949],  5.00th=[ 5211], 10.00th=[ 5800], 20.00th=[ 6587],
     | 30.00th=[ 7177], 40.00th=[ 7701], 50.00th=[ 8225], 60.00th=[ 8848],
     | 70.00th=[ 9503], 80.00th=[10421], 90.00th=[11863], 95.00th=[13042],
     | 99.00th=[15401], 99.50th=[16319], 99.90th=[18482], 99.95th=[19268],
     | 99.99th=[21890]
   bw (  KiB/s): min=32768, max=258048, per=1.56%, avg=51622.65, stdev=3500.81, samples=22982
   iops        : min= 8192, max=64512, avg=12905.63, stdev=875.20, samples=22982
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.02%, 250=0.03%, 500=0.02%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.07%, 4=0.89%, 10=74.32%, 20=24.59%, 50=0.03%
  cpu          : usr=0.96%, sys=5.02%, ctx=81774475, majf=0, minf=3197872
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
     submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0%
     complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0%
     issued rwts: total=148772864,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: bw=3228MiB/s (3385MB/s), 3228MiB/s-3228MiB/s (3385MB/s-3385MB/s), io=568GiB (609GB), run=180012-180012msec

Disk stats (read/write):
  rnbd0: ios=4647869/0, merge=0/0, ticks=10157474/0, in_queue=328770, util=99.64%
  rnbd1: ios=4647859/0, merge=0/0, ticks=10417978/0, in_queue=340850, util=99.67%
  rnbd2: ios=4647863/0, merge=0/0, ticks=10623846/0, in_queue=357690, util=99.68%
  rnbd3: ios=4647875/0, merge=0/0, ticks=10762992/0, in_queue=364280, util=99.70%
  rnbd4: ios=4647869/0, merge=0/0, ticks=10870408/0, in_queue=371350, util=99.70%
  rnbd5: ios=4647866/0, merge=0/0, ticks=10955394/0, in_queue=377050, util=99.72%
  rnbd6: ios=4647875/0, merge=0/0, ticks=11010235/0, in_queue=383640, util=99.73%
  rnbd7: ios=4647873/0, merge=0/0, ticks=11032744/0, in_queue=385440, util=99.73%
  rnbd8: ios=4647871/0, merge=0/0, ticks=11043787/0, in_queue=384150, util=99.76%
  rnbd9: ios=4647877/0, merge=0/0, ticks=9415047/0, in_queue=281570, util=99.77%
  rnbd10: ios=4647881/0, merge=0/0, ticks=10427629/0, in_queue=348200, util=99.78%
  rnbd11: ios=4647876/0, merge=0/0, ticks=10615844/0, in_queue=362920, util=99.81%
  rnbd12: ios=4647872/0, merge=1/0, ticks=10638181/0, in_queue=366520, util=99.81%
  rnbd13: ios=4647882/0, merge=0/0, ticks=10678828/0, in_queue=368450, util=99.82%
  rnbd14: ios=4647874/0, merge=0/0, ticks=10678088/0, in_queue=370470, util=99.83%
  rnbd15: ios=4647887/0, merge=0/0, ticks=10702071/0, in_queue=369640, util=99.84%
  rnbd16: ios=4647875/0, merge=0/0, ticks=10710059/0, in_queue=374900, util=99.86%
  rnbd17: ios=4647883/0, merge=0/0, ticks=10761335/0, in_queue=378860, util=99.87%
  rnbd18: ios=4647884/0, merge=0/0, ticks=10784441/0, in_queue=379010, util=99.89%
  rnbd19: ios=4647882/0, merge=0/0, ticks=10828395/0, in_queue=380680, util=99.92%
  rnbd20: ios=4647897/0, merge=0/0, ticks=10856351/0, in_queue=384050, util=99.91%
  rnbd21: ios=4647898/0, merge=0/0, ticks=10889226/0, in_queue=385120, util=99.94%
  rnbd22: ios=4647890/0, merge=0/0, ticks=10915544/0, in_queue=389180, util=99.93%
  rnbd23: ios=4647888/0, merge=0/0, ticks=10922797/0, in_queue=392600, util=99.94%
  rnbd24: ios=4647891/0, merge=0/0, ticks=10964743/0, in_queue=395200, util=99.95%
  rnbd25: ios=4647894/0, merge=0/0, ticks=11017459/0, in_queue=408070, util=99.96%
  rnbd26: ios=4647896/0, merge=0/0, ticks=11084377/0, in_queue=405700, util=99.98%
  rnbd27: ios=4647893/0, merge=0/0, ticks=11108133/0, in_queue=407300, util=99.99%
  rnbd28: ios=4647905/0, merge=0/0, ticks=11153595/0, in_queue=416610, util=100.00%
  rnbd29: ios=4647899/0, merge=0/0, ticks=11213811/0, in_queue=420820, util=100.00%
  rnbd30: ios=4647903/0, merge=0/0, ticks=11228964/0, in_queue=421960, util=100.00%
  rnbd31: ios=4647904/0, merge=0/0, ticks=11277612/0, in_queue=424840, util=100.00%



root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# bash go_32dev.sh
fio start   : Di 13. Apr 11:04:17 UTC 2021
kernel info : Linux ps401a-914 5.4.86-pserver #5.4.86-3~deb10 SMP Fri Mar 5 12:29:36 UTC 2021 x86_64 GNU/Linux
fio version : fio-3.12
gcc: gcc (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Start fio test
fiotest: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
fio-3.12
Starting 64 processes
Jobs: 64 (f=2048): [r(64)][100.0%][r=3239MiB/s][r=829k IOPS][eta 00m:00s]
fiotest: (groupid=0, jobs=64): err= 0: pid=26523: Tue Apr 13 11:07:19 2021
  read: IOPS=829k, BW=3238MiB/s (3395MB/s)(569GiB/180019msec)
    slat (usec): min=168, max=172647, avg=1290.89, stdev=1702.42
    clat (nsec): min=1359, max=34740k, avg=8525517.65, stdev=2410440.28
     lat (usec): min=454, max=172848, avg=9816.46, stdev=2718.20
    clat percentiles (usec):
     |  1.00th=[ 3916],  5.00th=[ 5211], 10.00th=[ 5800], 20.00th=[ 6521],
     | 30.00th=[ 7111], 40.00th=[ 7635], 50.00th=[ 8225], 60.00th=[ 8848],
     | 70.00th=[ 9503], 80.00th=[10421], 90.00th=[11731], 95.00th=[12911],
     | 99.00th=[15270], 99.50th=[16188], 99.90th=[18482], 99.95th=[19530],
     | 99.99th=[22152]
   bw (  KiB/s): min=31744, max=282624, per=1.56%, avg=51770.75, stdev=3922.53, samples=22985
   iops        : min= 7936, max=70656, avg=12942.66, stdev=980.63, samples=22985
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.03%
  lat (usec)   : 100=0.02%, 250=0.04%, 500=0.02%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.06%, 4=0.91%, 10=74.79%, 20=24.07%, 50=0.04%
  cpu          : usr=1.00%, sys=4.98%, ctx=82332948, majf=0, minf=3752201
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
     submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0%
     complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0%
     issued rwts: total=149203144,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: bw=3238MiB/s (3395MB/s), 3238MiB/s-3238MiB/s (3395MB/s-3395MB/s), io=569GiB (611GB), run=180019-180019msec

Disk stats (read/write):
  rnbd0: ios=4661066/0, merge=0/0, ticks=10264711/0, in_queue=322110, util=99.62%
  rnbd1: ios=4661063/0, merge=0/0, ticks=10550895/0, in_queue=343350, util=99.64%
  rnbd2: ios=4661075/0, merge=0/0, ticks=10724899/0, in_queue=347820, util=99.65%
  rnbd3: ios=4661082/0, merge=0/0, ticks=10733532/0, in_queue=349620, util=99.67%
  rnbd4: ios=4661077/0, merge=0/0, ticks=10863981/0, in_queue=357250, util=99.67%
  rnbd5: ios=4661072/0, merge=0/0, ticks=10989829/0, in_queue=368510, util=99.68%
  rnbd6: ios=4661085/0, merge=0/0, ticks=11013824/0, in_queue=365960, util=99.68%
  rnbd7: ios=4661082/0, merge=0/0, ticks=11069249/0, in_queue=372950, util=99.70%
  rnbd8: ios=4661074/0, merge=0/0, ticks=9409880/0, in_queue=262130, util=99.72%
  rnbd9: ios=4661075/0, merge=0/0, ticks=10481517/0, in_queue=338960, util=99.73%
  rnbd10: ios=4661071/0, merge=0/0, ticks=10676791/0, in_queue=352930, util=99.74%
  rnbd11: ios=4661080/0, merge=0/0, ticks=10703680/0, in_queue=352040, util=99.76%
  rnbd12: ios=4661073/0, merge=0/0, ticks=10694124/0, in_queue=354740, util=99.76%
  rnbd13: ios=4661074/0, merge=0/0, ticks=10705697/0, in_queue=353340, util=99.78%
  rnbd14: ios=4661082/0, merge=0/0, ticks=10748646/0, in_queue=361020, util=99.78%
  rnbd15: ios=4661089/0, merge=0/0, ticks=10754424/0, in_queue=362580, util=99.79%
  rnbd16: ios=4661085/0, merge=0/0, ticks=10781956/0, in_queue=362570, util=99.82%
  rnbd17: ios=4661083/0, merge=0/0, ticks=10842119/0, in_queue=369510, util=99.83%
  rnbd18: ios=4661084/0, merge=0/0, ticks=10835750/0, in_queue=370490, util=99.85%
  rnbd19: ios=4661093/0, merge=0/0, ticks=10903655/0, in_queue=373100, util=99.86%
  rnbd20: ios=4661092/0, merge=0/0, ticks=10917943/0, in_queue=376360, util=99.87%
  rnbd21: ios=4661094/0, merge=0/0, ticks=10951428/0, in_queue=380590, util=99.89%
  rnbd22: ios=4661088/0, merge=0/0, ticks=10969831/0, in_queue=379130, util=99.89%
  rnbd23: ios=4661087/0, merge=0/0, ticks=11036709/0, in_queue=387860, util=99.91%
  rnbd24: ios=4661095/0, merge=0/0, ticks=11045243/0, in_queue=389770, util=99.91%
  rnbd25: ios=4661094/0, merge=0/0, ticks=11096795/0, in_queue=391280, util=99.93%
  rnbd26: ios=4661089/0, merge=0/0, ticks=11168428/0, in_queue=401420, util=99.95%
  rnbd27: ios=4661088/0, merge=0/0, ticks=11199394/0, in_queue=406780, util=99.96%
  rnbd28: ios=4661100/0, merge=0/0, ticks=11211816/0, in_queue=401820, util=99.98%
  rnbd29: ios=4661107/0, merge=0/0, ticks=11260826/0, in_queue=410480, util=99.99%
  rnbd30: ios=4661111/0, merge=0/0, ticks=11301041/0, in_queue=418590, util=99.99%
  rnbd31: ios=4661105/0, merge=0/0, ticks=11264838/0, in_queue=414460, util=100.00%

root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# bash go_32dev_128proc.sh
fio start   : Di 13. Apr 11:36:53 UTC 2021
kernel info : Linux ps401a-914 5.4.86-pserver #5.4.86-3~deb10 SMP Fri Mar 5 12:29:36 UTC 2021 x86_64 GNU/Linux
fio version : fio-3.12
gcc: gcc (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Start fio test
fiotest: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
fio-3.12
Starting 128 processes
Jobs: 62 (f=0): [f(1),_(2),f(1),_(1),f(1),_(3),f(1),_(2),f(3),_(2),f(1),_(3),f(1),_(3),f(2),_(1),f(1),_(8),f(4),_(1),f(1),_(2),f(1),_(1),f(1),_(1),f(1),_(2),f(2),_(1),f(1),_(2),f(3),_(1),f(1),_(1),f(1),_(1),f(2),_(1),f(2),_(1),f(4),_(1),f(1),_(5),f(1),_(1),f(1),_(2),f(2),_(6),f(6),_(1),f(3),_(1),f(1),_(4),f(5),_(1),f(3),_(3),f(1),_(1),f(2)][100.0%][r=2504MiB/s][r=641k IOPS][eta 00m:00s]
fiotest: (groupid=0, jobs=128): err= 0: pid=32673: Tue Apr 13 11:39:56 2021
  read: IOPS=816k, BW=3187MiB/s (3341MB/s)(560GiB/180030msec)
    slat (usec): min=194, max=481083, avg=7768.18, stdev=5693.43
    clat (nsec): min=1163, max=40685k, avg=12208741.78, stdev=3544380.51
     lat (usec): min=509, max=483253, avg=19976.98, stdev=5877.39
    clat percentiles (usec):
     |  1.00th=[ 4555],  5.00th=[ 6849], 10.00th=[ 7963], 20.00th=[ 9372],
     | 30.00th=[10290], 40.00th=[11207], 50.00th=[11994], 60.00th=[12911],
     | 70.00th=[13829], 80.00th=[15008], 90.00th=[16712], 95.00th=[18220],
     | 99.00th=[21627], 99.50th=[23200], 99.90th=[26346], 99.95th=[27657],
     | 99.99th=[30540]
   bw (  KiB/s): min= 2048, max=173732, per=0.78%, avg=25459.52, stdev=2917.03, samples=45970
   iops        : min=  512, max=43433, avg=6364.85, stdev=729.26, samples=45970
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.03%
  lat (usec)   : 100=0.04%, 250=0.05%, 500=0.03%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.06%, 4=0.44%, 10=25.99%, 20=71.05%, 50=2.28%
  cpu          : usr=0.58%, sys=2.18%, ctx=75776088, majf=0, minf=6628006
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
     submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0%
     complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0%
     issued rwts: total=146863336,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: bw=3187MiB/s (3341MB/s), 3187MiB/s-3187MiB/s (3341MB/s-3341MB/s), io=560GiB (602GB), run=180030-180030msec

Disk stats (read/write):
  rnbd0: ios=4589419/0, merge=0/0, ticks=14454693/0, in_queue=2184000, util=99.67%
  rnbd1: ios=4589419/0, merge=0/0, ticks=14939602/0, in_queue=2323370, util=99.68%
  rnbd2: ios=4589422/0, merge=0/0, ticks=15071745/0, in_queue=2363720, util=99.69%
  rnbd3: ios=4589422/0, merge=0/0, ticks=15196019/0, in_queue=2395360, util=99.70%
  rnbd4: ios=4589422/0, merge=0/0, ticks=15229235/0, in_queue=2401140, util=99.71%
  rnbd5: ios=4589423/0, merge=0/0, ticks=15265557/0, in_queue=2395570, util=99.72%
  rnbd6: ios=4589425/0, merge=0/0, ticks=15305021/0, in_queue=2405310, util=99.72%
  rnbd7: ios=4589424/0, merge=0/0, ticks=15361167/0, in_queue=2417510, util=99.73%
  rnbd8: ios=4589425/0, merge=0/0, ticks=15374308/0, in_queue=2419060, util=99.76%
  rnbd9: ios=4589427/0, merge=0/0, ticks=11516180/0, in_queue=1438280, util=99.77%
  rnbd10: ios=4589427/0, merge=0/0, ticks=14170395/0, in_queue=2147290, util=99.77%
  rnbd11: ios=4589427/0, merge=0/0, ticks=14720156/0, in_queue=2296450, util=99.79%
  rnbd12: ios=4589426/0, merge=0/0, ticks=14798828/0, in_queue=2319630, util=99.79%
  rnbd13: ios=4589426/0, merge=0/0, ticks=14779692/0, in_queue=2314110, util=99.81%
  rnbd14: ios=4589426/0, merge=0/0, ticks=14807971/0, in_queue=2320940, util=99.81%
  rnbd15: ios=4589427/0, merge=0/0, ticks=14815817/0, in_queue=2320800, util=99.82%
  rnbd16: ios=4589428/0, merge=0/0, ticks=14874511/0, in_queue=2341850, util=99.85%
  rnbd17: ios=4589428/0, merge=0/0, ticks=14884319/0, in_queue=2340410, util=99.86%
  rnbd18: ios=4589429/0, merge=0/0, ticks=14889740/0, in_queue=2350840, util=99.88%
  rnbd19: ios=4589428/0, merge=0/0, ticks=14851478/0, in_queue=2333840, util=99.89%
  rnbd20: ios=4589430/0, merge=0/0, ticks=14917096/0, in_queue=2350030, util=99.89%
  rnbd21: ios=4589430/0, merge=0/0, ticks=14896627/0, in_queue=2342230, util=99.91%
  rnbd22: ios=4589431/0, merge=0/0, ticks=14865768/0, in_queue=2323510, util=99.91%
  rnbd23: ios=4589431/0, merge=0/0, ticks=14943766/0, in_queue=2351210, util=99.92%
  rnbd24: ios=4589431/0, merge=0/0, ticks=14952482/0, in_queue=2368310, util=99.92%
  rnbd25: ios=4589431/0, merge=0/0, ticks=14966379/0, in_queue=2367030, util=99.93%
  rnbd26: ios=4589433/0, merge=0/0, ticks=14975019/0, in_queue=2368030, util=99.95%
  rnbd27: ios=4589434/0, merge=0/0, ticks=14990885/0, in_queue=2369840, util=99.96%
  rnbd28: ios=4589433/0, merge=0/0, ticks=14936498/0, in_queue=2366290, util=99.98%
  rnbd29: ios=4589433/0, merge=0/0, ticks=14986887/0, in_queue=2380530, util=99.99%
  rnbd30: ios=4589434/0, merge=1/0, ticks=15018143/0, in_queue=2395330, util=99.99%
  rnbd31: ios=4589435/0, merge=0/0, ticks=14995177/0, in_queue=2381090, util=100.00%
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# git show HEAD | head
commit 2636311e5e2894bd7c7800939a3b9b68e7a93bcc
Author: Gioh Kim <gi-oh.kim@ionos.com>
Date:   Tue Apr 13 14:00:27 2021 +0200

    swap likely and unlikely

diff --git a/rtrs/rtrs-clt.c b/rtrs/rtrs-clt.c
index 1b4b3e6..6235827 100644
--- a/rtrs/rtrs-clt.c
+++ b/rtrs/rtrs-clt.c


141 root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# make clean && make
make[1]: Entering directory '/usr/src/linux-5.4.86-pserver'
  CLEAN   /tmp/ddd/gkim/ibnbd2/Module.symvers
make[1]: Leaving directory '/usr/src/linux-5.4.86-pserver'
make[1]: Entering directory '/usr/src/linux-5.4.86-pserver'
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-clt.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-clt-sysfs.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-common.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-client.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-srv.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-srv-dev.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-srv-sysfs.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-server.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-core.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-clt.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-clt-stats.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-clt-sysfs.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-client.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-srv.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-srv-stats.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-srv-sysfs.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-server.o
  AR      /tmp/ddd/gkim/ibnbd2/built-in.a 
  Building modules, stage 2.
  MODPOST 5 modules
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-client.mod.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-client.ko
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-server.mod.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-server.ko
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-client.mod.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-client.ko
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-core.mod.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-core.ko
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-server.mod.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-server.ko
make[1]: Leaving directory '/usr/src/linux-5.4.86-pserver'
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# rmmod rnbd-client
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# rmmod rtrs-client
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# rmmod rtrs-core
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# insmod rtrs/rtrs-core.ko
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# insmod rtrs/rtrs-client.ko
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# insmod rnbd/rnbd-client.ko


fio start   : Di 13. Apr 12:10:30 UTC 2021
kernel info : Linux ps401a-914 5.4.86-pserver #5.4.86-3~deb10 SMP Fri Mar 5 12:29:36 UTC 2021 x86_64 GNU/Linux
fio version : fio-3.12
gcc: gcc (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Start fio test
fiotest: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
fio-3.12
Starting 64 processes
Jobs: 64 (f=1814): [r(4),f(2),r(5),f(1),r(11),f(1),r(2),f(1),r(4),f(2),r(5),f(1),r(4),f(1),r(12),f(1),r(5),f(1),r(1)][100.0%][r=3244MiB/s][r=830k IOPS][eta 00m:00s]
fiotest: (groupid=0, jobs=64): err= 0: pid=37528: Tue Apr 13 12:13:32 2021
  read: IOPS=829k, BW=3238MiB/s (3395MB/s)(569GiB/180025msec)
    slat (usec): min=165, max=195365, avg=1271.82, stdev=1671.22
    clat (nsec): min=1080, max=32693k, avg=8544062.13, stdev=2411682.41
     lat (usec): min=407, max=206880, avg=9815.94, stdev=2694.31
    clat percentiles (usec):
     |  1.00th=[ 3949],  5.00th=[ 5211], 10.00th=[ 5800], 20.00th=[ 6587],
     | 30.00th=[ 7177], 40.00th=[ 7701], 50.00th=[ 8225], 60.00th=[ 8848],
     | 70.00th=[ 9503], 80.00th=[10421], 90.00th=[11731], 95.00th=[12911],
     | 99.00th=[15270], 99.50th=[16319], 99.90th=[18482], 99.95th=[19530],
     | 99.99th=[22152]
   bw (  KiB/s): min=29696, max=254980, per=1.56%, avg=51775.19, stdev=3418.36, samples=22980
   iops        : min= 7424, max=63745, avg=12943.76, stdev=854.59, samples=22980
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.04%
  lat (usec)   : 100=0.04%, 250=0.02%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.05%, 4=0.90%, 10=74.58%, 20=24.31%, 50=0.04%
  cpu          : usr=1.00%, sys=4.97%, ctx=82399686, majf=0, minf=3717209
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
     submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0%
     complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0%
     issued rwts: total=149229295,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: bw=3238MiB/s (3395MB/s), 3238MiB/s-3238MiB/s (3395MB/s-3395MB/s), io=569GiB (611GB), run=180025-180025msec

Disk stats (read/write):
  rnbd0: ios=4661956/0, merge=0/0, ticks=10212115/0, in_queue=323800, util=99.67%
  rnbd1: ios=4661956/0, merge=0/0, ticks=10489817/0, in_queue=340940, util=99.70%
  rnbd2: ios=4661959/0, merge=0/0, ticks=10510042/0, in_queue=342180, util=99.70%
  rnbd3: ios=4661958/0, merge=0/0, ticks=10702745/0, in_queue=353350, util=99.73%
  rnbd4: ios=4661964/0, merge=0/0, ticks=10793914/0, in_queue=359190, util=99.74%
  rnbd5: ios=4661968/0, merge=0/0, ticks=10913714/0, in_queue=369150, util=99.74%
  rnbd6: ios=4661960/0, merge=0/0, ticks=10958094/0, in_queue=370730, util=99.74%
  rnbd7: ios=4661968/0, merge=0/0, ticks=10976320/0, in_queue=370090, util=99.76%
  rnbd8: ios=4661964/0, merge=0/0, ticks=11014804/0, in_queue=375780, util=99.79%
  rnbd9: ios=4661964/0, merge=0/0, ticks=11031969/0, in_queue=376760, util=99.80%
  rnbd10: ios=4661966/0, merge=0/0, ticks=11047729/0, in_queue=375450, util=99.81%
  rnbd11: ios=4661975/0, merge=0/0, ticks=11053595/0, in_queue=378140, util=99.83%
  rnbd12: ios=4661972/0, merge=1/0, ticks=11087759/0, in_queue=376570, util=99.83%
  rnbd13: ios=4661975/0, merge=0/0, ticks=11066221/0, in_queue=381940, util=99.85%
  rnbd14: ios=4661967/0, merge=0/0, ticks=11092973/0, in_queue=381730, util=99.85%
  rnbd15: ios=4661981/0, merge=0/0, ticks=11056803/0, in_queue=382830, util=99.86%
  rnbd16: ios=4661985/0, merge=0/0, ticks=9447901/0, in_queue=280700, util=99.89%
  rnbd17: ios=4661978/0, merge=0/0, ticks=10506961/0, in_queue=348500, util=99.90%
  rnbd18: ios=4661983/0, merge=0/0, ticks=10702411/0, in_queue=364060, util=99.92%
  rnbd19: ios=4661977/0, merge=0/0, ticks=10777160/0, in_queue=374250, util=99.92%
  rnbd20: ios=4661982/0, merge=0/0, ticks=10780637/0, in_queue=371820, util=99.93%
  rnbd21: ios=4661980/0, merge=0/0, ticks=10841533/0, in_queue=376750, util=99.95%
  rnbd22: ios=4661984/0, merge=0/0, ticks=10869817/0, in_queue=378430, util=99.95%
  rnbd23: ios=4661985/0, merge=0/0, ticks=10966341/0, in_queue=387410, util=99.96%
  rnbd24: ios=4661987/0, merge=0/0, ticks=10957613/0, in_queue=390960, util=99.96%
  rnbd25: ios=4661988/0, merge=0/0, ticks=11015585/0, in_queue=390920, util=99.97%
  rnbd26: ios=4661980/0, merge=0/0, ticks=11074411/0, in_queue=398090, util=100.00%
  rnbd27: ios=4661985/0, merge=0/0, ticks=11122911/0, in_queue=404760, util=100.00%
  rnbd28: ios=4661993/0, merge=0/0, ticks=11095077/0, in_queue=402480, util=100.00%
  rnbd29: ios=4661991/0, merge=0/0, ticks=11170485/0, in_queue=408370, util=100.00%
  rnbd30: ios=4661992/0, merge=0/0, ticks=11213819/0, in_queue=409730, util=100.00%
  rnbd31: ios=4661989/0, merge=0/0, ticks=11263063/0, in_queue=420640, util=100.00%


root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# bash go_32dev_128proc.sh
fio start   : Di 13. Apr 12:42:42 UTC 2021
kernel info : Linux ps401a-914 5.4.86-pserver #5.4.86-3~deb10 SMP Fri Mar 5 12:29:36 UTC 2021 x86_64 GNU/Linux
fio version : fio-3.12
gcc: gcc (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Start fio test
fiotest: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
fio-3.12
Starting 128 processes
Jobs: 123 (f=2946): [r(3),f(1),r(8),f(1),r(4),f(2),r(1),f(2),r(1),f(1),r(1),f(1),r(5),f(1),r(1),f(1),r(3),_(2),r(2),f(2),r(4),f(2),r(5),f(2),r(6),f(3),r(1),f(2),r(1),_(1),r(7),f(1),r(2),f(1),r(5),f(1),r(5),f(5),r(1),f(1),r(1),_(2),f(1),r(2),f(1),r(22)][16.8%][r=3190MiB/s][r=817k IOPS][eta 15m:00s]
fiotest: (groupid=0, jobs=128): err= 0: pid=39254: Tue Apr 13 12:45:45 2021
  read: IOPS=817k, BW=3191MiB/s (3346MB/s)(561GiB/180029msec)
    slat (usec): min=69, max=412616, avg=7725.43, stdev=5595.18
    clat (nsec): min=1054, max=39850k, avg=12233217.00, stdev=3542424.84
     lat (usec): min=179, max=421190, avg=19958.71, stdev=5791.92
    clat percentiles (usec):
     |  1.00th=[ 4555],  5.00th=[ 6849], 10.00th=[ 7963], 20.00th=[ 9372],
     | 30.00th=[10290], 40.00th=[11207], 50.00th=[11994], 60.00th=[12911],
     | 70.00th=[13829], 80.00th=[15008], 90.00th=[16712], 95.00th=[18220],
     | 99.00th=[21627], 99.50th=[22938], 99.90th=[26084], 99.95th=[27395],
     | 99.99th=[30540]
   bw (  KiB/s): min= 3072, max=146432, per=0.78%, avg=25495.37, stdev=2758.09, samples=45992
   iops        : min=  768, max=36608, avg=6373.81, stdev=689.53, samples=45992
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.02%
  lat (usec)   : 100=0.02%, 250=0.03%, 500=0.04%, 750=0.03%, 1000=0.03%
  lat (msec)   : 2=0.09%, 4=0.43%, 10=25.69%, 20=71.33%, 50=2.29%
  cpu          : usr=0.57%, sys=2.20%, ctx=75748318, majf=0, minf=6305149
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
     submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.1%, >=64=100.0%
     complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.1%, >=64=100.0%
     issued rwts: total=147045041,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: bw=3191MiB/s (3346MB/s), 3191MiB/s-3191MiB/s (3346MB/s-3346MB/s), io=561GiB (602GB), run=180029-180029msec


Disk stats (read/write):
  rnbd0: ios=4589511/0, merge=0/0, ticks=13423743/0, in_queue=1931790, util=99.58%
  rnbd1: ios=4589508/0, merge=0/0, ticks=14476127/0, in_queue=2228970, util=99.60%
  rnbd2: ios=4589517/0, merge=0/0, ticks=14624196/0, in_queue=2271880, util=99.61%
  rnbd3: ios=4589504/0, merge=0/0, ticks=14686013/0, in_queue=2286760, util=99.63%
  rnbd4: ios=4595158/0, merge=0/0, ticks=14772465/0, in_queue=2316590, util=99.62%
  rnbd5: ios=4595158/0, merge=0/0, ticks=14805361/0, in_queue=2314670, util=99.65%
  rnbd6: ios=4595158/0, merge=0/0, ticks=14817116/0, in_queue=2324300, util=99.65%
  rnbd7: ios=4595158/0, merge=0/0, ticks=14833164/0, in_queue=2320360, util=99.66%
  rnbd8: ios=4595158/0, merge=0/0, ticks=14900960/0, in_queue=2340200, util=99.68%
  rnbd9: ios=4595158/0, merge=0/0, ticks=14917077/0, in_queue=2345260, util=99.70%
  rnbd10: ios=4595158/0, merge=0/0, ticks=14931826/0, in_queue=2344540, util=99.71%
  rnbd11: ios=4595158/0, merge=0/0, ticks=14963132/0, in_queue=2345350, util=99.72%
  rnbd12: ios=4595158/0, merge=0/0, ticks=14978944/0, in_queue=2371930, util=99.73%
  rnbd13: ios=4595158/0, merge=0/0, ticks=14953823/0, in_queue=2349200, util=99.75%
  rnbd14: ios=4595157/0, merge=0/0, ticks=14991909/0, in_queue=2361030, util=99.75%
  rnbd15: ios=4595157/0, merge=0/0, ticks=15039741/0, in_queue=2379400, util=99.76%
  rnbd16: ios=4595157/0, merge=0/0, ticks=15057599/0, in_queue=2387550, util=99.79%
  rnbd17: ios=4595157/0, merge=0/0, ticks=15052981/0, in_queue=2378570, util=99.80%
  rnbd18: ios=4595157/0, merge=0/0, ticks=15364367/0, in_queue=2455030, util=99.83%
  rnbd19: ios=4595157/0, merge=0/0, ticks=15369998/0, in_queue=2462130, util=99.84%
  rnbd20: ios=4595157/0, merge=0/0, ticks=14953262/0, in_queue=2354080, util=99.84%
  rnbd21: ios=4595157/0, merge=0/0, ticks=15116061/0, in_queue=2404290, util=99.86%
  rnbd22: ios=4595157/0, merge=0/0, ticks=15190489/0, in_queue=2419870, util=99.86%
  rnbd23: ios=4595157/0, merge=0/0, ticks=15212165/0, in_queue=2414980, util=99.88%
  rnbd24: ios=4595157/0, merge=0/0, ticks=15225716/0, in_queue=2429100, util=99.88%
  rnbd25: ios=4595157/0, merge=0/0, ticks=15239578/0, in_queue=2428240, util=99.89%
  rnbd26: ios=4595157/0, merge=0/0, ticks=15251628/0, in_queue=2427670, util=99.92%
  rnbd27: ios=4595157/0, merge=0/0, ticks=13955168/0, in_queue=2127300, util=99.93%
  rnbd28: ios=4595157/0, merge=0/0, ticks=14694941/0, in_queue=2329440, util=99.95%
  rnbd29: ios=4595157/0, merge=0/0, ticks=14804318/0, in_queue=2355090, util=99.96%
  rnbd30: ios=4595157/0, merge=0/0, ticks=15183672/0, in_queue=2421440, util=99.96%
  rnbd31: ios=4595157/0, merge=0/0, ticks=11575822/0, in_queue=1492750, util=99.98%
  
  
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# bash go_32dev_128proc.sh
fio start   : Di 13. Apr 12:51:40 UTC 2021
kernel info : Linux ps401a-914 5.4.86-pserver #5.4.86-3~deb10 SMP Fri Mar 5 12:29:36 UTC 2021 x86_64 GNU/Linux
fio version : fio-3.12
gcc: gcc (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Start fio test
fiotest: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
fio-3.12
Starting 128 processes
Jobs: 70 (f=470): [f(4),_(1),f(2),_(1),f(1),_(1),f(1),_(1),f(1),_(5),f(1),_(2),f(1),_(1),f(2),_(3),f(2),_(1),f(3),r(1),_(4),f(1),_(1),f(2),r(1),_(1),f(2),_(1),f(1),_(1),f(1),_(5),f(3),_(2),f(4),_(5),f(1),_(1),f(1),_(2),f(3),_(2),f(1),_(1),f(1),_(1),f(2),_(1),r(1),_(1),f(5),_(1),f(1),_(3),f(5),_(1),f(1),_(2),f(1),_(1),f(1),_(2),f(9),_(1),f(2),_(1),f(1),_(1)][1.7%][r=3210MiB/s][r=822k IOPS][eta 02h:54m:00s]
fiotest: (groupid=0, jobs=128): err= 0: pid=40166: Tue Apr 13 12:54:43 2021
  read: IOPS=817k, BW=3193MiB/s (3348MB/s)(561GiB/180023msec)
    slat (usec): min=7, max=292298, avg=7787.34, stdev=5758.03
    clat (nsec): min=1586, max=50911k, avg=12167070.96, stdev=3539907.52
     lat (usec): min=206, max=300176, avg=19954.48, stdev=5924.10
    clat percentiles (usec):
     |  1.00th=[ 4490],  5.00th=[ 6783], 10.00th=[ 7898], 20.00th=[ 9241],
     | 30.00th=[10290], 40.00th=[11076], 50.00th=[11994], 60.00th=[12780],
     | 70.00th=[13829], 80.00th=[15008], 90.00th=[16712], 95.00th=[18220],
     | 99.00th=[21627], 99.50th=[22938], 99.90th=[26084], 99.95th=[27395],
     | 99.99th=[30540]
   bw (  KiB/s): min=10240, max=156672, per=0.78%, avg=25505.77, stdev=3163.58, samples=45970
   iops        : min= 2560, max=39168, avg=6376.42, stdev=790.91, samples=45970
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.02%
  lat (usec)   : 100=0.02%, 250=0.02%, 500=0.02%, 750=0.02%, 1000=0.02%
  lat (msec)   : 2=0.11%, 4=0.49%, 10=26.37%, 20=70.68%, 50=2.22%
  lat (msec)   : 100=0.01%
  cpu          : usr=0.55%, sys=2.16%, ctx=75466111, majf=0, minf=5747705
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
     submit    : 0=0.0%, 4=0.1%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0%
     complete  : 0=0.0%, 4=0.1%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0%
     issued rwts: total=147144833,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: bw=3193MiB/s (3348MB/s), 3193MiB/s-3193MiB/s (3348MB/s-3348MB/s), io=561GiB (603GB), run=180023-180023msec

Disk stats (read/write):
  rnbd0: ios=4598269/0, merge=0/0, ticks=14961482/0, in_queue=2294850, util=99.60%
Leon Romanovsky April 13, 2021, 7:31 p.m. UTC | #11
On Tue, Apr 13, 2021 at 03:11:33PM +0200, Gioh Kim wrote:
> On Tue, Apr 13, 2021 at 8:43 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Tue, Apr 13, 2021 at 05:31:24AM +0000, Haakon Bugge wrote:
> > >
> > >
> > > > On 12 Apr 2021, at 19:34, Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > On Mon, Apr 12, 2021 at 04:00:55PM +0200, Gioh Kim wrote:
> > > >> On Mon, Apr 12, 2021 at 2:54 PM Jinpu Wang <jinpu.wang@ionos.com> wrote:
> > > >>>
> > > >>> On Mon, Apr 12, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >>>>
> > > >>>> On Mon, Apr 12, 2021 at 02:22:51PM +0200, Jinpu Wang wrote:
> > > >>>>> On Tue, Apr 6, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >>>>>>
> > > >>>>>> On Tue, Apr 06, 2021 at 02:36:37PM +0200, Gioh Kim wrote:
> > > >>>>>>> From: Gioh Kim <gi-oh.kim@cloud.ionos.com>
> > > >>>>>>>
> > > >>>>>>> Client prints only error value and it is not enough for debugging.
> > > >>>>>>>
> > > >>>>>>> 1. When client receives an error from server:
> > > >>>>>>> the client does not only print the error value but also
> > > >>>>>>> more information of server connection.
> > > >>>>>>>
> > > >>>>>>> 2. When client fails to send IO:
> > > >>>>>>> the client gets an error from RDMA layer. It also
> > > >>>>>>> prints more information of server connection.
> > > >>>>>>>
> > > >>>>>>> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
> > > >>>>>>> Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
> > > >>>>>>> ---
> > > >>>>>>> drivers/infiniband/ulp/rtrs/rtrs-clt.c | 33 ++++++++++++++++++++++----
> > > >>>>>>> 1 file changed, 29 insertions(+), 4 deletions(-)
> > > >>>>>>>
> > > >>>>>>> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > > >>>>>>> index 5062328ac577..a534b2b09e13 100644
> > > >>>>>>> --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > > >>>>>>> +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > > >>>>>>> @@ -437,6 +437,11 @@ static void complete_rdma_req(struct rtrs_clt_io_req *req, int errno,
> > > >>>>>>>      req->in_use = false;
> > > >>>>>>>      req->con = NULL;
> > > >>>>>>>
> > > >>>>>>> +     if (unlikely(errno)) {
> > > >>>>>>
> > > >>>>>> I'm sorry, but all your patches are full of these likely/unlikely cargo
> > > >>>>>> cult. Can you please provide supportive performance data or delete all
> > > >>>>>> likely/unlikely in all rtrs code?
> > > >>>>>
> > > >>>>> Hi Leon,
> > > >>>>>
> > > >>>>> All the likely/unlikely from the non-fast path was removed as you
> > > >>>>> suggested in the past.
> > > >>>>> This one is on IO path, my understanding is for the fast path, with
> > > >>>>> likely/unlikely macro,
> > > >>>>> the compiler will optimize the code for better branch prediction.
> > > >>>>
> > > >>>> In theory, yes. In practice, gcc 10 generated the same assembly code when I
> > > >>>> placed likely() and later replaced it with unlikely().
> > > >>
> > > >> Even though gcc 10 generated the same assembly code,
> > > >> there is no guarantee for gcc 11 or gcc 12.
> > > >>
> > > >> I am reviewing rtrs source file and have found some unnecessary likely/unlikely.
> > > >> But I think likely/unlikely are necessary for extreme cases.
> > > >> I will have a discussion with my colleagues and inform you of the result.
> > > >
> > > > Please come with performance data.
> > >
> > > I think the best way to gather performance data is not to remove the likely/unlikely, but to swap their definitions. Less coding and a more pronounced difference, if any.
> >
> > In theory, it will double the gain/loss, which makes it easier to see whether
> > likely/unlikely changes anything.
> >
> > Thanks
> >
> > >
> > >
> > > Thxs, Håkon
> > >
> 
> Hi,
> 
> In summary, there is no performance gap before/after swapping
> likely/unlikely macros.
> So I will send a patch to remove all likely/unlikely macros.
> 
> I guess that is because
> - The performance of rnbd/rtrs depends on the network and block layer.
> - The network and block layer are not fast enough to get impacted by
> likely/unlikely.

Thanks for sharing this data. Your input can't truly randomize the code
path execution flow, and your instruction cache was filled "correctly".
This was expected.

In most cases, likely/unlikely is not needed.

Thanks
Jason Gunthorpe April 13, 2021, 10:52 p.m. UTC | #12
On Tue, Apr 06, 2021 at 02:36:37PM +0200, Gioh Kim wrote:
> From: Gioh Kim <gi-oh.kim@cloud.ionos.com>
> 
> Client prints only error value and it is not enough for debugging.
> 
> 1. When client receives an error from server:
> the client does not only print the error value but also
> more information of server connection.
> 
> 2. When client fails to send IO:
> the client gets an error from RDMA layer. It also
> prints more information of server connection.
> 
> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
> Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
>  drivers/infiniband/ulp/rtrs/rtrs-clt.c | 33 ++++++++++++++++++++++----
>  1 file changed, 29 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> index 5062328ac577..a534b2b09e13 100644
> +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> @@ -437,6 +437,11 @@ static void complete_rdma_req(struct rtrs_clt_io_req *req, int errno,
>  	req->in_use = false;
>  	req->con = NULL;
>  
> +	if (unlikely(errno)) {
> +		rtrs_err_rl(con->c.sess, "IO request failed: error=%d path=%s [%s:%u]\n",
> +			    errno, kobject_name(&sess->kobj), sess->hca_name, sess->hca_port);

Mind the long lines; I fixed them and removed the unlikely().

Jason

Patch

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 5062328ac577..a534b2b09e13 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -437,6 +437,11 @@  static void complete_rdma_req(struct rtrs_clt_io_req *req, int errno,
 	req->in_use = false;
 	req->con = NULL;
 
+	if (unlikely(errno)) {
+		rtrs_err_rl(con->c.sess, "IO request failed: error=%d path=%s [%s:%u]\n",
+			    errno, kobject_name(&sess->kobj), sess->hca_name, sess->hca_port);
+	}
+
 	if (notify)
 		req->conf(req->priv, errno);
 }
@@ -1020,7 +1025,8 @@  static int rtrs_clt_write_req(struct rtrs_clt_io_req *req)
 				       req->usr_len + sizeof(*msg),
 				       imm);
 	if (unlikely(ret)) {
-		rtrs_err(s, "Write request failed: %d\n", ret);
+		rtrs_err_rl(s, "Write request failed: error=%d path=%s [%s:%u]\n",
+			    ret, kobject_name(&sess->kobj), sess->hca_name, sess->hca_port);
 		if (sess->clt->mp_policy == MP_POLICY_MIN_INFLIGHT)
 			atomic_dec(&sess->stats->inflight);
 		if (req->sg_cnt)
@@ -1138,7 +1144,8 @@  static int rtrs_clt_read_req(struct rtrs_clt_io_req *req)
 	ret = rtrs_post_send_rdma(req->con, req, &sess->rbufs[buf_id],
 				   req->data_len, imm, wr);
 	if (unlikely(ret)) {
-		rtrs_err(s, "Read request failed: %d\n", ret);
+		rtrs_err_rl(s, "Read request failed: error=%d path=%s [%s:%u]\n",
+			    ret, kobject_name(&sess->kobj), sess->hca_name, sess->hca_port);
 		if (sess->clt->mp_policy == MP_POLICY_MIN_INFLIGHT)
 			atomic_dec(&sess->stats->inflight);
 		req->need_inv = false;
@@ -2459,12 +2466,30 @@  static int init_sess(struct rtrs_clt_sess *sess)
 	mutex_lock(&sess->init_mutex);
 	err = init_conns(sess);
 	if (err) {
-		rtrs_err(sess->clt, "init_conns(), err: %d\n", err);
+		char str[NAME_MAX];
+		int err;
+		struct rtrs_addr path = {
+			.src = &sess->s.src_addr,
+			.dst = &sess->s.dst_addr,
+		};
+
+		rtrs_addr_to_str(&path, str, sizeof(str));
+		rtrs_err(sess->clt, "init_conns() failed: err=%d path=%s [%s:%u]\n",
+			 err, str, sess->hca_name, sess->hca_port);
 		goto out;
 	}
 	err = rtrs_send_sess_info(sess);
 	if (err) {
-		rtrs_err(sess->clt, "rtrs_send_sess_info(), err: %d\n", err);
+		char str[NAME_MAX];
+		int err;
+		struct rtrs_addr path = {
+			.src = &sess->s.src_addr,
+			.dst = &sess->s.dst_addr,
+		};
+
+		rtrs_addr_to_str(&path, str, sizeof(str));
+		rtrs_err(sess->clt, "rtrs_send_sess_info() failed: err=%d path=%s [%s:%u]\n",
+			 err, str, sess->hca_name, sess->hca_port);
 		goto out;
 	}
 	rtrs_clt_sess_up(sess);