Message ID | 20210406123639.202899-2-gi-oh.kim@ionos.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Jason Gunthorpe |
Series | Improve debugging messages |
On Tue, Apr 06, 2021 at 02:36:37PM +0200, Gioh Kim wrote:
> From: Gioh Kim <gi-oh.kim@cloud.ionos.com>
>
> Client prints only error value and it is not enough for debugging.
>
> 1. When client receives an error from server:
> the client does not only print the error value but also
> more information of server connection.
>
> 2. When client fails to send IO:
> the client gets an error from RDMA layer. It also
> prints more information of server connection.
>
> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
> Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
> ---
> drivers/infiniband/ulp/rtrs/rtrs-clt.c | 33 ++++++++++++++++++++++----
> 1 file changed, 29 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> index 5062328ac577..a534b2b09e13 100644
> --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> @@ -437,6 +437,11 @@ static void complete_rdma_req(struct rtrs_clt_io_req *req, int errno,
>         req->in_use = false;
>         req->con = NULL;
>
> +       if (unlikely(errno)) {

I'm sorry, but all your patches are full of these likely/unlikely cargo cult. Can you please provide supportive performance data or delete all likely/unlikely in all rtrs code?

Thanks
On Tue, Apr 6, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Tue, Apr 06, 2021 at 02:36:37PM +0200, Gioh Kim wrote:
> > [...]
> > +       if (unlikely(errno)) {
>
> I'm sorry, but all your patches are full of these likely/unlikely cargo
> cult. Can you please provide supportive performance data or delete all
> likely/unlikely in all rtrs code?
>
> Thanks

Hi Leon,

Let me check with my colleagues whether there is any data about it.
I will inform you soon.
On Tue, Apr 6, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Tue, Apr 06, 2021 at 02:36:37PM +0200, Gioh Kim wrote:
> > [...]
> > +       if (unlikely(errno)) {
>
> I'm sorry, but all your patches are full of these likely/unlikely cargo
> cult. Can you please provide supportive performance data or delete all
> likely/unlikely in all rtrs code?

Hi Leon,

All the likely/unlikely annotations on the non-fast path were removed, as you suggested in the past.
This one is on the IO path. My understanding is that on the fast path, the likely/unlikely macros let the compiler optimize the code for better branch prediction.

We will run some benchmarks to see if it makes a difference.

Thanks

>
> Thanks
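For readers unfamiliar with the macros under discussion: likely() and unlikely() are thin wrappers around the GCC/Clang builtin `__builtin_expect`, which tells the compiler which outcome of a condition to expect. Whether the generated code actually changes is up to the compiler. The sketch below is a userspace illustration only; the macro bodies mirror the usual kernel form (without branch profiling), and `complete_req_demo()` is a hypothetical stand-in, not the rtrs function.

```c
#include <stdio.h>

/* Userspace copies of the usual kernel definitions (GCC/Clang builtin). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical completion handler, loosely modeled on the quoted hunk.
 * Returns 1 when the error branch ran, 0 otherwise.
 */
static int complete_req_demo(int err)
{
	if (unlikely(err)) {
		/* Cold path: the hint says this branch is rarely taken. */
		fprintf(stderr, "request failed: %d\n", err);
		return 1;
	}
	/* Hot path: the compiler may lay this out as the fall-through. */
	return 0;
}
```

The hint never changes what the code computes, only (possibly) how the branches are laid out, which is why its benefit has to be measured rather than assumed.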
On Mon, Apr 12, 2021 at 02:22:51PM +0200, Jinpu Wang wrote:
> On Tue, Apr 6, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
> > [...]
> > I'm sorry, but all your patches are full of these likely/unlikely cargo
> > cult. Can you please provide supportive performance data or delete all
> > likely/unlikely in all rtrs code?
>
> Hi Leon,
>
> All the likely/unlikely from the non-fast path was removed as you
> suggested in the past.
> This one is on IO path, my understanding is for the fast path, with
> likely/unlikely macro,
> the compiler will optimize the code for better branch prediction.

In theory yes; in practice, gcc 10 generated the same assembly code when I placed likely() and then replaced it with unlikely().

>
> We will run some benchmarks to see if it makes a difference.
>
> Thanks
On Mon, Apr 12, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Mon, Apr 12, 2021 at 02:22:51PM +0200, Jinpu Wang wrote:
> > [...]
> > This one is on IO path, my understanding is for the fast path, with
> > likely/unlikely macro,
> > the compiler will optimize the code for better branch prediction.
>
> In theory yes, in practice. gcc 10 generated same assembly code when I
> placed likely() and replaced it with unlikely() later.

That's a surprise to me. Just checked: the minimum gcc requirement is 4.9 [1], Debian Buster is using gcc 8.3, and the upcoming Bullseye will use gcc 10.2.

[1]: https://www.kernel.org/doc/html/latest/process/changes.html

> >
> > We will run some benchmarks to see if it makes a difference.
> >
> > Thanks
On Mon, Apr 12, 2021 at 2:54 PM Jinpu Wang <jinpu.wang@ionos.com> wrote:
>
> On Mon, Apr 12, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
> > [...]
> > In theory yes, in practice. gcc 10 generated same assembly code when I
> > placed likely() and replaced it with unlikely() later.

Even though gcc 10 generated the same assembly code, there is no guarantee for gcc 11 or gcc 12.

I am reviewing the rtrs source files and have found some unnecessary likely/unlikely.
But I think likely/unlikely are necessary for extreme cases.
I will have a discussion with my colleagues and inform you of the result.
On Mon, Apr 12, 2021 at 04:00:55PM +0200, Gioh Kim wrote:
> [...]
> Even though gcc 10 generated the same assembly code,
> there is no guarantee for gcc 11 or gcc 12.
>
> I am reviewing rtrs source file and have found some unnecessary likely/unlikely.
> But I think likely/unlikely are necessary for extreme cases.
> I will have a discussion with my colleagues and inform you of the result.

Please come with performance data.

Thanks
> On 12 Apr 2021, at 19:34, Leon Romanovsky <leon@kernel.org> wrote:
>
> On Mon, Apr 12, 2021 at 04:00:55PM +0200, Gioh Kim wrote:
>> [...]
>> I am reviewing rtrs source file and have found some unnecessary likely/unlikely.
>> But I think likely/unlikely are necessary for extreme cases.
>> I will have a discussion with my colleagues and inform you of the result.
>
> Please come with performance data.

I think the best way to gather performance data is not to remove the likely/unlikely, but to swap their definitions. Less coding and a more pronounced difference - if any.

Thxs, Håkon
On Tue, Apr 13, 2021 at 05:31:24AM +0000, Haakon Bugge wrote:
> > On 12 Apr 2021, at 19:34, Leon Romanovsky <leon@kernel.org> wrote:
> > [...]
> > Please come with performance data.
>
> I think the best way to gather performance data is not remove the likely/unlikely, but swap their definitions. Less coding and more pronounced difference - if any.

In theory, it will multiply the gain/loss by 2, which is nice for seeing whether likely/unlikely change anything.

Thanks

>
> Thxs, Håkon
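The swap experiment suggested above can be sketched as follows. This is an illustration under stated assumptions, not the patch that was actually tested: `SWAP_BRANCH_HINTS` is a hypothetical build switch and `count_errors()` is a hypothetical hot loop. Inverting both hints means that any real effect of the annotations shows up with the opposite sign, so the gap between the two builds is roughly twice the effect of simply removing them.

```c
#include <stddef.h>

/* SWAP_BRANCH_HINTS is a hypothetical build switch for the experiment. */
#ifdef SWAP_BRANCH_HINTS
#define likely(x)   __builtin_expect(!!(x), 0)  /* deliberately inverted */
#define unlikely(x) __builtin_expect(!!(x), 1)  /* deliberately inverted */
#else
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#endif

/* Hypothetical hot loop: counts non-zero completion codes. */
static size_t count_errors(const int *codes, size_t n)
{
	size_t errors = 0;

	for (size_t i = 0; i < n; i++) {
		/* Same result in both builds; only the hint differs. */
		if (unlikely(codes[i]))
			errors++;
	}
	return errors;
}
```

Building once with and once without `-DSWAP_BRANCH_HINTS`, then comparing both the benchmark numbers and the disassembly (for example with `objdump -d`), shows whether the hints change anything for a given compiler.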
On Tue, Apr 13, 2021 at 8:43 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Tue, Apr 13, 2021 at 05:31:24AM +0000, Haakon Bugge wrote:
> > [...]
> > I think the best way to gather performance data is not remove the likely/unlikely, but swap their definitions. Less coding and more pronounced difference - if any.
>
> In theory, it will multiply by 2 gain/loss, which is nice to see if
> likely/unlikely change something.
>
> Thanks

Hi,

In summary, there is no performance gap before/after swapping the likely/unlikely macros.
So I will send a patch to remove all likely/unlikely macros.

I guess that is because
- The performance of rnbd/rtrs depends on the network and block layer.
- The network and block layer are not fast enough to get impacted by likely/unlikely.

I ran a fio read test with 32 rnbd devices and 64/128 processes on a 64-core server.
fio generated essentially the same result before and after the swapping.

Thanks to Håkon for the test idea.

Test environment:
- Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
- 376G memory
- kernel version: 5.4.86
- gcc version: gcc (Debian 8.3.0-6) 8.3.0
- Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]

Test result:
- before swapping:
  32-dev/64-proc: IOPS=829k, BW=3239MiB/s
  32-dev/128-proc: IOPS=816k, BW=3187MiB/s
- after swapping:
  32-dev/64-proc: IOPS=829k, BW=3238MiB/s
  32-dev/128-proc: IOPS=817k, BW=3191MiB/s
(128-proc is worse than 64-proc but that is another issue)

Attached files:
- 0001-swap-likely-and-unlikely.patch: a patch file swapping likely and unlikely to show how I tested
- after_swap.txt: raw data after swapping
- current.txt: raw data before swapping

For your information, I ran the performance test on two 8-core desktop machines that are directly linked by Infiniband cables without a switch.
I got the same result with them: no performance difference.
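To put the reported figures in proportion, the relative change between the two builds can be computed directly. The helper below is a hypothetical illustration (the function name is not from the thread); the numbers in the comments are the IOPS and bandwidth values quoted in the message above.

```c
/*
 * Relative change from `before` to `after`, in percent.
 * Positive means `after` is larger. `before` must be non-zero.
 */
static double rel_change_percent(double before, double after)
{
	return (after - before) / before * 100.0;
}

/*
 * Figures from the message above:
 *   64-proc  IOPS: 829k -> 829k          (no change)
 *   128-proc IOPS: 816k -> 817k          (about +0.12 %)
 *   64-proc  BW:   3239 -> 3238 MiB/s    (about -0.03 %)
 *   128-proc BW:   3187 -> 3191 MiB/s    (about +0.13 %)
 */
```

Changes of this size are plausibly run-to-run noise, which is consistent with the conclusion that swapping the hints made no measurable difference on this setup.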
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# git reset --hard HEAD~2
HEAD is now at 99c7c2f RDMA/rtrs-clt: destroy sysfs after removing session from active list
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# make clean
make[1]: Entering directory '/usr/src/linux-5.4.86-pserver'
make[1]: Leaving directory '/usr/src/linux-5.4.86-pserver'
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# make
make[1]: Entering directory '/usr/src/linux-5.4.86-pserver'
  CC [M] /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-clt.o
  CC [M] /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-clt-sysfs.o
  CC [M] /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-common.o
  LD [M] /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-client.o
  CC [M] /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-srv.o
  CC [M] /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-srv-dev.o
  CC [M] /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-srv-sysfs.o
  LD [M] /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-server.o
  CC [M] /tmp/ddd/gkim/ibnbd2/rtrs/rtrs.o
  LD [M] /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-core.o
  CC [M] /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-clt.o
  CC [M] /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-clt-stats.o
  CC [M] /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-clt-sysfs.o
  LD [M] /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-client.o
  CC [M] /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-srv.o
  CC [M] /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-srv-stats.o
  CC [M] /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-srv-sysfs.o
  LD [M] /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-server.o
  AR /tmp/ddd/gkim/ibnbd2/built-in.a
  Building modules, stage 2.
  MODPOST 5 modules
  CC [M] /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-client.mod.o
  LD [M] /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-client.ko
  CC [M] /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-server.mod.o
  LD [M] /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-server.ko
  CC [M] /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-client.mod.o
  LD [M] /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-client.ko
  CC [M] /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-core.mod.o
  LD [M] /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-core.ko
  CC [M] /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-server.mod.o
  LD [M] /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-server.ko
make[1]: Leaving directory '/usr/src/linux-5.4.86-pserver'
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# sudo rmmod rnbd-client
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# sudo rmmod rtrs-client
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# sudo rmmod rtrs-core
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# sudo insmod rtrs/rtrs-core.ko
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# sudo insmod rtrs/rtrs-client.ko
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# sudo insmod rnbd/rnbd-client.ko
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# bash go_32dev.sh
fio start : Di 13. Apr 10:38:09 UTC 2021
kernel info : Linux ps401a-914 5.4.86-pserver #5.4.86-3~deb10 SMP Fri Mar 5 12:29:36 UTC 2021 x86_64 GNU/Linux
fio version : fio-3.12
gcc: gcc (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Start fio test
fiotest: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
fio-3.12
Starting 64 processes
Jobs: 64 (f=2048): [r(64)][0.1%][r=3233MiB/s][r=828k IOPS][eta 01d:21h:06m:50s]
fiotest: (groupid=0, jobs=64): err= 0: pid=23219: Tue Apr 13 10:41:11 2021
  read: IOPS=829k, BW=3239MiB/s (3396MB/s)(569GiB/180011msec)
    slat (usec): min=82, max=156993, avg=1278.01, stdev=1683.01
    clat (nsec): min=1096, max=30634k, avg=8538323.42, stdev=2411851.29
     lat (usec): min=317, max=162011, avg=9816.39, stdev=2702.04
    clat percentiles (usec):
     | 1.00th=[ 3949], 5.00th=[ 5211], 10.00th=[ 5800], 20.00th=[ 6587],
     | 30.00th=[ 7111], 40.00th=[ 7701], 50.00th=[ 8225], 60.00th=[ 8848],
     | 70.00th=[ 9503], 80.00th=[10421], 90.00th=[11731], 95.00th=[12911],
     | 99.00th=[15270], 99.50th=[16319], 99.90th=[18482], 99.95th=[19530],
     | 99.99th=[21890]
   bw ( KiB/s): min=36864, max=166912, per=1.56%, avg=51773.23, stdev=3159.85, samples=22980
   iops : min= 9216, max=41728, avg=12943.26, stdev=789.95, samples=22980
  lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec) : 100=0.01%, 250=0.04%, 500=0.04%, 750=0.02%, 1000=0.02%
  lat (msec) : 2=0.06%, 4=0.90%, 10=74.65%, 20=24.23%, 50=0.03%
  cpu : usr=0.97%, sys=4.95%, ctx=82650519, majf=0, minf=3555539
  IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
     submit : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=100.0%
     complete : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=100.0%
     issued rwts: total=149252805,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: bw=3239MiB/s (3396MB/s), 3239MiB/s-3239MiB/s (3396MB/s-3396MB/s), io=569GiB (611GB), run=180011-180011msec

Disk stats (read/write):
  rnbd0: ios=4663183/0, merge=0/0, ticks=10123475/0, in_queue=314090, util=99.62%
  rnbd1: ios=4663180/0, merge=0/0, ticks=10445268/0, in_queue=334450, util=99.65%
  rnbd2: ios=4663181/0, merge=0/0, ticks=10631138/0, in_queue=346090, util=99.65%
  rnbd3: ios=4663179/0, merge=0/0, ticks=10791046/0, in_queue=350590, util=99.67%
  rnbd4: ios=4663175/0, merge=0/0, ticks=10888045/0, in_queue=361960, util=99.68%
  rnbd5: ios=4663183/0, merge=0/0, ticks=10994697/0, in_queue=369540, util=99.70%
  rnbd6: ios=4663182/0, merge=0/0, ticks=11010726/0, in_queue=365730, util=99.71%
  rnbd7: ios=4663182/0, merge=0/0, ticks=11041009/0, in_queue=373870, util=99.71%
  rnbd8: ios=4663185/0, merge=0/0, ticks=11050961/0, in_queue=375140, util=99.74%
  rnbd9: ios=4663185/0, merge=0/0, ticks=11063691/0, in_queue=373870, util=99.75%
  rnbd10: ios=4663184/0, merge=0/0, ticks=11099119/0, in_queue=375340, util=99.76%
  rnbd11: ios=4663187/0, merge=0/0, ticks=11112331/0, in_queue=382300, util=99.78%
  rnbd12: ios=4663187/0, merge=1/0, ticks=11078851/0, in_queue=376050, util=99.79%
  rnbd13: ios=4663188/0, merge=0/0, ticks=11087422/0, in_queue=376040, util=99.80%
  rnbd14: ios=4663190/0, merge=0/0, ticks=11070282/0, in_queue=378500, util=99.80%
  rnbd15: ios=4663190/0, merge=0/0, ticks=9417418/0, in_queue=271060, util=99.81%
  rnbd16: ios=4663191/0, merge=0/0, ticks=10588441/0, in_queue=348800, util=99.84%
  rnbd17: ios=4663193/0, merge=0/0, ticks=10740263/0, in_queue=364650, util=99.84%
  rnbd18: ios=4663195/0, merge=0/0, ticks=10813752/0, in_queue=371990, util=99.86%
  rnbd19: ios=4663193/0, merge=0/0, ticks=10878352/0, in_queue=375050, util=99.87%
  rnbd20: ios=4663193/0, merge=0/0, ticks=10845686/0, in_queue=371010, util=99.88%
  rnbd21: ios=4663195/0, merge=0/0, ticks=10854889/0, in_queue=373940, util=99.90%
  rnbd22: ios=4663197/0, merge=0/0, ticks=10936251/0, in_queue=378890, util=99.90%
  rnbd23: ios=4663195/0, merge=0/0, ticks=11000989/0, in_queue=380360, util=99.92%
  rnbd24: ios=4663200/0, merge=0/0, ticks=11056302/0, in_queue=389300, util=99.93%
  rnbd25: ios=4663199/0, merge=0/0, ticks=11099625/0, in_queue=396820, util=99.93%
  rnbd26: ios=4663197/0, merge=0/0, ticks=11091101/0, in_queue=391310, util=99.95%
  rnbd27: ios=4663201/0, merge=0/0, ticks=11108242/0, in_queue=396440, util=99.96%
  rnbd28: ios=4663203/0, merge=0/0, ticks=11222083/0, in_queue=405730, util=100.00%
  rnbd29: ios=4663205/0, merge=0/0, ticks=11251353/0, in_queue=412810, util=99.99%
  rnbd30: ios=4663201/0, merge=0/0, ticks=11238249/0, in_queue=413260, util=100.00%
  rnbd31: ios=4663218/0, merge=0/0, ticks=11267469/0, in_queue=414620, util=100.00%
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# bash go_32dev.sh
fio start : Di 13. Apr 10:58:33 UTC 2021
kernel info : Linux ps401a-914 5.4.86-pserver #5.4.86-3~deb10 SMP Fri Mar 5 12:29:36 UTC 2021 x86_64 GNU/Linux
fio version : fio-3.12
gcc: gcc (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Start fio test
fiotest: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
fio-3.12
Starting 64 processes
Jobs: 64 (f=2048): [r(64)][100.0%][r=3227MiB/s][r=826k IOPS][eta 00m:00s]
fiotest: (groupid=0, jobs=64): err= 0: pid=25682: Tue Apr 13 11:01:35 2021
  read: IOPS=826k, BW=3228MiB/s (3385MB/s)(568GiB/180012msec)
    slat (usec): min=170, max=131883, avg=1286.52, stdev=1678.50
    clat (nsec): min=1357, max=31141k, avg=8561491.28, stdev=2417740.18
     lat (usec): min=502, max=133262, avg=9848.07, stdev=2710.62
    clat percentiles (usec):
     | 1.00th=[ 3949], 5.00th=[ 5211], 10.00th=[ 5800], 20.00th=[ 6587],
     | 30.00th=[ 7177], 40.00th=[ 7701], 50.00th=[ 8225], 60.00th=[ 8848],
     | 70.00th=[ 9503], 80.00th=[10421], 90.00th=[11863], 95.00th=[13042],
     | 99.00th=[15401], 99.50th=[16319], 99.90th=[18482], 99.95th=[19268],
     | 99.99th=[21890]
   bw ( KiB/s): min=32768, max=258048, per=1.56%, avg=51622.65, stdev=3500.81, samples=22982
   iops : min= 8192, max=64512, avg=12905.63, stdev=875.20, samples=22982
  lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec) : 100=0.02%, 250=0.03%, 500=0.02%, 750=0.01%, 1000=0.01%
  lat (msec) : 2=0.07%, 4=0.89%, 10=74.32%, 20=24.59%, 50=0.03%
  cpu : usr=0.96%, sys=5.02%, ctx=81774475, majf=0, minf=3197872
  IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
     submit : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0%
     complete : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0%
     issued rwts: total=148772864,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: bw=3228MiB/s (3385MB/s), 3228MiB/s-3228MiB/s (3385MB/s-3385MB/s), io=568GiB (609GB), run=180012-180012msec

Disk stats (read/write):
  rnbd0: ios=4647869/0, merge=0/0, ticks=10157474/0, in_queue=328770, util=99.64%
  rnbd1: ios=4647859/0, merge=0/0, ticks=10417978/0, in_queue=340850, util=99.67%
  rnbd2: ios=4647863/0, merge=0/0, ticks=10623846/0, in_queue=357690, util=99.68%
  rnbd3: ios=4647875/0, merge=0/0, ticks=10762992/0, in_queue=364280, util=99.70%
  rnbd4: ios=4647869/0, merge=0/0, ticks=10870408/0, in_queue=371350, util=99.70%
  rnbd5: ios=4647866/0, merge=0/0, ticks=10955394/0, in_queue=377050, util=99.72%
  rnbd6: ios=4647875/0, merge=0/0, ticks=11010235/0, in_queue=383640, util=99.73%
  rnbd7: ios=4647873/0, merge=0/0, ticks=11032744/0, in_queue=385440, util=99.73%
  rnbd8: ios=4647871/0, merge=0/0, ticks=11043787/0, in_queue=384150, util=99.76%
  rnbd9: ios=4647877/0, merge=0/0, ticks=9415047/0, in_queue=281570, util=99.77%
  rnbd10: ios=4647881/0, merge=0/0, ticks=10427629/0, in_queue=348200, util=99.78%
  rnbd11: ios=4647876/0, merge=0/0, ticks=10615844/0, in_queue=362920, util=99.81%
  rnbd12: ios=4647872/0, merge=1/0, ticks=10638181/0, in_queue=366520, util=99.81%
  rnbd13: ios=4647882/0, merge=0/0, ticks=10678828/0, in_queue=368450, util=99.82%
  rnbd14: ios=4647874/0, merge=0/0, ticks=10678088/0, in_queue=370470, util=99.83%
  rnbd15: ios=4647887/0, merge=0/0, ticks=10702071/0, in_queue=369640, util=99.84%
  rnbd16: ios=4647875/0, merge=0/0, ticks=10710059/0, in_queue=374900, util=99.86%
  rnbd17: ios=4647883/0, merge=0/0, ticks=10761335/0,
in_queue=378860, util=99.87% rnbd18: ios=4647884/0, merge=0/0, ticks=10784441/0, in_queue=379010, util=99.89% rnbd19: ios=4647882/0, merge=0/0, ticks=10828395/0, in_queue=380680, util=99.92% rnbd20: ios=4647897/0, merge=0/0, ticks=10856351/0, in_queue=384050, util=99.91% rnbd21: ios=4647898/0, merge=0/0, ticks=10889226/0, in_queue=385120, util=99.94% rnbd22: ios=4647890/0, merge=0/0, ticks=10915544/0, in_queue=389180, util=99.93% rnbd23: ios=4647888/0, merge=0/0, ticks=10922797/0, in_queue=392600, util=99.94% rnbd24: ios=4647891/0, merge=0/0, ticks=10964743/0, in_queue=395200, util=99.95% rnbd25: ios=4647894/0, merge=0/0, ticks=11017459/0, in_queue=408070, util=99.96% rnbd26: ios=4647896/0, merge=0/0, ticks=11084377/0, in_queue=405700, util=99.98% rnbd27: ios=4647893/0, merge=0/0, ticks=11108133/0, in_queue=407300, util=99.99% rnbd28: ios=4647905/0, merge=0/0, ticks=11153595/0, in_queue=416610, util=100.00% rnbd29: ios=4647899/0, merge=0/0, ticks=11213811/0, in_queue=420820, util=100.00% rnbd30: ios=4647903/0, merge=0/0, ticks=11228964/0, in_queue=421960, util=100.00% rnbd31: ios=4647904/0, merge=0/0, ticks=11277612/0, in_queue=424840, util=100.00% root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# bash go_32dev.sh fio start : Di 13. Apr 11:04:17 UTC 2021 kernel info : Linux ps401a-914 5.4.86-pserver #5.4.86-3~deb10 SMP Fri Mar 5 12:29:36 UTC 2021 x86_64 GNU/Linux fio version : fio-3.12 gcc: gcc (Debian 8.3.0-6) 8.3.0 Copyright (C) 2018 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Start fio test fiotest: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128 ... 
fio-3.12 Starting 64 processes Jobs: 64 (f=2048): [r(64)][100.0%][r=3239MiB/s][r=829k IOPS][eta 00m:00s] fiotest: (groupid=0, jobs=64): err= 0: pid=26523: Tue Apr 13 11:07:19 2021 read: IOPS=829k, BW=3238MiB/s (3395MB/s)(569GiB/180019msec) slat (usec): min=168, max=172647, avg=1290.89, stdev=1702.42 clat (nsec): min=1359, max=34740k, avg=8525517.65, stdev=2410440.28 lat (usec): min=454, max=172848, avg=9816.46, stdev=2718.20 clat percentiles (usec): | 1.00th=[ 3916], 5.00th=[ 5211], 10.00th=[ 5800], 20.00th=[ 6521], | 30.00th=[ 7111], 40.00th=[ 7635], 50.00th=[ 8225], 60.00th=[ 8848], | 70.00th=[ 9503], 80.00th=[10421], 90.00th=[11731], 95.00th=[12911], | 99.00th=[15270], 99.50th=[16188], 99.90th=[18482], 99.95th=[19530], | 99.99th=[22152] bw ( KiB/s): min=31744, max=282624, per=1.56%, avg=51770.75, stdev=3922.53, samples=22985 iops : min= 7936, max=70656, avg=12942.66, stdev=980.63, samples=22985 lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.03% lat (usec) : 100=0.02%, 250=0.04%, 500=0.02%, 750=0.01%, 1000=0.01% lat (msec) : 2=0.06%, 4=0.91%, 10=74.79%, 20=24.07%, 50=0.04% cpu : usr=1.00%, sys=4.98%, ctx=82332948, majf=0, minf=3752201 IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0% submit : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0% complete : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0% issued rwts: total=149203144,0,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=128 Run status group 0 (all jobs): READ: bw=3238MiB/s (3395MB/s), 3238MiB/s-3238MiB/s (3395MB/s-3395MB/s), io=569GiB (611GB), run=180019-180019msec Disk stats (read/write): rnbd0: ios=4661066/0, merge=0/0, ticks=10264711/0, in_queue=322110, util=99.62% rnbd1: ios=4661063/0, merge=0/0, ticks=10550895/0, in_queue=343350, util=99.64% rnbd2: ios=4661075/0, merge=0/0, ticks=10724899/0, in_queue=347820, util=99.65% rnbd3: ios=4661082/0, merge=0/0, ticks=10733532/0, in_queue=349620, util=99.67% 
rnbd4: ios=4661077/0, merge=0/0, ticks=10863981/0, in_queue=357250, util=99.67% rnbd5: ios=4661072/0, merge=0/0, ticks=10989829/0, in_queue=368510, util=99.68% rnbd6: ios=4661085/0, merge=0/0, ticks=11013824/0, in_queue=365960, util=99.68% rnbd7: ios=4661082/0, merge=0/0, ticks=11069249/0, in_queue=372950, util=99.70% rnbd8: ios=4661074/0, merge=0/0, ticks=9409880/0, in_queue=262130, util=99.72% rnbd9: ios=4661075/0, merge=0/0, ticks=10481517/0, in_queue=338960, util=99.73% rnbd10: ios=4661071/0, merge=0/0, ticks=10676791/0, in_queue=352930, util=99.74% rnbd11: ios=4661080/0, merge=0/0, ticks=10703680/0, in_queue=352040, util=99.76% rnbd12: ios=4661073/0, merge=0/0, ticks=10694124/0, in_queue=354740, util=99.76% rnbd13: ios=4661074/0, merge=0/0, ticks=10705697/0, in_queue=353340, util=99.78% rnbd14: ios=4661082/0, merge=0/0, ticks=10748646/0, in_queue=361020, util=99.78% rnbd15: ios=4661089/0, merge=0/0, ticks=10754424/0, in_queue=362580, util=99.79% rnbd16: ios=4661085/0, merge=0/0, ticks=10781956/0, in_queue=362570, util=99.82% rnbd17: ios=4661083/0, merge=0/0, ticks=10842119/0, in_queue=369510, util=99.83% rnbd18: ios=4661084/0, merge=0/0, ticks=10835750/0, in_queue=370490, util=99.85% rnbd19: ios=4661093/0, merge=0/0, ticks=10903655/0, in_queue=373100, util=99.86% rnbd20: ios=4661092/0, merge=0/0, ticks=10917943/0, in_queue=376360, util=99.87% rnbd21: ios=4661094/0, merge=0/0, ticks=10951428/0, in_queue=380590, util=99.89% rnbd22: ios=4661088/0, merge=0/0, ticks=10969831/0, in_queue=379130, util=99.89% rnbd23: ios=4661087/0, merge=0/0, ticks=11036709/0, in_queue=387860, util=99.91% rnbd24: ios=4661095/0, merge=0/0, ticks=11045243/0, in_queue=389770, util=99.91% rnbd25: ios=4661094/0, merge=0/0, ticks=11096795/0, in_queue=391280, util=99.93% rnbd26: ios=4661089/0, merge=0/0, ticks=11168428/0, in_queue=401420, util=99.95% rnbd27: ios=4661088/0, merge=0/0, ticks=11199394/0, in_queue=406780, util=99.96% rnbd28: ios=4661100/0, merge=0/0, ticks=11211816/0, 
in_queue=401820, util=99.98% rnbd29: ios=4661107/0, merge=0/0, ticks=11260826/0, in_queue=410480, util=99.99% rnbd30: ios=4661111/0, merge=0/0, ticks=11301041/0, in_queue=418590, util=99.99% rnbd31: ios=4661105/0, merge=0/0, ticks=11264838/0, in_queue=414460, util=100.00% root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# bash go_32dev_128proc.sh fio start : Di 13. Apr 11:36:53 UTC 2021 kernel info : Linux ps401a-914 5.4.86-pserver #5.4.86-3~deb10 SMP Fri Mar 5 12:29:36 UTC 2021 x86_64 GNU/Linux fio version : fio-3.12 gcc: gcc (Debian 8.3.0-6) 8.3.0 Copyright (C) 2018 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Start fio test fiotest: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128 ... fio-3.12 Starting 128 processes Jobs: 62 (f=0): [f(1),_(2),f(1),_(1),f(1),_(3),f(1),_(2),f(3),_(2),f(1),_(3),f(1),_(3),f(2),_(1),f(1),_(8),f(4),_(1),f(1),_(2),f(1),_(1),f(1),_(1),f(1),_(2),f(2),_(1),f(1),_(2),f(3),_(1),f(1),_(1),f(1),_(1), f(2),_(1),f(2),_(1),f(4),_(1),f(1),_(5),f(1),_(1),f(1),_(2),f(2),_(6),f(6),_(1),f(3),_(1),f(1),_(4),f(5),_(1),f(3),_(3),f(1),_(1),f(2)][100.0%][r=2504MiB/s][r=641k IOPS][eta 00m:00s] fiotest: (groupid=0, jobs=128): err= 0: pid=32673: Tue Apr 13 11:39:56 2021 read: IOPS=816k, BW=3187MiB/s (3341MB/s)(560GiB/180030msec) slat (usec): min=194, max=481083, avg=7768.18, stdev=5693.43 clat (nsec): min=1163, max=40685k, avg=12208741.78, stdev=3544380.51 lat (usec): min=509, max=483253, avg=19976.98, stdev=5877.39 clat percentiles (usec): | 1.00th=[ 4555], 5.00th=[ 6849], 10.00th=[ 7963], 20.00th=[ 9372], | 30.00th=[10290], 40.00th=[11207], 50.00th=[11994], 60.00th=[12911], | 70.00th=[13829], 80.00th=[15008], 90.00th=[16712], 95.00th=[18220], | 99.00th=[21627], 99.50th=[23200], 99.90th=[26346], 99.95th=[27657], | 99.99th=[30540] bw ( KiB/s): min= 2048, max=173732, per=0.78%, 
avg=25459.52, stdev=2917.03, samples=45970 iops : min= 512, max=43433, avg=6364.85, stdev=729.26, samples=45970 lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.03% lat (usec) : 100=0.04%, 250=0.05%, 500=0.03%, 750=0.01%, 1000=0.01% lat (msec) : 2=0.06%, 4=0.44%, 10=25.99%, 20=71.05%, 50=2.28% cpu : usr=0.58%, sys=2.18%, ctx=75776088, majf=0, minf=6628006 IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0% submit : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0% complete : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0% issued rwts: total=146863336,0,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=128 Run status group 0 (all jobs): READ: bw=3187MiB/s (3341MB/s), 3187MiB/s-3187MiB/s (3341MB/s-3341MB/s), io=560GiB (602GB), run=180030-180030msec Disk stats (read/write): rnbd0: ios=4589419/0, merge=0/0, ticks=14454693/0, in_queue=2184000, util=99.67% rnbd1: ios=4589419/0, merge=0/0, ticks=14939602/0, in_queue=2323370, util=99.68% rnbd2: ios=4589422/0, merge=0/0, ticks=15071745/0, in_queue=2363720, util=99.69% rnbd3: ios=4589422/0, merge=0/0, ticks=15196019/0, in_queue=2395360, util=99.70% rnbd4: ios=4589422/0, merge=0/0, ticks=15229235/0, in_queue=2401140, util=99.71% rnbd5: ios=4589423/0, merge=0/0, ticks=15265557/0, in_queue=2395570, util=99.72% rnbd6: ios=4589425/0, merge=0/0, ticks=15305021/0, in_queue=2405310, util=99.72% rnbd7: ios=4589424/0, merge=0/0, ticks=15361167/0, in_queue=2417510, util=99.73% rnbd8: ios=4589425/0, merge=0/0, ticks=15374308/0, in_queue=2419060, util=99.76% rnbd9: ios=4589427/0, merge=0/0, ticks=11516180/0, in_queue=1438280, util=99.77% rnbd10: ios=4589427/0, merge=0/0, ticks=14170395/0, in_queue=2147290, util=99.77% rnbd11: ios=4589427/0, merge=0/0, ticks=14720156/0, in_queue=2296450, util=99.79% rnbd12: ios=4589426/0, merge=0/0, ticks=14798828/0, in_queue=2319630, util=99.79% rnbd13: ios=4589426/0, merge=0/0, ticks=14779692/0, 
in_queue=2314110, util=99.81%
rnbd14: ios=4589426/0, merge=0/0, ticks=14807971/0, in_queue=2320940, util=99.81%
rnbd15: ios=4589427/0, merge=0/0, ticks=14815817/0, in_queue=2320800, util=99.82%
rnbd16: ios=4589428/0, merge=0/0, ticks=14874511/0, in_queue=2341850, util=99.85%
rnbd17: ios=4589428/0, merge=0/0, ticks=14884319/0, in_queue=2340410, util=99.86%
rnbd18: ios=4589429/0, merge=0/0, ticks=14889740/0, in_queue=2350840, util=99.88%
rnbd19: ios=4589428/0, merge=0/0, ticks=14851478/0, in_queue=2333840, util=99.89%
rnbd20: ios=4589430/0, merge=0/0, ticks=14917096/0, in_queue=2350030, util=99.89%
rnbd21: ios=4589430/0, merge=0/0, ticks=14896627/0, in_queue=2342230, util=99.91%
rnbd22: ios=4589431/0, merge=0/0, ticks=14865768/0, in_queue=2323510, util=99.91%
rnbd23: ios=4589431/0, merge=0/0, ticks=14943766/0, in_queue=2351210, util=99.92%
rnbd24: ios=4589431/0, merge=0/0, ticks=14952482/0, in_queue=2368310, util=99.92%
rnbd25: ios=4589431/0, merge=0/0, ticks=14966379/0, in_queue=2367030, util=99.93%
rnbd26: ios=4589433/0, merge=0/0, ticks=14975019/0, in_queue=2368030, util=99.95%
rnbd27: ios=4589434/0, merge=0/0, ticks=14990885/0, in_queue=2369840, util=99.96%
rnbd28: ios=4589433/0, merge=0/0, ticks=14936498/0, in_queue=2366290, util=99.98%
rnbd29: ios=4589433/0, merge=0/0, ticks=14986887/0, in_queue=2380530, util=99.99%
rnbd30: ios=4589434/0, merge=1/0, ticks=15018143/0, in_queue=2395330, util=99.99%
rnbd31: ios=4589435/0, merge=0/0, ticks=14995177/0, in_queue=2381090, util=100.00%

root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# git show HEAD | head
commit 2636311e5e2894bd7c7800939a3b9b68e7a93bcc
Author: Gioh Kim <gi-oh.kim@ionos.com>
Date: Tue Apr 13 14:00:27 2021 +0200

    swap likely and unlikely

diff --git a/rtrs/rtrs-clt.c b/rtrs/rtrs-clt.c
index 1b4b3e6..6235827 100644
--- a/rtrs/rtrs-clt.c
+++ b/rtrs/rtrs-clt.c
141 root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# make clean && make
make[1]: Entering directory '/usr/src/linux-5.4.86-pserver'
  CLEAN   /tmp/ddd/gkim/ibnbd2/Module.symvers
make[1]: Leaving directory '/usr/src/linux-5.4.86-pserver'
make[1]: Entering directory '/usr/src/linux-5.4.86-pserver'
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-clt.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-clt-sysfs.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-common.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-client.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-srv.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-srv-dev.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-srv-sysfs.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-server.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-core.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-clt.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-clt-stats.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-clt-sysfs.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-client.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-srv.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-srv-stats.o
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-srv-sysfs.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-server.o
  AR      /tmp/ddd/gkim/ibnbd2/built-in.a
  Building modules, stage 2.
  MODPOST 5 modules
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-client.mod.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-client.ko
  CC [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-server.mod.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rnbd/rnbd-server.ko
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-client.mod.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-client.ko
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-core.mod.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-core.ko
  CC [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-server.mod.o
  LD [M]  /tmp/ddd/gkim/ibnbd2/rtrs/rtrs-server.ko
make[1]: Leaving directory '/usr/src/linux-5.4.86-pserver'
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# rmmod rnbd-client
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# rmmod rtrs-client
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# rmmod rtrs-core
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# insmod rtrs/rtrs-core.ko
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# insmod rtrs/rtrs-client.ko
root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# insmod rnbd/rnbd-client.ko

fio start : Di 13. Apr 12:10:30 UTC 2021
kernel info : Linux ps401a-914 5.4.86-pserver #5.4.86-3~deb10 SMP Fri Mar 5 12:29:36 UTC 2021 x86_64 GNU/Linux
fio version : fio-3.12
gcc: gcc (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Start fio test
fiotest: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
fio-3.12 Starting 64 processes Jobs: 64 (f=1814): [r(4),f(2),r(5),f(1),r(11),f(1),r(2),f(1),r(4),f(2),r(5),f(1),r(4),f(1),r(12),f(1),r(5),f(1),r(1)][100.0%][r=3244MiB/s][r=830k IOPS][eta 00m:00s] fiotest: (groupid=0, jobs=64): err= 0: pid=37528: Tue Apr 13 12:13:32 2021 read: IOPS=829k, BW=3238MiB/s (3395MB/s)(569GiB/180025msec) slat (usec): min=165, max=195365, avg=1271.82, stdev=1671.22 clat (nsec): min=1080, max=32693k, avg=8544062.13, stdev=2411682.41 lat (usec): min=407, max=206880, avg=9815.94, stdev=2694.31 clat percentiles (usec): | 1.00th=[ 3949], 5.00th=[ 5211], 10.00th=[ 5800], 20.00th=[ 6587], | 30.00th=[ 7177], 40.00th=[ 7701], 50.00th=[ 8225], 60.00th=[ 8848], | 70.00th=[ 9503], 80.00th=[10421], 90.00th=[11731], 95.00th=[12911], | 99.00th=[15270], 99.50th=[16319], 99.90th=[18482], 99.95th=[19530], | 99.99th=[22152] bw ( KiB/s): min=29696, max=254980, per=1.56%, avg=51775.19, stdev=3418.36, samples=22980 iops : min= 7424, max=63745, avg=12943.76, stdev=854.59, samples=22980 lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.04% lat (usec) : 100=0.04%, 250=0.02%, 500=0.01%, 750=0.01%, 1000=0.01% lat (msec) : 2=0.05%, 4=0.90%, 10=74.58%, 20=24.31%, 50=0.04% cpu : usr=1.00%, sys=4.97%, ctx=82399686, majf=0, minf=3717209 IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0% submit : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0% complete : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0% issued rwts: total=149229295,0,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=128 Run status group 0 (all jobs): READ: bw=3238MiB/s (3395MB/s), 3238MiB/s-3238MiB/s (3395MB/s-3395MB/s), io=569GiB (611GB), run=180025-180025msec Disk stats (read/write): rnbd0: ios=4661956/0, merge=0/0, ticks=10212115/0, in_queue=323800, util=99.67% rnbd1: ios=4661956/0, merge=0/0, ticks=10489817/0, in_queue=340940, util=99.70% rnbd2: ios=4661959/0, merge=0/0, ticks=10510042/0, in_queue=342180, 
util=99.70% rnbd3: ios=4661958/0, merge=0/0, ticks=10702745/0, in_queue=353350, util=99.73% rnbd4: ios=4661964/0, merge=0/0, ticks=10793914/0, in_queue=359190, util=99.74% rnbd5: ios=4661968/0, merge=0/0, ticks=10913714/0, in_queue=369150, util=99.74% rnbd6: ios=4661960/0, merge=0/0, ticks=10958094/0, in_queue=370730, util=99.74% rnbd7: ios=4661968/0, merge=0/0, ticks=10976320/0, in_queue=370090, util=99.76% rnbd8: ios=4661964/0, merge=0/0, ticks=11014804/0, in_queue=375780, util=99.79% rnbd9: ios=4661964/0, merge=0/0, ticks=11031969/0, in_queue=376760, util=99.80% rnbd10: ios=4661966/0, merge=0/0, ticks=11047729/0, in_queue=375450, util=99.81% rnbd11: ios=4661975/0, merge=0/0, ticks=11053595/0, in_queue=378140, util=99.83% rnbd12: ios=4661972/0, merge=1/0, ticks=11087759/0, in_queue=376570, util=99.83% rnbd13: ios=4661975/0, merge=0/0, ticks=11066221/0, in_queue=381940, util=99.85% rnbd14: ios=4661967/0, merge=0/0, ticks=11092973/0, in_queue=381730, util=99.85% rnbd15: ios=4661981/0, merge=0/0, ticks=11056803/0, in_queue=382830, util=99.86% rnbd16: ios=4661985/0, merge=0/0, ticks=9447901/0, in_queue=280700, util=99.89% rnbd17: ios=4661978/0, merge=0/0, ticks=10506961/0, in_queue=348500, util=99.90% rnbd18: ios=4661983/0, merge=0/0, ticks=10702411/0, in_queue=364060, util=99.92% rnbd19: ios=4661977/0, merge=0/0, ticks=10777160/0, in_queue=374250, util=99.92% rnbd20: ios=4661982/0, merge=0/0, ticks=10780637/0, in_queue=371820, util=99.93% rnbd21: ios=4661980/0, merge=0/0, ticks=10841533/0, in_queue=376750, util=99.95% rnbd22: ios=4661984/0, merge=0/0, ticks=10869817/0, in_queue=378430, util=99.95% rnbd23: ios=4661985/0, merge=0/0, ticks=10966341/0, in_queue=387410, util=99.96% rnbd24: ios=4661987/0, merge=0/0, ticks=10957613/0, in_queue=390960, util=99.96% rnbd25: ios=4661988/0, merge=0/0, ticks=11015585/0, in_queue=390920, util=99.97% rnbd26: ios=4661980/0, merge=0/0, ticks=11074411/0, in_queue=398090, util=100.00% rnbd27: ios=4661985/0, merge=0/0, 
ticks=11122911/0, in_queue=404760, util=100.00% rnbd28: ios=4661993/0, merge=0/0, ticks=11095077/0, in_queue=402480, util=100.00% rnbd29: ios=4661991/0, merge=0/0, ticks=11170485/0, in_queue=408370, util=100.00% rnbd30: ios=4661992/0, merge=0/0, ticks=11213819/0, in_queue=409730, util=100.00% rnbd31: ios=4661989/0, merge=0/0, ticks=11263063/0, in_queue=420640, util=100.00% root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# bash go_32dev_128proc.sh fio start : Di 13. Apr 12:42:42 UTC 2021 kernel info : Linux ps401a-914 5.4.86-pserver #5.4.86-3~deb10 SMP Fri Mar 5 12:29:36 UTC 2021 x86_64 GNU/Linux fio version : fio-3.12 gcc: gcc (Debian 8.3.0-6) 8.3.0 Copyright (C) 2018 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Start fio test fiotest: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128 ... fio-3.12 Starting 128 processes Jobs: 123 (f=2946): [r(3),f(1),r(8),f(1),r(4),f(2),r(1),f(2),r(1),f(1),r(1),f(1),r(5),f(1),r(1),f(1),r(3),_(2),r(2),f(2),r(4),f(2),r(5),f(2),r(6),f(3),r(1),f(2),r(1),_(1),r(7),f(1),r(2),f(1),r(5),f(1),r(5),f (5),r(1),f(1),r(1),_(2),f(1),r(2),f(1),r(22)][16.8%][r=3190MiB/s][r=817k IOPS][eta 15m:00s] fiotest: (groupid=0, jobs=128): err= 0: pid=39254: Tue Apr 13 12:45:45 2021 read: IOPS=817k, BW=3191MiB/s (3346MB/s)(561GiB/180029msec) slat (usec): min=69, max=412616, avg=7725.43, stdev=5595.18 clat (nsec): min=1054, max=39850k, avg=12233217.00, stdev=3542424.84 lat (usec): min=179, max=421190, avg=19958.71, stdev=5791.92 clat percentiles (usec): | 1.00th=[ 4555], 5.00th=[ 6849], 10.00th=[ 7963], 20.00th=[ 9372], | 30.00th=[10290], 40.00th=[11207], 50.00th=[11994], 60.00th=[12911], | 70.00th=[13829], 80.00th=[15008], 90.00th=[16712], 95.00th=[18220], | 99.00th=[21627], 99.50th=[22938], 99.90th=[26084], 99.95th=[27395], | 99.99th=[30540] bw ( KiB/s): min= 3072, max=146432, 
per=0.78%, avg=25495.37, stdev=2758.09, samples=45992 iops : min= 768, max=36608, avg=6373.81, stdev=689.53, samples=45992 lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.02% lat (usec) : 100=0.02%, 250=0.03%, 500=0.04%, 750=0.03%, 1000=0.03% lat (msec) : 2=0.09%, 4=0.43%, 10=25.69%, 20=71.33%, 50=2.29% cpu : usr=0.57%, sys=2.20%, ctx=75748318, majf=0, minf=6305149 IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0% submit : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.1%, >=64=100.0% complete : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.1%, >=64=100.0% issued rwts: total=147045041,0,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=128 Run status group 0 (all jobs): READ: bw=3191MiB/s (3346MB/s), 3191MiB/s-3191MiB/s (3346MB/s-3346MB/s), io=561GiB (602GB), run=180029-180029msec Disk stats (read/write): rnbd0: ios=4589511/0, merge=0/0, ticks=13423743/0, in_queue=1931790, util=99.58% rnbd1: ios=4589508/0, merge=0/0, ticks=14476127/0, in_queue=2228970, util=99.60% rnbd2: ios=4589517/0, merge=0/0, ticks=14624196/0, in_queue=2271880, util=99.61% rnbd3: ios=4589504/0, merge=0/0, ticks=14686013/0, in_queue=2286760, util=99.63% rnbd4: ios=4595158/0, merge=0/0, ticks=14772465/0, in_queue=2316590, util=99.62% rnbd5: ios=4595158/0, merge=0/0, ticks=14805361/0, in_queue=2314670, util=99.65% rnbd6: ios=4595158/0, merge=0/0, ticks=14817116/0, in_queue=2324300, util=99.65% rnbd7: ios=4595158/0, merge=0/0, ticks=14833164/0, in_queue=2320360, util=99.66% rnbd8: ios=4595158/0, merge=0/0, ticks=14900960/0, in_queue=2340200, util=99.68% rnbd9: ios=4595158/0, merge=0/0, ticks=14917077/0, in_queue=2345260, util=99.70% rnbd10: ios=4595158/0, merge=0/0, ticks=14931826/0, in_queue=2344540, util=99.71% rnbd11: ios=4595158/0, merge=0/0, ticks=14963132/0, in_queue=2345350, util=99.72% rnbd12: ios=4595158/0, merge=0/0, ticks=14978944/0, in_queue=2371930, util=99.73% rnbd13: ios=4595158/0, merge=0/0, ticks=14953823/0, 
in_queue=2349200, util=99.75% rnbd14: ios=4595157/0, merge=0/0, ticks=14991909/0, in_queue=2361030, util=99.75% rnbd15: ios=4595157/0, merge=0/0, ticks=15039741/0, in_queue=2379400, util=99.76% rnbd16: ios=4595157/0, merge=0/0, ticks=15057599/0, in_queue=2387550, util=99.79% rnbd17: ios=4595157/0, merge=0/0, ticks=15052981/0, in_queue=2378570, util=99.80% rnbd18: ios=4595157/0, merge=0/0, ticks=15364367/0, in_queue=2455030, util=99.83% rnbd19: ios=4595157/0, merge=0/0, ticks=15369998/0, in_queue=2462130, util=99.84% rnbd20: ios=4595157/0, merge=0/0, ticks=14953262/0, in_queue=2354080, util=99.84% rnbd21: ios=4595157/0, merge=0/0, ticks=15116061/0, in_queue=2404290, util=99.86% rnbd22: ios=4595157/0, merge=0/0, ticks=15190489/0, in_queue=2419870, util=99.86% rnbd23: ios=4595157/0, merge=0/0, ticks=15212165/0, in_queue=2414980, util=99.88% rnbd24: ios=4595157/0, merge=0/0, ticks=15225716/0, in_queue=2429100, util=99.88% rnbd25: ios=4595157/0, merge=0/0, ticks=15239578/0, in_queue=2428240, util=99.89% rnbd26: ios=4595157/0, merge=0/0, ticks=15251628/0, in_queue=2427670, util=99.92% rnbd27: ios=4595157/0, merge=0/0, ticks=13955168/0, in_queue=2127300, util=99.93% rnbd28: ios=4595157/0, merge=0/0, ticks=14694941/0, in_queue=2329440, util=99.95% rnbd29: ios=4595157/0, merge=0/0, ticks=14804318/0, in_queue=2355090, util=99.96% rnbd30: ios=4595157/0, merge=0/0, ticks=15183672/0, in_queue=2421440, util=99.96% rnbd31: ios=4595157/0, merge=0/0, ticks=11575822/0, in_queue=1492750, util=99.98% root@ps401a-914.nst:/tmp/ddd/gkim/ibnbd2# bash go_32dev_128proc.sh fio start : Di 13. Apr 12:51:40 UTC 2021 kernel info : Linux ps401a-914 5.4.86-pserver #5.4.86-3~deb10 SMP Fri Mar 5 12:29:36 UTC 2021 x86_64 GNU/Linux fio version : fio-3.12 gcc: gcc (Debian 8.3.0-6) 8.3.0 Copyright (C) 2018 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
Start fio test fiotest: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128 ... fio-3.12 Starting 128 processes Jobs: 70 (f=470): [f(4),_(1),f(2),_(1),f(1),_(1),f(1),_(1),f(1),_(5),f(1),_(2),f(1),_(1),f(2),_(3),f(2),_(1),f(3),r(1),_(4),f(1),_(1),f(2),r(1),_(1),f(2),_(1),f(1),_(1),f(1),_(5),f(3),_(2),f(4),_(5),f(1),_(1 ),f(1),_(2),f(3),_(2),f(1),_(1),f(1),_(1),f(2),_(1),r(1),_(1),f(5),_(1),f(1),_(3),f(5),_(1),f(1),_(2),f(1),_(1),f(1),_(2),f(9),_(1),f(2),_(1),f(1),_(1)][1.7%][r=3210MiB/s][r=822k IOPS][eta 02h:54m:00s] fiotest: (groupid=0, jobs=128): err= 0: pid=40166: Tue Apr 13 12:54:43 2021 read: IOPS=817k, BW=3193MiB/s (3348MB/s)(561GiB/180023msec) slat (usec): min=7, max=292298, avg=7787.34, stdev=5758.03 clat (nsec): min=1586, max=50911k, avg=12167070.96, stdev=3539907.52 lat (usec): min=206, max=300176, avg=19954.48, stdev=5924.10 clat percentiles (usec): | 1.00th=[ 4490], 5.00th=[ 6783], 10.00th=[ 7898], 20.00th=[ 9241], | 30.00th=[10290], 40.00th=[11076], 50.00th=[11994], 60.00th=[12780], | 70.00th=[13829], 80.00th=[15008], 90.00th=[16712], 95.00th=[18220], | 99.00th=[21627], 99.50th=[22938], 99.90th=[26084], 99.95th=[27395], | 99.99th=[30540] bw ( KiB/s): min=10240, max=156672, per=0.78%, avg=25505.77, stdev=3163.58, samples=45970 iops : min= 2560, max=39168, avg=6376.42, stdev=790.91, samples=45970 lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.02% lat (usec) : 100=0.02%, 250=0.02%, 500=0.02%, 750=0.02%, 1000=0.02% lat (msec) : 2=0.11%, 4=0.49%, 10=26.37%, 20=70.68%, 50=2.22% lat (msec) : 100=0.01% cpu : usr=0.55%, sys=2.16%, ctx=75466111, majf=0, minf=5747705 IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0% submit : 0=0.0%, 4=0.1%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0% complete : 0=0.0%, 4=0.1%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0% issued rwts: total=147144833,0,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=128 
Run status group 0 (all jobs): READ: bw=3193MiB/s (3348MB/s), 3193MiB/s-3193MiB/s (3348MB/s-3348MB/s), io=561GiB (603GB), run=180023-180023msec Disk stats (read/write): rnbd0: ios=4598269/0, merge=0/0, ticks=14961482/0, in_queue=2294850, util=99.60%
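To put the runs above side by side, here is a minimal sketch that compares the aggregate READ bandwidth before and after the "swap likely and unlikely" commit. The numbers are copied from the "Run status" lines visible in this log. Treating the runs before the rebuild as the baseline is an assumption based on the order of the commands, and the array and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Aggregate READ bandwidth in MiB/s, taken from the "Run status" lines. */
const double base_64[]  = { 3239.0, 3228.0, 3238.0 }; /* 64 procs, before swap (assumed baseline) */
const double swap_64[]  = { 3238.0 };                 /* 64 procs, after swap */
const double base_128[] = { 3187.0 };                 /* 128 procs, before swap (assumed baseline) */
const double swap_128[] = { 3191.0, 3193.0 };         /* 128 procs, after swap */

/* Arithmetic mean of n samples; n must be non-zero. */
double mean(const double *v, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Relative change of the swapped build versus the baseline, in percent. */
double rel_change_pct(double baseline, double swapped)
{
    return (swapped - baseline) / baseline * 100.0;
}
```

Calling rel_change_pct(mean(base_64, 3), mean(swap_64, 1)) gives about 0.09%, and the 128-process case gives about 0.16%. Both are smaller than the spread among the three 64-process baseline runs themselves (3228 to 3239 MiB/s, about 0.3%).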
On Tue, Apr 13, 2021 at 03:11:33PM +0200, Gioh Kim wrote:
> On Tue, Apr 13, 2021 at 8:43 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Tue, Apr 13, 2021 at 05:31:24AM +0000, Haakon Bugge wrote:
> > >
> > >
> > > > On 12 Apr 2021, at 19:34, Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > On Mon, Apr 12, 2021 at 04:00:55PM +0200, Gioh Kim wrote:
> > > >> On Mon, Apr 12, 2021 at 2:54 PM Jinpu Wang <jinpu.wang@ionos.com> wrote:
> > > >>>
> > > >>> On Mon, Apr 12, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >>>>
> > > >>>> On Mon, Apr 12, 2021 at 02:22:51PM +0200, Jinpu Wang wrote:
> > > >>>>> On Tue, Apr 6, 2021 at 2:41 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >>>>>>
> > > >>>>>> On Tue, Apr 06, 2021 at 02:36:37PM +0200, Gioh Kim wrote:
> > > >>>>>>> From: Gioh Kim <gi-oh.kim@cloud.ionos.com>
> > > >>>>>>>
> > > >>>>>>> Client prints only error value and it is not enough for debugging.
> > > >>>>>>>
> > > >>>>>>> 1. When client receives an error from server:
> > > >>>>>>> the client does not only print the error value but also
> > > >>>>>>> more information of server connection.
> > > >>>>>>>
> > > >>>>>>> 2. When client fails to send IO:
> > > >>>>>>> the client gets an error from RDMA layer. It also
> > > >>>>>>> prints more information of server connection.
> > > >>>>>>>
> > > >>>>>>> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
> > > >>>>>>> Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
> > > >>>>>>> ---
> > > >>>>>>> drivers/infiniband/ulp/rtrs/rtrs-clt.c | 33 ++++++++++++++++++++++----
> > > >>>>>>> 1 file changed, 29 insertions(+), 4 deletions(-)
> > > >>>>>>>
> > > >>>>>>> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > > >>>>>>> index 5062328ac577..a534b2b09e13 100644
> > > >>>>>>> --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > > >>>>>>> +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > > >>>>>>> @@ -437,6 +437,11 @@ static void complete_rdma_req(struct rtrs_clt_io_req *req, int errno,
> > > >>>>>>>         req->in_use = false;
> > > >>>>>>>         req->con = NULL;
> > > >>>>>>>
> > > >>>>>>> +       if (unlikely(errno)) {
> > > >>>>>>
> > > >>>>>> I'm sorry, but all your patches are full of these likely/unlikely cargo
> > > >>>>>> cult. Can you please provide supportive performance data or delete all
> > > >>>>>> likely/unlikely in all rtrs code?
> > > >>>>>
> > > >>>>> Hi Leon,
> > > >>>>>
> > > >>>>> All the likely/unlikely from the non-fast path was removed as you
> > > >>>>> suggested in the past.
> > > >>>>> This one is on IO path, my understanding is for the fast path, with
> > > >>>>> likely/unlikely macro,
> > > >>>>> the compiler will optimize the code for better branch prediction.
> > > >>>>
> > > >>>> In theory yes; in practice, gcc 10 generated the same assembly code when I
> > > >>>> placed likely() and replaced it with unlikely() later.
> > > >>
> > > >> Even though gcc 10 generated the same assembly code,
> > > >> there is no guarantee for gcc 11 or gcc 12.
> > > >>
> > > >> I am reviewing rtrs source file and have found some unnecessary likely/unlikely.
> > > >> But I think likely/unlikely are necessary for extreme cases.
> > > >> I will have a discussion with my colleagues and inform you of the result.
> > > >
> > > > Please come with performance data.
> > > > > > I think the best way to gather performance data is not remove the likely/unlikely, but swap their definitions. Less coding and more pronounced difference - if any. > > > > In theory, it will multiply by 2 gain/loss, which is nice to see if > > likely/ulikely change something. > > > > Thanks > > > > > > > > > > > Thxs, Håkon > > > > > Hi, > > In summary, there is no performance gap before/after swapping > likely/unlikely macros. > So I will send a patch to remove all likely/unlikely macros. > > I guess that is because > - The performance of rnbd/rtrs depends on the network and block layer. > - The network and block layer are not fast enough to get impacted by > likely/unlikely. Thanks for sharing this data. Your input can't truly randomize the code path execution flows and your instructions cache was filled "correctly". It was expected. In most cases, the likely/unlikely is not needed. Thanks
On Tue, Apr 06, 2021 at 02:36:37PM +0200, Gioh Kim wrote:
> From: Gioh Kim <gi-oh.kim@cloud.ionos.com>
>
> Client prints only error value and it is not enough for debugging.
>
> 1. When client receives an error from server:
> the client does not only print the error value but also
> more information of server connection.
>
> 2. When client failes to send IO:
> the client gets an error from RDMA layer. It also
> print more information of server connection.
>
> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
> Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
> drivers/infiniband/ulp/rtrs/rtrs-clt.c | 33 ++++++++++++++++++++++----
> 1 file changed, 29 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> index 5062328ac577..a534b2b09e13 100644
> +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> @@ -437,6 +437,11 @@ static void complete_rdma_req(struct rtrs_clt_io_req *req, int errno,
> 	req->in_use = false;
> 	req->con = NULL;
>
> +	if (unlikely(errno)) {
> +		rtrs_err_rl(con->c.sess, "IO request failed: error=%d path=%s [%s:%u]\n",
> +			    errno, kobject_name(&sess->kobj), sess->hca_name, sess->hca_port);

Mind the long lines, I fixed them and removed the unlikely

Jason
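For readers following the likely()/unlikely() discussion in this thread, a minimal userspace sketch of the "swap the definitions" experiment may help. It assumes GCC or Clang (for __builtin_expect()) and mirrors how the kernel defines the hints in include/linux/compiler.h. The SWAP_HINTS switch and the complete_req() handler are hypothetical names used only for illustration; they are not part of rtrs or the kernel.

```c
/*
 * Userspace sketch of the kernel's branch hints, built on the
 * compiler builtin __builtin_expect() (GCC/Clang).
 * SWAP_HINTS is a hypothetical build switch: defining it inverts
 * both hints, so any measurable effect should show up doubled.
 */
#ifdef SWAP_HINTS
#define likely(x)	__builtin_expect(!!(x), 0)
#define unlikely(x)	__builtin_expect(!!(x), 1)
#else
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)
#endif

/*
 * Hypothetical completion handler: counts failed requests.
 * Returns 1 when err is non-zero, 0 otherwise. The hint only
 * affects code layout, never the result.
 */
static int complete_req(int err, unsigned long *failed)
{
	if (unlikely(err)) {
		(*failed)++;
		return 1;
	}
	return 0;
}
```

Building the same file with and without -DSWAP_HINTS and comparing either benchmark numbers or the generated assembly (for example with gcc -O2 -S) is one way to check whether the hints change anything for a given code path.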
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 5062328ac577..a534b2b09e13 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -437,6 +437,11 @@ static void complete_rdma_req(struct rtrs_clt_io_req *req, int errno,
 	req->in_use = false;
 	req->con = NULL;
 
+	if (unlikely(errno)) {
+		rtrs_err_rl(con->c.sess, "IO request failed: error=%d path=%s [%s:%u]\n",
+			    errno, kobject_name(&sess->kobj), sess->hca_name, sess->hca_port);
+	}
+
 	if (notify)
 		req->conf(req->priv, errno);
 }
@@ -1020,7 +1025,8 @@ static int rtrs_clt_write_req(struct rtrs_clt_io_req *req)
 				  req->usr_len + sizeof(*msg),
 				  imm);
 	if (unlikely(ret)) {
-		rtrs_err(s, "Write request failed: %d\n", ret);
+		rtrs_err_rl(s, "Write request failed: error=%d path=%s [%s:%u]\n",
+			    ret, kobject_name(&sess->kobj), sess->hca_name, sess->hca_port);
 		if (sess->clt->mp_policy == MP_POLICY_MIN_INFLIGHT)
 			atomic_dec(&sess->stats->inflight);
 		if (req->sg_cnt)
@@ -1138,7 +1144,8 @@ static int rtrs_clt_read_req(struct rtrs_clt_io_req *req)
 	ret = rtrs_post_send_rdma(req->con, req, &sess->rbufs[buf_id],
 				  req->data_len, imm, wr);
 	if (unlikely(ret)) {
-		rtrs_err(s, "Read request failed: %d\n", ret);
+		rtrs_err_rl(s, "Read request failed: error=%d path=%s [%s:%u]\n",
+			    ret, kobject_name(&sess->kobj), sess->hca_name, sess->hca_port);
 		if (sess->clt->mp_policy == MP_POLICY_MIN_INFLIGHT)
 			atomic_dec(&sess->stats->inflight);
 		req->need_inv = false;
@@ -2459,12 +2466,30 @@ static int init_sess(struct rtrs_clt_sess *sess)
 	mutex_lock(&sess->init_mutex);
 	err = init_conns(sess);
 	if (err) {
-		rtrs_err(sess->clt, "init_conns(), err: %d\n", err);
+		char str[NAME_MAX];
+		int err;
+		struct rtrs_addr path = {
+			.src = &sess->s.src_addr,
+			.dst = &sess->s.dst_addr,
+		};
+
+		rtrs_addr_to_str(&path, str, sizeof(str));
+		rtrs_err(sess->clt, "init_conns() failed: err=%d path=%s [%s:%u]\n",
+			 err, str, sess->hca_name, sess->hca_port);
 		goto out;
 	}
 	err = rtrs_send_sess_info(sess);
 	if (err) {
-		rtrs_err(sess->clt, "rtrs_send_sess_info(), err: %d\n", err);
+		char str[NAME_MAX];
+		int err;
+		struct rtrs_addr path = {
+			.src = &sess->s.src_addr,
+			.dst = &sess->s.dst_addr,
+		};
+
+		rtrs_addr_to_str(&path, str, sizeof(str));
+		rtrs_err(sess->clt, "rtrs_send_sess_info() failed: err=%d path=%s [%s:%u]\n",
+			 err, str, sess->hca_name, sess->hca_port);
 		goto out;
 	}
 	rtrs_clt_sess_up(sess);
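
One detail of the init_sess() hunk as posted deserves a closer look: the added "int err;" inside each "if (err)" block declares a new variable that shadows the function's outer err, so the message would print the uninitialized inner variable rather than the failure code. The sketch below reproduces the pattern in plain C under stated assumptions: fake_init_conns() and both helper functions are hypothetical, and the inner variable is set to 0 only to keep the demonstration deterministic (in the posted hunk it is left uninitialized).

```c
/* Hypothetical stand-in for init_conns(); always fails with -22 here. */
static int fake_init_conns(void)
{
	return -22;
}

/*
 * Mirrors the pattern in the hunk: a second "err" declared inside
 * the error branch hides the outer one. Returns the value an error
 * message inside the branch would report.
 */
static int reported_err_shadowed(void)
{
	int err = fake_init_conns();
	int reported = 0;

	if (err) {
		int err = 0;	/* shadows the outer err */

		reported = err;	/* reads the inner variable, not -22 */
	}
	return reported;
}

/* Same logic without the inner declaration: the real code is reported. */
static int reported_err_fixed(void)
{
	int err = fake_init_conns();
	int reported = 0;

	if (err)
		reported = err;
	return reported;
}
```

Compiling with -Wshadow makes GCC and Clang warn about the inner declaration, which is one way to catch this pattern before posting.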