diff mbox

[ceph-users] Help needed porting Ceph to RSockets

Message ID 1828884A29C6694DAF28B7E6B8A8237388CA6C27@ORSMSX109.amr.corp.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Hefty, Sean Aug. 19, 2013, 5:10 p.m. UTC
Can you see if the patch below fixes the hang?

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
---
 src/rsocket.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Andreas Bluemle Aug. 20, 2013, 7:21 a.m. UTC | #1
Hi Sean,

I will re-check until the end of the week; there is
some test scheduling issue with our test system, which
affects my access times.

Thanks

Andreas


On Mon, 19 Aug 2013 17:10:11 +0000
"Hefty, Sean" <sean.hefty@intel.com> wrote:

> Can you see if the patch below fixes the hang?
> 
> Signed-off-by: Sean Hefty <sean.hefty@intel.com>
> ---
>  src/rsocket.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
> 
> diff --git a/src/rsocket.c b/src/rsocket.c
> index d544dd0..e45b26d 100644
> --- a/src/rsocket.c
> +++ b/src/rsocket.c
> @@ -2948,10 +2948,12 @@ static int rs_poll_events(struct pollfd
> *rfds, struct pollfd *fds, nfds_t nfds) 
>  		rs = idm_lookup(&idm, fds[i].fd);
>  		if (rs) {
> +			fastlock_acquire(&rs->cq_wait_lock);
>  			if (rs->type == SOCK_STREAM)
>  				rs_get_cq_event(rs);
>  			else
>  				ds_get_cq_event(rs);
> +			fastlock_release(&rs->cq_wait_lock);
>  			fds[i].revents = rs_poll_rs(rs,
> fds[i].events, 1, rs_poll_all); } else {
>  			fds[i].revents = rfds[i].revents;
> @@ -3098,7 +3100,8 @@ int rselect(int nfds, fd_set *readfds, fd_set
> *writefds, 
>  /*
>   * For graceful disconnect, notify the remote side that we're
> - * disconnecting and wait until all outstanding sends complete.
> + * disconnecting and wait until all outstanding sends complete,
> provided
> + * that the remote side has not sent a disconnect message.
>   */
>  int rshutdown(int socket, int how)
>  {
> @@ -3138,6 +3141,12 @@ int rshutdown(int socket, int how)
>  	if (rs->state & rs_connected)
>  		rs_process_cq(rs, 0, rs_conn_all_sends_done);
>  
> +	if (rs->state & rs_disconnected) {
> +		/* Generate event by flushing receives to unblock
> rpoll */
> +		ibv_req_notify_cq(rs->cm_id->recv_cq, 0);
> +		rdma_disconnect(rs->cm_id);
> +	}
> +
>  	if ((rs->fd_flags & O_NONBLOCK) && (rs->state &
> rs_connected)) rs_set_nonblocking(rs, rs->fd_flags);
>  
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> in the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
>
Andreas Bluemle Aug. 20, 2013, 10:30 a.m. UTC | #2
Hi,

I have added the patch and re-tested: I still encounter
hangs of my application. I am not quite sure whether the
I hit the same error on the shutdown because now I don't hit
the error always, but only every now and then.

WHen adding the patch to my code base (git tag v1.0.17) I notice
an offset of "-34 lines". Which code base are you using?


Best Regards

Andreas Bluemle

On Tue, 20 Aug 2013 09:21:13 +0200
Andreas Bluemle <andreas.bluemle@itxperts.de> wrote:

> Hi Sean,
> 
> I will re-check until the end of the week; there is
> some test scheduling issue with our test system, which
> affects my access times.
> 
> Thanks
> 
> Andreas
> 
> 
> On Mon, 19 Aug 2013 17:10:11 +0000
> "Hefty, Sean" <sean.hefty@intel.com> wrote:
> 
> > Can you see if the patch below fixes the hang?
> > 
> > Signed-off-by: Sean Hefty <sean.hefty@intel.com>
> > ---
> >  src/rsocket.c |   11 ++++++++++-
> >  1 files changed, 10 insertions(+), 1 deletions(-)
> > 
> > diff --git a/src/rsocket.c b/src/rsocket.c
> > index d544dd0..e45b26d 100644
> > --- a/src/rsocket.c
> > +++ b/src/rsocket.c
> > @@ -2948,10 +2948,12 @@ static int rs_poll_events(struct pollfd
> > *rfds, struct pollfd *fds, nfds_t nfds) 
> >  		rs = idm_lookup(&idm, fds[i].fd);
> >  		if (rs) {
> > +			fastlock_acquire(&rs->cq_wait_lock);
> >  			if (rs->type == SOCK_STREAM)
> >  				rs_get_cq_event(rs);
> >  			else
> >  				ds_get_cq_event(rs);
> > +			fastlock_release(&rs->cq_wait_lock);
> >  			fds[i].revents = rs_poll_rs(rs,
> > fds[i].events, 1, rs_poll_all); } else {
> >  			fds[i].revents = rfds[i].revents;
> > @@ -3098,7 +3100,8 @@ int rselect(int nfds, fd_set *readfds, fd_set
> > *writefds, 
> >  /*
> >   * For graceful disconnect, notify the remote side that we're
> > - * disconnecting and wait until all outstanding sends complete.
> > + * disconnecting and wait until all outstanding sends complete,
> > provided
> > + * that the remote side has not sent a disconnect message.
> >   */
> >  int rshutdown(int socket, int how)
> >  {
> > @@ -3138,6 +3141,12 @@ int rshutdown(int socket, int how)
> >  	if (rs->state & rs_connected)
> >  		rs_process_cq(rs, 0, rs_conn_all_sends_done);
> >  
> > +	if (rs->state & rs_disconnected) {
> > +		/* Generate event by flushing receives to unblock
> > rpoll */
> > +		ibv_req_notify_cq(rs->cm_id->recv_cq, 0);
> > +		rdma_disconnect(rs->cm_id);
> > +	}
> > +
> >  	if ((rs->fd_flags & O_NONBLOCK) && (rs->state &
> > rs_connected)) rs_set_nonblocking(rs, rs->fd_flags);
> >  
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe
> > linux-rdma" in the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 
> 
> 
>
Hefty, Sean Aug. 20, 2013, 3:04 p.m. UTC | #3
> I have added the patch and re-tested: I still encounter
> hangs of my application. I am not quite sure whether the
> I hit the same error on the shutdown because now I don't hit
> the error always, but only every now and then.

I guess this is at least some progress... :/
 
> WHen adding the patch to my code base (git tag v1.0.17) I notice
> an offset of "-34 lines". Which code base are you using?

This patch was generated against the tip of the git tree. 

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matthew Anderson Aug. 21, 2013, 11:44 a.m. UTC | #4
Hi Sean,

I tested out the patch and unfortunately had the same results as
Andreas. About 50% of the time the rpoll() thread in Ceph still hangs
when rshutdown() is called. I saw a similar behaviour when increasing
the poll time on the pre-patched version if that's of any relevance.

Thanks

On Tue, Aug 20, 2013 at 11:04 PM, Hefty, Sean <sean.hefty@intel.com> wrote:
>> I have added the patch and re-tested: I still encounter
>> hangs of my application. I am not quite sure whether the
>> I hit the same error on the shutdown because now I don't hit
>> the error always, but only every now and then.
>
> I guess this is at least some progress... :/
>
>> WHen adding the patch to my code base (git tag v1.0.17) I notice
>> an offset of "-34 lines". Which code base are you using?
>
> This patch was generated against the tip of the git tree.
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/src/rsocket.c b/src/rsocket.c
index d544dd0..e45b26d 100644
--- a/src/rsocket.c
+++ b/src/rsocket.c
@@ -2948,10 +2948,12 @@  static int rs_poll_events(struct pollfd *rfds, struct pollfd *fds, nfds_t nfds)
 
 		rs = idm_lookup(&idm, fds[i].fd);
 		if (rs) {
+			fastlock_acquire(&rs->cq_wait_lock);
 			if (rs->type == SOCK_STREAM)
 				rs_get_cq_event(rs);
 			else
 				ds_get_cq_event(rs);
+			fastlock_release(&rs->cq_wait_lock);
 			fds[i].revents = rs_poll_rs(rs, fds[i].events, 1, rs_poll_all);
 		} else {
 			fds[i].revents = rfds[i].revents;
@@ -3098,7 +3100,8 @@  int rselect(int nfds, fd_set *readfds, fd_set *writefds,
 
 /*
  * For graceful disconnect, notify the remote side that we're
- * disconnecting and wait until all outstanding sends complete.
+ * disconnecting and wait until all outstanding sends complete, provided
+ * that the remote side has not sent a disconnect message.
  */
 int rshutdown(int socket, int how)
 {
@@ -3138,6 +3141,12 @@  int rshutdown(int socket, int how)
 	if (rs->state & rs_connected)
 		rs_process_cq(rs, 0, rs_conn_all_sends_done);
 
+	if (rs->state & rs_disconnected) {
+		/* Generate event by flushing receives to unblock rpoll */
+		ibv_req_notify_cq(rs->cm_id->recv_cq, 0);
+		rdma_disconnect(rs->cm_id);
+	}
+
 	if ((rs->fd_flags & O_NONBLOCK) && (rs->state & rs_connected))
 		rs_set_nonblocking(rs, rs->fd_flags);