Message ID | 1430340983-12538-1-git-send-email-david.ahern@oracle.com (mailing list archive)
---|---
State | Superseded
On Wed, Apr 29, 2015 at 04:56:23PM -0400, David Ahern wrote:

> static void cm_mask_copy(u8 *dst, u8 *src, u8 *mask)
> {

Do you know what call site is unaligned? A quick audit suggests that
fixing a few possibly troublesome callers to be guaranteed 32-bit
aligned should be fairly straightforward.

Jason
On 4/29/15 3:18 PM, Jason Gunthorpe wrote:
> On Wed, Apr 29, 2015 at 04:56:23PM -0400, David Ahern wrote:
>
>> static void cm_mask_copy(u8 *dst, u8 *src, u8 *mask)
>> {
>
> Do you know what call site is unaligned? A quick audit suggests that
> fixing a few possibly troublesome callers to be guaranteed 32-bit
> aligned should be fairly straightforward.

I believe this case is

  cm_find_listen() -> cm_compare_private_data() -> cm_mask_copy()
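For context, the call chain above ends in code along these lines; this is a reconstruction of the 2015-era ib_cm caller from the details in the thread, not a verbatim excerpt, so the exact layout may differ. The unaligned pointer is private_data, which points into the body of a received wire message:

	/* Sketch of the suspect caller: private_data points into the
	 * received MAD, so it carries no 8-byte alignment guarantee.
	 */
	static int cm_compare_private_data(u8 *private_data,
					   struct ib_cm_compare_data *dst_data)
	{
		u8 src_data[IB_CM_COMPARE_SIZE];

		if (!dst_data)
			return 0;

		/* Mask the wire data before comparing against the listener. */
		cm_mask_copy(src_data, private_data, dst_data->mask);
		return !memcmp(src_data, dst_data->data, IB_CM_COMPARE_SIZE);
	}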
On Wed, Apr 29, 2015 at 03:24:19PM -0600, David Ahern wrote:
> On 4/29/15 3:18 PM, Jason Gunthorpe wrote:
>> On Wed, Apr 29, 2015 at 04:56:23PM -0400, David Ahern wrote:
>>
>>> static void cm_mask_copy(u8 *dst, u8 *src, u8 *mask)
>>> {
>>
>> Do you know what call site is unaligned? A quick audit suggests that
>> fixing a few possibly troublesome callers to be guaranteed 32-bit
>> aligned should be fairly straightforward.
>
> I believe this case is
>
>   cm_find_listen() -> cm_compare_private_data() -> cm_mask_copy()

Right, I'm pretty sure that is the main place that would have the
32-bit alignment limitation, since it is an on-the-wire message.

I suggest changing the signature to:

  cm_mask_copy(u32 *dst, const u32 *src, const u32 *mask)

And dealing with the fairly few resulting changes.

Jason
On 4/29/15 3:30 PM, Jason Gunthorpe wrote:
> On Wed, Apr 29, 2015 at 03:24:19PM -0600, David Ahern wrote:
>> On 4/29/15 3:18 PM, Jason Gunthorpe wrote:
>>> On Wed, Apr 29, 2015 at 04:56:23PM -0400, David Ahern wrote:
>>>
>>>> static void cm_mask_copy(u8 *dst, u8 *src, u8 *mask)
>>>> {
>>>
>>> Do you know what call site is unaligned? A quick audit suggests that
>>> fixing a few possibly troublesome callers to be guaranteed 32-bit
>>> aligned should be fairly straightforward.
>>
>> I believe this case is
>>
>>   cm_find_listen() -> cm_compare_private_data() -> cm_mask_copy()
>
> Right, I'm pretty sure that is the main place that would have the
> 32-bit alignment limitation, since it is an on-the-wire message.
>
> I suggest changing the signature to:
>
>   cm_mask_copy(u32 *dst, const u32 *src, const u32 *mask)
>
> And dealing with the fairly few resulting changes.

Confused. That does not deal with the alignment problem. Internal to
cm_mask_copy, unsigned longs are used (8 bytes), so why change the
signature to u32?
On Wed, Apr 29, 2015 at 03:38:22PM -0600, David Ahern wrote:
>> And dealing with the fairly few resulting changes.
>
> Confused. That does not deal with the alignment problem. Internal to
> cm_mask_copy, unsigned longs are used (8 bytes), so why change the
> signature to u32?

You'd change the loop stride to be u32 as well.

This whole thing is just an attempted optimization, but doing copy and
mask 8 bytes at a time on unaligned data is not very efficient, even
on x86.

So either drop the optimization and use u8 as the stride.

Or keep the optimization and guarantee alignment; the best we can do
is u32.

Since this is an optimization, get_unaligned should be avoided;
looping over u8 would be faster.

Jason
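The "drop the optimization" option Jason describes would look something like the sketch below; it assumes IB_CM_COMPARE_SIZE stays defined as a byte count. With no casts and a byte stride, there is no alignment assumption left to violate on strict-alignment machines such as sparc:

	static void cm_mask_copy(u8 *dst, const u8 *src, const u8 *mask)
	{
		int i;

		/* Byte-at-a-time: correct at any address, and on short,
		 * unaligned buffers often no slower than word-sized tricks.
		 */
		for (i = 0; i < IB_CM_COMPARE_SIZE; i++)
			dst[i] = src[i] & mask[i];
	}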
On 4/29/15 3:51 PM, Jason Gunthorpe wrote:
> On Wed, Apr 29, 2015 at 03:38:22PM -0600, David Ahern wrote:
>
>>> And dealing with the fairly few resulting changes.
>>
>> Confused. That does not deal with the alignment problem. Internal to
>> cm_mask_copy, unsigned longs are used (8 bytes), so why change the
>> signature to u32?
>
> You'd change the loop stride to be u32 as well.
>
> This whole thing is just an attempted optimization, but doing copy and
> mask 8 bytes at a time on unaligned data is not very efficient, even
> on x86.
>
> So either drop the optimization and use u8 as the stride.
>
> Or keep the optimization and guarantee alignment; the best we can do
> is u32.
>
> Since this is an optimization, get_unaligned should be avoided;
> looping over u8 would be faster.

Sorry to be dense on this, but I still don't see how your proposal
addresses the underlying problem -- the guarantee of 8-byte alignment.
Be it u8 or u32 for the input arguments, the cast to unsigned long
means an extended load operation can be used for the access. For that
to work, the addresses (src, dst, mask) must all be 8-byte aligned.

David
On Wed, Apr 29, 2015 at 03:59:20PM -0600, David Ahern wrote:
> On 4/29/15 3:51 PM, Jason Gunthorpe wrote:
>> On Wed, Apr 29, 2015 at 03:38:22PM -0600, David Ahern wrote:
>>
>>>> And dealing with the fairly few resulting changes.
>>>
>>> Confused. That does not deal with the alignment problem. Internal to
>>> cm_mask_copy, unsigned longs are used (8 bytes), so why change the
>>> signature to u32?
>>
>> You'd change the loop stride to be u32 as well.
>>
>> This whole thing is just an attempted optimization, but doing copy and
>> mask 8 bytes at a time on unaligned data is not very efficient, even
>> on x86.
>>
>> So either drop the optimization and use u8 as the stride.
>>
>> Or keep the optimization and guarantee alignment; the best we can do
>> is u32.
>>
>> Since this is an optimization, get_unaligned should be avoided;
>> looping over u8 would be faster.
>
> Sorry to be dense on this, but I still don't see how your proposal
> addresses the underlying problem -- the guarantee of 8-byte alignment.
> Be it u8 or u32 for the input arguments, the cast to unsigned long
> means an extended load operation can be used for the access. For that
> to work, the addresses (src, dst, mask) must all be 8-byte aligned.

Read carefully:

>> You'd change the loop stride to be u32 as well.

which means this:

static void cm_mask_copy(u32 *dst, const u32 *src, const u32 *mask)
{
	int i;

	for (i = 0; i < IB_CM_COMPARE_SIZE; i++)
		dst[i] = src[i] & mask[i];
}

Divide IB_CM_COMPARE_SIZE by 4, adjust all users.

Make other related type changes the compiler tells you about.

Do not add casts.

Jason
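Adjusting the users would then be mostly a matter of changing types rather than logic. As a hypothetical sketch, and assuming the 2015-era definitions (IB_CM_COMPARE_SIZE was 64, a byte count, and struct ib_cm_compare_data held u8 arrays), the knock-on edits might look like this:

	/* IB_CM_COMPARE_SIZE becomes a count of u32 words, not bytes. */
	#define IB_CM_COMPARE_SIZE	(64 / sizeof(u32))

	struct ib_cm_compare_data {
		u32 data[IB_CM_COMPARE_SIZE];
		u32 mask[IB_CM_COMPARE_SIZE];
	};

	/* Stack buffers in the comparison path follow suit: */
	static int cm_compare_private_data(u32 *private_data,
					   struct ib_cm_compare_data *dst_data)
	{
		u32 src_data[IB_CM_COMPARE_SIZE];

		if (!dst_data)
			return 0;

		cm_mask_copy(src_data, private_data, dst_data->mask);
		return !memcmp(src_data, dst_data->data, sizeof(src_data));
	}

The compiler then flags every caller still passing u8 pointers, which is exactly the audit Jason is suggesting.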
On 4/29/15 4:15 PM, Jason Gunthorpe wrote:
> Read carefully:
>
>>> You'd change the loop stride to be u32 as well.
>
> which means this:
>
> static void cm_mask_copy(u32 *dst, const u32 *src, const u32 *mask)
> {
> 	int i;
>
> 	for (i = 0; i < IB_CM_COMPARE_SIZE; i++)
> 		dst[i] = src[i] & mask[i];
> }
>
> Divide IB_CM_COMPARE_SIZE by 4, adjust all users.
>
> Make other related type changes the compiler tells you about.
>
> Do not add casts.

Got it. That was a lot clearer.
Addresses the following kernel logs seen during boot of sparc systems:

  Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
  Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
  Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
  Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
  Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]

Signed-off-by: David Ahern <david.ahern@oracle.com>
---
 drivers/infiniband/core/cm.c | 18 +++++++++++++++---
 1 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index e28a494..5102cfe 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -439,11 +439,23 @@ static struct cm_id_private * cm_acquire_id(__be32 local_id, __be32 remote_id)
 
 static void cm_mask_copy(u8 *dst, u8 *src, u8 *mask)
 {
+	unsigned long *plsrc, *plmask, *pldst;
 	int i;
 
-	for (i = 0; i < IB_CM_COMPARE_SIZE / sizeof(unsigned long); i++)
-		((unsigned long *) dst)[i] = ((unsigned long *) src)[i] &
-					     ((unsigned long *) mask)[i];
+	/* unsigned longs can use extended load operations which
+	 * can require 8-byte alignments. dst, src and mask are
+	 * not guaranteed to be aligned.
+	 */
+	pldst = (unsigned long *) dst;
+	plsrc = (unsigned long *) src;
+	plmask = (unsigned long *) mask;
+	for (i = 0; i < IB_CM_COMPARE_SIZE / sizeof(unsigned long); i++) {
+		unsigned long lsrc, lmask;
+
+		lsrc = get_unaligned(&plsrc[i]);
+		lmask = get_unaligned(&plmask[i]);
+		put_unaligned(lsrc & lmask, &pldst[i]);
+	}
 }
 
 static int cm_compare_data(struct ib_cm_compare_data *src_data,
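For readers wondering why get_unaligned()/put_unaligned() silence the sparc traps: the generic helpers force the compiler to access the word without assuming alignment. Conceptually they behave like the sketch below (an illustration of the idea only; the kernel's real implementations live in the asm/unaligned.h headers and vary by architecture):

	#include <string.h>

	/* Going through memcpy() makes the compiler emit accesses that
	 * are legal at any address, so no alignment trap fires on
	 * strict-alignment CPUs; on x86 it still compiles to plain loads.
	 */
	static inline unsigned long sketch_get_unaligned(const void *p)
	{
		unsigned long v;

		memcpy(&v, p, sizeof(v));
		return v;
	}

	static inline void sketch_put_unaligned(unsigned long v, void *p)
	{
		memcpy(p, &v, sizeof(v));
	}

The per-iteration cost of that safety is presumably why this version was marked Superseded in favor of the u32-typed approach discussed above.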