
af_packet: flush complete kernel cache in packet_sendmsg

Message ID 1314961686-30870-1-git-send-email-phil.sutter@viprinet.com (mailing list archive)
State New, archived

Commit Message

Phil Sutter Sept. 2, 2011, 11:08 a.m. UTC
This flushes the cache before and after accessing the mmapped packet
buffer. It seems like the call to flush_dcache_page from inside
__packet_get_status is not enough on Kirkwood (or ARM in general).
---
I know this is far from an optimal solution, but it's in fact the only working
one I found. And it shouldn't interfere with unaffected target systems. So
anyone relying on a working TX_RING on Kirkwood may refer to this patch. Any
ARM/cache/Marvell/Kirkwood experts out there feel free to improve this.
---
 net/packet/af_packet.c |   24 ++++++++++++++++++++----
 1 files changed, 20 insertions(+), 4 deletions(-)
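
For readers unfamiliar with TX_RING, the handshake that this patch brackets with cache flushes can be modelled in plain userspace C. This is a minimal sketch, not the kernel code: `struct model_frame` and the `model_*` functions are hypothetical names, the frame layout is simplified, and only the status values are taken from the `TP_STATUS_*` constants in `<linux/if_packet.h>`. Userspace fills a frame and marks it ready; the kernel side, entered via sendmsg(), sends every ready frame and hands it back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values mirror TP_STATUS_* in <linux/if_packet.h>; they are redefined
 * here so the sketch builds without kernel headers. */
enum {
	MODEL_TP_STATUS_AVAILABLE    = 0,
	MODEL_TP_STATUS_SEND_REQUEST = 1,
	MODEL_TP_STATUS_SENDING      = 2,
};

#define MODEL_FRAME_PAYLOAD 64

/* Hypothetical, simplified stand-in for one frame of the TX ring. */
struct model_frame {
	unsigned long status;
	size_t len;
	unsigned char data[MODEL_FRAME_PAYLOAD];
};

/* Userspace side: fill a free frame and mark it ready for sending.
 * Returns 0 on success, -1 if the frame is busy or the payload is too big. */
static int model_user_queue(struct model_frame *f, const void *buf, size_t len)
{
	assert(f != NULL && buf != NULL);
	if (f->status != MODEL_TP_STATUS_AVAILABLE || len > MODEL_FRAME_PAYLOAD)
		return -1;
	memcpy(f->data, buf, len);
	f->len = len;
	f->status = MODEL_TP_STATUS_SEND_REQUEST;
	return 0;
}

/* Kernel side: walk the ring, send every frame marked ready and hand it
 * back to userspace. Returns the number of frames sent. */
static int model_kernel_send(struct model_frame *ring, size_t nframes)
{
	int sent = 0;

	for (size_t i = 0; i < nframes; i++) {
		if (ring[i].status != MODEL_TP_STATUS_SEND_REQUEST)
			continue;	/* not ready, or status read as not ready */
		ring[i].status = MODEL_TP_STATUS_SENDING;
		/* the real code would build and transmit an skb here */
		ring[i].status = MODEL_TP_STATUS_AVAILABLE;
		sent++;
	}
	return sent;
}
```

In this model, a kernel side that reads a stale status word skips a frame userspace has already marked ready, which would be consistent with packets appearing "lost" until a later send.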

Comments

Ben Hutchings Sept. 2, 2011, 1:46 p.m. UTC | #1
On Fri, 2011-09-02 at 13:08 +0200, Phil Sutter wrote:
> This flushes the cache before and after accessing the mmapped packet
> buffer. It seems like the call to flush_dcache_page from inside
> __packet_get_status is not enough on Kirkwood (or ARM in general).
> ---
> I know this is far from an optimal solution, but it's in fact the only working
> one I found.
[...]

This is ridiculous.  If flush_dcache_page() isn't doing everything it
should, you need to fix that.

Ben.
Phil Sutter Sept. 2, 2011, 1:59 p.m. UTC | #2
On Fri, Sep 02, 2011 at 02:46:17PM +0100, Ben Hutchings wrote:
> On Fri, 2011-09-02 at 13:08 +0200, Phil Sutter wrote:
> > This flushes the cache before and after accessing the mmapped packet
> > buffer. It seems like the call to flush_dcache_page from inside
> > __packet_get_status is not enough on Kirkwood (or ARM in general).
> > ---
> > I know this is far from an optimal solution, but it's in fact the only working
> > one I found.
> [...]
> 
> This is ridiculous.  If flush_dcache_page() isn't doing everything it
> should, you need to fix that.

You're absolutely correct. But in fact this problem goes way too deep
for me to find its cause. And since my time is finite, I doubt this
will change in the near future. So I asked for help, a pointer in
whatever direction or anything I could try to help analyze this further -
without any response (unless I missed it, in which case I apologize).

Please don't get me wrong. I have no intention of getting this patch into
mainline, I just want to give others with the same problem a starting
point.

Greetings, Phil
chetan loke Sept. 2, 2011, 2 p.m. UTC | #3
>
> This flushes the cache before and after accessing the mmapped packet
> buffer. It seems like the call to flush_dcache_page from inside
> __packet_get_status is not enough on Kirkwood (or ARM in general).



> +       kw_extra_cache_flush();
> +       rc = po->tx_ring.pg_vec ? tpacket_snd(po, msg) :
> +                       packet_snd(sock, msg, len);
> +       kw_extra_cache_flush();
> +       return rc;
>  }

If a workaround is needed for mmap, then why not change tpacket_snd?

Also, is this workaround actually working for all the cases? Because
packet_get_status is not being touched in your patch.

Also, I don't see any changes for the Rx-path. Is that working ok?


Chetan Loke
Phil Sutter Sept. 2, 2011, 3:31 p.m. UTC | #4
On Fri, Sep 02, 2011 at 10:00:16AM -0400, chetan loke wrote:
> >
> > This flushes the cache before and after accessing the mmapped packet
> > buffer. It seems like the call to flush_dcache_page from inside
> > __packet_get_status is not enough on Kirkwood (or ARM in general).
> 
> 
> 
> > +       kw_extra_cache_flush();
> > +       rc = po->tx_ring.pg_vec ? tpacket_snd(po, msg) :
> > +                       packet_snd(sock, msg, len);
> > +       kw_extra_cache_flush();
> > +       return rc;
> >  }
> 
> If a workaround is needed for mmap, then why not change tpacket_snd?

I did not verify that packet_snd() is not affected. OTOH, adding it
there was quite "intuitive".

> Also, is this workaround actually working for all the cases? Because
> packet_get_status is not being touched in your patch.
> 
> Also, I don't see any changes for the Rx-path. Is that working ok?

So far we haven't noticed problems in that direction. I just tried an
explicit test: having tcpdump print local timestamps (not the pcap ones)
on every received packet, activating icmp_echo_ignore_all and pinging
the host on a dedicated line. I expected to sometimes see a one-second
difference between the two timestamps since, as with sending, from time
to time a packet should get "lost" in the cache and only show up in
userspace after the next one has arrived. Maybe my test is broken, or RX
is indeed unaffected.

Greetings and thanks for the hints, Phil
Russell King - ARM Linux Sept. 2, 2011, 5:28 p.m. UTC | #5
On Fri, Sep 02, 2011 at 02:46:17PM +0100, Ben Hutchings wrote:
> On Fri, 2011-09-02 at 13:08 +0200, Phil Sutter wrote:
> > This flushes the cache before and after accessing the mmapped packet
> > buffer. It seems like the call to flush_dcache_page from inside
> > __packet_get_status is not enough on Kirkwood (or ARM in general).
> > ---
> > I know this is far from an optimal solution, but it's in fact the only working
> > one I found.
> [...]
> 
> This is ridiculous.  If flush_dcache_page() isn't doing everything it
> should, you need to fix that.

It does do everything it should - which is to perform maintenance on
page cache pages.  It flushes the kernel mapping of the page.  It
also flushes the userspace mappings of the page which it finds by
walking the mmap list via the associated struct page.  It does not
touch vmalloc mappings because it has no way to know whether they
exist or not.

It doesn't do so much for anonymous pages - to do so would only
duplicate what flush_anon_page() does at the very same callsites.
Plus the mmap list isn't available for such pages so there's no
way to find out what userspace addresses to flush.

If the AF_PACKET buffers are created from anonymous pages and it's
using flush_dcache_page(), it's using the wrong interface.
Phil Sutter Sept. 5, 2011, 7:57 p.m. UTC | #6
Hi,

On Fri, Sep 02, 2011 at 06:28:50PM +0100, Russell King - ARM Linux wrote:
> On Fri, Sep 02, 2011 at 02:46:17PM +0100, Ben Hutchings wrote:
> > On Fri, 2011-09-02 at 13:08 +0200, Phil Sutter wrote:
> > > This flushes the cache before and after accessing the mmapped packet
> > > buffer. It seems like the call to flush_dcache_page from inside
> > > __packet_get_status is not enough on Kirkwood (or ARM in general).
> > > ---
> > > I know this is far from an optimal solution, but it's in fact the only working
> > > one I found.
> > [...]
> > 
> > This is ridiculous.  If flush_dcache_page() isn't doing everything it
> > should, you need to fix that.
> 
> It does do everything it should - which is to perform maintenance on
> page cache pages.  It flushes the kernel mapping of the page.  It
> also flushes the userspace mappings of the page which it finds by
> walking the mmap list via the associated struct page.  It does not
> touch vmalloc mappings because it has no way to know whether they
> exist or not.
> 
> It doesn't do so much for anonymous pages - to do so would only
> duplicate what flush_anon_page() does at the very same callsites.
> Plus the mmap list isn't available for such pages so there's no
> way to find out what userspace addresses to flush.

Indeed very interesting information, thanks a lot!

The code in question uses __get_free_pages(), and if that fails uses
vmalloc() (see alloc_one_pg_vec_page() for reference). Both code paths
result in the same faulty behaviour.

> If the AF_PACKET buffers are created from anonymous pages and it's
> using flush_dcache_page(), it's using the wrong interface.

So, in order to fix this, which alternative would you suggest? Quite a
lot of work has been done regarding memory allocation, so I guess
changing that side is a no-go.

Greetings, Phil
Russell King - ARM Linux Sept. 6, 2011, 9:57 a.m. UTC | #7
On Mon, Sep 05, 2011 at 09:57:14PM +0200, Phil Sutter wrote:
> Hi,
> 
> On Fri, Sep 02, 2011 at 06:28:50PM +0100, Russell King - ARM Linux wrote:
> > On Fri, Sep 02, 2011 at 02:46:17PM +0100, Ben Hutchings wrote:
> > > On Fri, 2011-09-02 at 13:08 +0200, Phil Sutter wrote:
> > > > This flushes the cache before and after accessing the mmapped packet
> > > > buffer. It seems like the call to flush_dcache_page from inside
> > > > __packet_get_status is not enough on Kirkwood (or ARM in general).
> > > > ---
> > > > I know this is far from an optimal solution, but it's in fact the only working
> > > > one I found.
> > > [...]
> > > 
> > > This is ridiculous.  If flush_dcache_page() isn't doing everything it
> > > should, you need to fix that.
> > 
> > > It does do everything it should - which is to perform maintenance on
> > page cache pages.  It flushes the kernel mapping of the page.  It
> > also flushes the userspace mappings of the page which it finds by
> > walking the mmap list via the associated struct page.  It does not
> > touch vmalloc mappings because it has no way to know whether they
> > exist or not.
> > 
> > It doesn't do so much for anonymous pages - to do so would only
> > duplicate what flush_anon_page() does at the very same callsites.
> > Plus the mmap list isn't available for such pages so there's no
> > way to find out what userspace addresses to flush.
> 
> Indeed very interesting information, thanks a lot!
> 
> The code in question uses __get_free_pages(), and if that fails uses
> vmalloc() (see alloc_one_pg_vec_page() for reference). Both code paths
> result in the same faulty behaviour.

So, what you're wanting is cache coherency between vmalloc() and
userspace.  There is no API in the kernel to do that, and you'll see
the same failures of this interface not only on ARM but also other
architectures with virtual caches.

It sounds like we need an API to flush the cache using both the
userspace address, plus the kernel side address be that in the direct
map or the vmalloc map areas.

Or maybe the right solution is to simply disable AF_PACKET MMAP support
for virtual cached architectures - it may be that adding cache flushing
calls makes the thing too expensive and the benefits of mmap over normal
read/write are lost.
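
The aliasing problem described here can be illustrated with a toy userspace model. It is only a sketch under simplifying assumptions: one word of memory, one write-back cache line per virtual alias, and hypothetical `model_*` names throughout. `model_flush_both()` stands in for the kind of two-address flush suggested above; no such kernel function is implied.

```c
#include <assert.h>
#include <stdbool.h>

/* One cache line, indexed by one virtual alias of the page. */
struct model_line {
	bool valid;
	bool dirty;
	unsigned long value;
};

/* One word of physical memory seen through two virtual aliases. */
struct model_page {
	unsigned long phys;		/* backing memory */
	struct model_line kernel;	/* line for the kernel-side address */
	struct model_line user;		/* line for the userspace address */
};

/* Read through one alias; a miss fills the line from memory. */
static unsigned long model_read(struct model_page *p, struct model_line *l)
{
	if (!l->valid) {
		l->value = p->phys;
		l->valid = true;
		l->dirty = false;
	}
	return l->value;
}

/* Write through one alias; write-back, so memory is not updated yet. */
static void model_write(struct model_line *l, unsigned long v)
{
	l->value = v;
	l->valid = true;
	l->dirty = true;
}

/* Write back (if dirty) and invalidate the line of a single alias. */
static void model_flush(struct model_page *p, struct model_line *l)
{
	if (l->valid && l->dirty)
		p->phys = l->value;
	l->valid = false;
	l->dirty = false;
}

/* Hypothetical two-address flush: drop the reader's copy first, then
 * write back the writer's line so that its data ends up in memory. */
static void model_flush_both(struct model_page *p, struct model_line *writer,
			     struct model_line *reader)
{
	model_flush(p, reader);
	model_flush(p, writer);
}
```

In this model, flushing only the writer's alias updates memory but leaves the reader's line valid and stale; the reader sees the new value only once its own alias has been flushed as well.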
Phil Sutter Sept. 6, 2011, 11:05 a.m. UTC | #8
On Tue, Sep 06, 2011 at 10:57:22AM +0100, Russell King - ARM Linux wrote:
> > The code in question uses __get_free_pages(), and if that fails uses
> > vmalloc() (see alloc_one_pg_vec_page() for reference). Both code paths
> > result in the same faulty behaviour.
> 
> So, what you're wanting is cache coherency between vmalloc() and
> userspace.  There is no API in the kernel to do that, and you'll see
> the same failures of this interface not only on ARM but also other
> architectures with virtual caches.
> 
> It sounds like we need an API to flush the cache using both the
> userspace address, plus the kernel side address be that in the direct
> map or the vmalloc map areas.
> 
> Or maybe the right solution is to simply disable AF_PACKET MMAP support
> for virtual cached architectures - it may be that adding cache flushing
> calls makes the thing too expensive and the benefits of mmap over normal
> read/write are lost.

OK, that's horrible. Of course we depend on just this combination to
work flawlessly, i.e. PACKET_MMAP && VIVT. :(

Another userspace-interface I'm working on uses a different solution:
memory is allocated in userspace and accessed from kernelspace using
get_user_pages(). I did not explicitly search for the earlier described
fault pattern, but we didn't notice any problem with this approach on
the very same hardware either. I already see myself writing TPACKET_V3.
;)

What do you think?

Greetings, Phil

Patch

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 243946d..d7b5c2e 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -87,6 +87,14 @@ 
 #include <net/inet_common.h>
 #endif
 
+/* whether we need additional cacheflushing between user- and kernel-space */
+#ifdef CONFIG_ARCH_KIRKWOOD
+#  define ENABLE_CACHEPROB_WORKAROUND
+#  define kw_extra_cache_flush()	flush_cache_all()
+#else
+#  define kw_extra_cache_flush()	/* nothing */
+#endif
+
 /*
    Assumptions:
    - if device has no dev->hard_header routine, it adds and removes ll header
@@ -1239,10 +1247,13 @@  static int packet_sendmsg(struct kiocb *iocb, struct socket *sock,
 {
 	struct sock *sk = sock->sk;
 	struct packet_sock *po = pkt_sk(sk);
-	if (po->tx_ring.pg_vec)
-		return tpacket_snd(po, msg);
-	else
-		return packet_snd(sock, msg, len);
+	int rc;
+
+	kw_extra_cache_flush();
+	rc = po->tx_ring.pg_vec ? tpacket_snd(po, msg) :
+			packet_snd(sock, msg, len);
+	kw_extra_cache_flush();
+	return rc;
 }
 
 /*
@@ -2622,6 +2633,11 @@  static int __init packet_init(void)
 	sock_register(&packet_family_ops);
 	register_pernet_subsys(&packet_net_ops);
 	register_netdevice_notifier(&packet_netdev_notifier);
+
+#ifdef ENABLE_CACHEPROB_WORKAROUND
+	printk(KERN_INFO "af_packet: cache coherency workaround for kirkwood is active!\n");
+#endif
+
 out:
 	return rc;
 }