diff mbox

FW: [PATCH] af_packet: flush complete kernel cache in packet_sendmsg

Message ID CAAsGZS6Qiyc6nhgyVLrphSLW6vf16=hbGVWJ+CFw6rfZQsdiFQ@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

chetan loke Sept. 2, 2011, 4:49 p.m. UTC
On Fri, Sep 2, 2011 at 11:31 AM, Phil Sutter <phil.sutter@viprinet.com> wrote:

> So far we haven't noticed problems in that direction. I just tried some
> explicit test: having tcpdump print local timestamps (not the pcap-ones)
> on every received packet, activating icmp_echo_ignore_all and pinging
> the host on a dedicated line. I expected to sometimes see a second
> difference between the two timestamps, as like with sending from time to
> time a packet should get "lost" in the cache, and then occur to
> userspace after the next one arrived. Maybe my test is broken, or RX is
> indeed unaffected.
>

You will need high traffic rate. If interested, you could try
pktgen(with varying packet-load). Keep the packet-payload under 1500
bytes (don't send jumbo frames) unless you have the following fix:
commit cc9f01b246ca8e4fa245991840b8076394f86707

Your Tx path is working because flush_cache_call gets triggered before
flush_dcache_page. On the Rx path, since you don't have that
workaround, you will eventually(it's just a matter of time) see this
problem.

Or, delete your patch and try this workaround (in
__packet_get/set_status) and you may be able to cover both Tx and Rx
paths.


        default:
@@ -437,13 +445,19 @@ static int __packet_get_status(struct
packet_sock *po, void *frame)

        smp_rmb();

+       kw_extra_cache_flush();
+
        h.raw = frame;
        switch (po->tp_version) {
        case TPACKET_V1:
-               flush_dcache_page(pgv_to_page(&h.h1->tp_status));
+               #ifndef ENABLE_CACHEPROB_WORKAROUND
+                       flush_dcache_page(pgv_to_page(&h.h1->tp_status));
+               #endif
                return h.h1->tp_status;
        case TPACKET_V2:
-               flush_dcache_page(pgv_to_page(&h.h2->tp_status));
+               #ifndef ENABLE_CACHEPROB_WORKAROUND
+                       flush_dcache_page(pgv_to_page(&h.h2->tp_status));
+               #endif
                return h.h2->tp_status;
        case TPACKET_V3:
        default:


> Greetings and thanks for the hints, Phil

Chetan Loke

Comments

Phil Sutter Sept. 6, 2011, 9:44 a.m. UTC | #1
On Fri, Sep 02, 2011 at 12:49:47PM -0400, chetan loke wrote:
> On Fri, Sep 2, 2011 at 11:31 AM, Phil Sutter <phil.sutter@viprinet.com> wrote:
> 
> > So far we haven't noticed problems in that direction. I just tried some
> > explicit test: having tcpdump print local timestamps (not the pcap-ones)
> > on every received packet, activating icmp_echo_ignore_all and pinging
> > the host on a dedicated line. I expected to sometimes see a second
> > difference between the two timestamps, as like with sending from time to
> > time a packet should get "lost" in the cache, and then occur to
> > userspace after the next one arrived. Maybe my test is broken, or RX is
> > indeed unaffected.
> >
> 
> You will need high traffic rate. If interested, you could try
> pktgen(with varying packet-load). Keep the packet-payload under 1500
> bytes (don't send jumbo frames) unless you have the following fix:
> commit cc9f01b246ca8e4fa245991840b8076394f86707

Hmm. I don't really get your point here: with higher traffic rates, the
bug should be even harder to identify. Assuming the same behaviour as
for TX, of course. There are no packets lost, just not immediately
transmitted (or never, if it's the last packet to be sent). This is how
it goes:
1) userspace places packet into TX_RING, calls sendto()
2) kernel does not see the packet, reads TP_STATUS_AVAILABLE for the
   given field from the cache
3) userspace places second packet into TX_RING (after the first one,
   since it knows it's there)
4) something happens that makes caches flush
5) userspace calls sendto()
5) kernel sees two packets to be transmitted, sends them out

So analogous for RX, this should mean:
1) tcpdump runs pcap_loop() (which, according to strace, calls poll()
   with a timeout of 1s)
2) kernel receives packet, puts it into RX_RING, sets POLLIN
3) tcpdump's poll() returns, but an unmodified RX_RING is seen
4) something happens that makes caches flush
5) (2) happens again
6) tcpdump's poll() returns, two packets are seen in RX_RING

My tests on TX-side show that this "something that makes caches flush"
actually happens quite frequently. But nevertheless, when receiving a
packet once a second, I'm expecting to occasionally see no packet in two
seconds, and then two in the following. The higher the packet rate, the
harder it should be to notice this phenomenon.

> Your Tx path is working because flush_cache_call gets triggered before
> flush_dcache_page. On the Rx path, since you don't have that
> workaround, you will eventually(it's just a matter of time) see this
> problem.

So you say if I called flush_cache_all() _after_ flush_dcache_page() it
wouldn't work?

> Or, delete your patch and try this workaround (in
> __packet_get/set_status) and you may be able to cover both Tx and Rx
> paths.

Oh great, thanks a lot for improving my ugly hacks! :)

Greetings, Phil
diff mbox

Patch

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 2ea3d63..35d71dc 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -412,11 +412,19 @@  static void __packet_set_status(struct
packet_sock *po, void *frame, int status)
        switch (po->tp_version) {
        case TPACKET_V1:
                h.h1->tp_status = status;
-               flush_dcache_page(pgv_to_page(&h.h1->tp_status));
+               #ifndef ENABLE_CACHEPROB_WORKAROUND
+                       flush_dcache_page(pgv_to_page(&h.h1->tp_status));
+               #else
+                       kw_extra_cache_flush();
+               endif
                break;
        case TPACKET_V2:
                h.h2->tp_status = status;
-               flush_dcache_page(pgv_to_page(&h.h2->tp_status));
+               #ifndef ENABLE_CACHEPROB_WORKAROUND
+                       flush_dcache_page(pgv_to_page(&h.h2->tp_status));
+               #else
+                       kw_extra_cache_flush();
+               #endif
                break;
        case TPACKET_V3: