Message ID | 20190321190327.11813-1-urezki@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | improve vmap allocation | expand |
On Thu, 21 Mar 2019 20:03:26 +0100 "Uladzislau Rezki (Sony)" <urezki@gmail.com> wrote: > Hello. > > This is the v2 of the https://lkml.org/lkml/2018/10/19/786 rework. Instead of > referring you to that link, i will go through it again describing the improved > allocation method and provide changes between v1 and v2 in the end. > > ... > > Performance analysis > -------------------- Impressive numbers. But this is presumably a worst-case microbenchmark. Are you able to describe the benefits which are observed in some real-world workload which someone cares about? It's a lot of new code. I t looks decent and I'll toss it in there for further testing. Hopefully someone will be able to find the time for a detailed review. Trivial point: the code uses "inline" a lot. Nowadays gcc cheerfully ignores that and does its own thing. You might want to look at the effects of simply deleting all that. Is the generated code better or worse or the same? If something really needs to be inlined then use __always_inline, preferably with a comment explaining why it is there.
On Thu, Mar 21, 2019 at 03:01:06PM -0700, Andrew Morton wrote: > On Thu, 21 Mar 2019 20:03:26 +0100 "Uladzislau Rezki (Sony)" <urezki@gmail.com> wrote: > > > Hello. > > > > This is the v2 of the https://lkml.org/lkml/2018/10/19/786 rework. Instead of > > referring you to that link, i will go through it again describing the improved > > allocation method and provide changes between v1 and v2 in the end. > > > > ... > > > > > Performance analysis > > -------------------- > > Impressive numbers. But this is presumably a worst-case microbenchmark. > > Are you able to describe the benefits which are observed in some > real-world workload which someone cares about? > We work with Android. Google uses its own tool called UiBench to measure performance of UI. It counts dropped or delayed frames, or as they call it, jank. Basically if we deliver 59(should be 60) frames per second then we get 1 junk/drop. I see that on our devices avg-jank is lower. In our case Android graphics pipeline uses vmalloc allocations which can lead to delays of UI content to GPU. But such behavior depends on your platform, parts of the system which make use of it and if they are critical to time or not. Second example is indirect impact. During analysis of audio glitches in high-resolution audio the source of drops were long alloc_vmap_area() allocations. # Explanation is here ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt # Audio 10 seconds sample is here. # The drop occurs at 00:09.295 you can hear it ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav > > It's a lot of new code. I t looks decent and I'll toss it in there for > further testing. Hopefully someone will be able to find the time for a > detailed review. > Thank you :) > Trivial point: the code uses "inline" a lot. Nowadays gcc cheerfully > ignores that and does its own thing. You might want to look at the > effects of simply deleting all that. Is the generated code better or > worse or the same? If something really needs to be inlined then use > __always_inline, preferably with a comment explaining why it is there. > When the main core functionalities are "inlined" i see the benefit. At least, it is noticeable by the "test driver". But i agree that i should check one more time to see what can be excluded and used as a regular call. Thanks for the hint, it is worth to go with __always_inline instead. -- Vlad Rezki
On Fri, Mar 22, 2019 at 05:52:59PM +0100, Uladzislau Rezki wrote: > On Thu, Mar 21, 2019 at 03:01:06PM -0700, Andrew Morton wrote: > > On Thu, 21 Mar 2019 20:03:26 +0100 "Uladzislau Rezki (Sony)" <urezki@gmail.com> wrote: > > > > > Hello. > > > > > > This is the v2 of the https://lkml.org/lkml/2018/10/19/786 rework. Instead of > > > referring you to that link, i will go through it again describing the improved > > > allocation method and provide changes between v1 and v2 in the end. > > > > > > ... > > > > > > > > Performance analysis > > > -------------------- > > > > Impressive numbers. But this is presumably a worst-case microbenchmark. > > > > Are you able to describe the benefits which are observed in some > > real-world workload which someone cares about? > > > We work with Android. Google uses its own tool called UiBench to measure > performance of UI. It counts dropped or delayed frames, or as they call it, > jank. Basically if we deliver 59(should be 60) frames per second then we > get 1 junk/drop. Agreed. Strictly speaking, "1 Jank" is not necessarily "1 frame drop". A delayed frame is also a Jank. Just because a frame is delayed does not mean it is dropped, there is double buffering etc to absorb delays. > I see that on our devices avg-jank is lower. In our case Android graphics > pipeline uses vmalloc allocations which can lead to delays of UI content > to GPU. But such behavior depends on your platform, parts of the system > which make use of it and if they are critical to time or not. > > Second example is indirect impact. During analysis of audio glitches > in high-resolution audio the source of drops were long alloc_vmap_area() > allocations. > > # Explanation is here > ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt > > # Audio 10 seconds sample is here. > # The drop occurs at 00:09.295 you can hear it > ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav Nice. > > It's a lot of new code. I t looks decent and I'll toss it in there for > > further testing. Hopefully someone will be able to find the time for a > > detailed review. > > > Thank you :) I can try to do a review fwiw. But I am severely buried right now. I did look at vmalloc code before for similar reasons (preempt off related delays causing jank / glitches etc). Any case, I'll take another look soon (in next 1-2 weeks). > > Trivial point: the code uses "inline" a lot. Nowadays gcc cheerfully > > ignores that and does its own thing. You might want to look at the > > effects of simply deleting all that. Is the generated code better or > > worse or the same? If something really needs to be inlined then use > > __always_inline, preferably with a comment explaining why it is there. > > > When the main core functionalities are "inlined" i see the benefit. > At least, it is noticeable by the "test driver". But i agree that > i should check one more time to see what can be excluded and used > as a regular call. Thanks for the hint, it is worth to go with > __always_inline instead. I wonder how clang behaves as far as inline hints go. That is how Android images build their kernels. thanks, - Joel
Hello, Andrew. > > It's a lot of new code. I t looks decent and I'll toss it in there for > further testing. Hopefully someone will be able to find the time for a > detailed review. > I have got some proposals and comments about simplifying the code a bit. So i am about to upload the v3 for further review. I see that your commit message includes whole details and explanation: http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/commit/?id=b6ac5ca4c512094f217a8140edeea17e6621b2ad Should i base on and keep it in v3? Thank you in advance! -- Vlad Rezki
On Mon, 1 Apr 2019 13:03:47 +0200 Uladzislau Rezki <urezki@gmail.com> wrote: > Hello, Andrew. > > > > > It's a lot of new code. I t looks decent and I'll toss it in there for > > further testing. Hopefully someone will be able to find the time for a > > detailed review. > > > I have got some proposals and comments about simplifying the code a bit. > So i am about to upload the v3 for further review. I see that your commit > message includes whole details and explanation: > > http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/commit/?id=b6ac5ca4c512094f217a8140edeea17e6621b2ad > > Should i base on and keep it in v3? > It's probably best to do a clean resend at this stage. I'll take a look at the deltas and updating changelogs, etc.
On Mon, Apr 01, 2019 at 03:59:39PM -0700, Andrew Morton wrote: > On Mon, 1 Apr 2019 13:03:47 +0200 Uladzislau Rezki <urezki@gmail.com> wrote: > > > Hello, Andrew. > > > > > > > > It's a lot of new code. I t looks decent and I'll toss it in there for > > > further testing. Hopefully someone will be able to find the time for a > > > detailed review. > > > > > I have got some proposals and comments about simplifying the code a bit. > > So i am about to upload the v3 for further review. I see that your commit > > message includes whole details and explanation: > > > > http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/commit/?id=b6ac5ca4c512094f217a8140edeea17e6621b2ad > > > > Should i base on and keep it in v3? > > > > It's probably best to do a clean resend at this stage. I'll take a > look at the deltas and updating changelogs, etc. Will resend then. Thank you. -- Vlad Rezki