[v2,0/2] MIPS: convert to generic entry

Message ID cover.1631583258.git.chenfeiyang@loongson.cn (mailing list archive)
Message

Feiyang Chen Sept. 14, 2021, 1:50 a.m. UTC
Convert MIPS to use the generic entry infrastructure from
kernel/entry/*.

v2: Use regs->regs[27] to mark whether to restore all registers in
handle_sys and enable IRQ stack.

Feiyang Chen (2):
  MIPS: convert syscall to generic entry
  MIPS: convert irq to generic entry

 arch/mips/Kconfig                         |   1 +
 arch/mips/include/asm/entry-common.h      |  13 ++
 arch/mips/include/asm/irqflags.h          |  42 ----
 arch/mips/include/asm/ptrace.h            |   8 +-
 arch/mips/include/asm/sim.h               |  70 -------
 arch/mips/include/asm/stackframe.h        |   8 +
 arch/mips/include/asm/syscall.h           |   5 +
 arch/mips/include/asm/thread_info.h       |  17 +-
 arch/mips/include/uapi/asm/ptrace.h       |   7 +-
 arch/mips/kernel/Makefile                 |  14 +-
 arch/mips/kernel/entry.S                  | 143 +-------------
 arch/mips/kernel/genex.S                  | 150 +++------------
 arch/mips/kernel/head.S                   |   1 -
 arch/mips/kernel/linux32.c                |   1 -
 arch/mips/kernel/ptrace.c                 |  78 --------
 arch/mips/kernel/r4k-bugs64.c             |  14 +-
 arch/mips/kernel/scall.S                  | 136 +++++++++++++
 arch/mips/kernel/scall32-o32.S            | 223 ---------------------
 arch/mips/kernel/scall64-n32.S            | 107 ----------
 arch/mips/kernel/scall64-n64.S            | 116 -----------
 arch/mips/kernel/scall64-o32.S            | 221 ---------------------
 arch/mips/kernel/signal.c                 |  59 +-----
 arch/mips/kernel/signal_n32.c             |  15 +-
 arch/mips/kernel/signal_o32.c             |  29 +--
 arch/mips/kernel/syscall.c                | 148 +++++++++++---
 arch/mips/kernel/syscalls/syscall_n32.tbl |   8 +-
 arch/mips/kernel/syscalls/syscall_n64.tbl |   8 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl |   8 +-
 arch/mips/kernel/traps.c                  | 225 ++++++++++++++++------
 arch/mips/kernel/unaligned.c              |  19 +-
 arch/mips/mm/c-octeon.c                   |  15 ++
 arch/mips/mm/cex-oct.S                    |   8 +-
 arch/mips/mm/fault.c                      |  12 +-
 arch/mips/mm/tlbex-fault.S                |   7 +-
 34 files changed, 594 insertions(+), 1342 deletions(-)
 create mode 100644 arch/mips/include/asm/entry-common.h
 delete mode 100644 arch/mips/include/asm/sim.h
 create mode 100644 arch/mips/kernel/scall.S
 delete mode 100644 arch/mips/kernel/scall32-o32.S
 delete mode 100644 arch/mips/kernel/scall64-n32.S
 delete mode 100644 arch/mips/kernel/scall64-n64.S
 delete mode 100644 arch/mips/kernel/scall64-o32.S

Comments

Jiaxun Yang Sept. 14, 2021, 8:54 a.m. UTC | #1
On 2021/9/14 2:50, Feiyang Chen wrote:
> Convert MIPS to use the generic entry infrastructure from
> kernel/entry/*.
>
> v2: Use regs->regs[27] to mark whether to restore all registers in
> handle_sys and enable IRQ stack.
Hi Feiyang,

Thanks for your patch. Could you please expand on how this could improve
performance?

Thanks.
- Jiaxun
Feiyang Chen Sept. 14, 2021, 9:30 a.m. UTC | #2
On Tue, 14 Sept 2021 at 16:54, Jiaxun Yang <jiaxun.yang@flygoat.com> wrote:
>
>
>
> On 2021/9/14 2:50, Feiyang Chen wrote:
> > Convert MIPS to use the generic entry infrastructure from
> > kernel/entry/*.
> >
> > v2: Use regs->regs[27] to mark whether to restore all registers in
> > handle_sys and enable IRQ stack.
> Hi Feiyang,
>
> Thanks for your patch, could you please expand how could this improve
> the performance?
>

Hi, Jiaxun,

In v1 of the patchset, handle_sys always restored all registers. In v2,
regs->regs[27] is set wherever a full register restore is needed, and
do_syscall returns it, so handle_sys can restore only a partial set of
registers when a full restore is not required.

+       move    a0, sp
+       jal     do_syscall
+       beqz    v0, 1f                          # restore all registers?
+       nop
+
+       .set    noat
+       RESTORE_TEMP
+       RESTORE_STATIC
+       RESTORE_AT
+1:     RESTORE_SOME
+       RESTORE_SP_AND_RET
+       .set    at

Thanks,
Feiyang

Thomas Bogendoerfer Sept. 21, 2021, 3:57 p.m. UTC | #3
On Tue, Sep 14, 2021 at 05:30:14PM +0800, Feiyang Chen wrote:
> On Tue, 14 Sept 2021 at 16:54, Jiaxun Yang <jiaxun.yang@flygoat.com> wrote:
> >
> >
> >
> > On 2021/9/14 2:50, Feiyang Chen wrote:
> > > Convert MIPS to use the generic entry infrastructure from
> > > kernel/entry/*.
> > >
> > > v2: Use regs->regs[27] to mark whether to restore all registers in
> > > handle_sys and enable IRQ stack.
> > Hi Feiyang,
> >
> > Thanks for your patch, could you please expand how could this improve
> > the performance?
> >
> 
> Hi, Jiaxun,
> 
> We always restore all registers in handle_sys in the v1 of the
> patchset. Since regs->regs[27] is marked where we need to restore all
> registers, now we simply use it as the return value of do_syscall to
> determine whether we can only restore partial registers in handle_sys.

Can the people who provided performance numbers for v1 do the same for v2?

Thomas.
Zhou Yanjie Sept. 23, 2021, 2:33 p.m. UTC | #4
Hi Thomas,

On 2021/9/21 11:57 PM, Thomas Bogendoerfer wrote:
> On Tue, Sep 14, 2021 at 05:30:14PM +0800, Feiyang Chen wrote:
>> On Tue, 14 Sept 2021 at 16:54, Jiaxun Yang <jiaxun.yang@flygoat.com> wrote:
>>>
>>>
>>> On 2021/9/14 2:50, Feiyang Chen wrote:
>>>> Convert MIPS to use the generic entry infrastructure from
>>>> kernel/entry/*.
>>>>
>>>> v2: Use regs->regs[27] to mark whether to restore all registers in
>>>> handle_sys and enable IRQ stack.
>>> Hi Feiyang,
>>>
>>> Thanks for your patch, could you please expand how could this improve
>>> the performance?
>>>
>> Hi, Jiaxun,
>>
>> We always restore all registers in handle_sys in the v1 of the
>> patchset. Since regs->regs[27] is marked where we need to restore all
>> registers, now we simply use it as the return value of do_syscall to
>> determine whether we can only restore partial registers in handle_sys.
> can people, who provided performance numbers for v1 do the same for v2 ?


Sure, I will test the v2 in the next few days.


Thanks and best regards!


>
> Thomas,
>
Jiaxun Yang Sept. 23, 2021, 11:51 p.m. UTC | #5
On Sept. 21, 2021, at 4:57 PM, Thomas Bogendoerfer wrote:
>
> can people, who provided performance numbers for v1 do the same for v2 ?

Sorry, I have just moved abroad (to the UK) for higher education. I currently don't have any MIPS device available, so I won't be able to provide test results as I did for v1.

I'll try to get some MIPS devices soonish.

Thanks.

>
> Thomas,
>
> -- 
> Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
> good idea.                                                [ RFC1925, 2.3 ]
Zhou Yanjie Oct. 13, 2021, 3 p.m. UTC | #6
Hi Thomas,

On 2021/9/23 10:33 PM, Zhou Yanjie wrote:
> Hi Thomas,
>
> On 2021/9/21 11:57 PM, Thomas Bogendoerfer wrote:
>> On Tue, Sep 14, 2021 at 05:30:14PM +0800, Feiyang Chen wrote:
>>> On Tue, 14 Sept 2021 at 16:54, Jiaxun Yang <jiaxun.yang@flygoat.com> 
>>> wrote:
>>>>
>>>>
>>>> On 2021/9/14 2:50, Feiyang Chen wrote:
>>>>> Convert MIPS to use the generic entry infrastructure from
>>>>> kernel/entry/*.
>>>>>
>>>>> v2: Use regs->regs[27] to mark whether to restore all registers in
>>>>> handle_sys and enable IRQ stack.
>>>> Hi Feiyang,
>>>>
>>>> Thanks for your patch, could you please expand how could this improve
>>>> the performance?
>>>>
>>> Hi, Jiaxun,
>>>
>>> We always restore all registers in handle_sys in the v1 of the
>>> patchset. Since regs->regs[27] is marked where we need to restore all
>>> registers, now we simply use it as the return value of do_syscall to
>>> determine whether we can only restore partial registers in handle_sys.
>> can people, who provided performance numbers for v1 do the same for v2 ?
>
>
> Sure, I will test the v2 in the next few days.


Sorry for the delay. It took a lot of time to migrate the environment to
my new computer. Here are the results:


Score Without Patches  Score With Patches  Performance Change  SoC Model
        105.9                102.1               -3.6%         JZ4775
        132.4                124.1               -6.3%         JZ4780(SMP off)
        170.2                155.7               -8.5%         JZ4780(SMP on)
        101.3                 91.5               -9.7%         X1000E
        187.1                179.4               -4.1%         X1830
        324.9                314.3               -3.3%         X2000(SMT off)
        394.6                373.9               -5.2%         X2000(SMT off)


Compared with v1, there are some improvements, but the performance
loss is still noticeable.

Thanks and best regards!


>
>
> Thanks and best regards!
>
>
>>
>> Thomas,
>>
Maciej W. Rozycki Oct. 18, 2021, 7:32 p.m. UTC | #7
On Wed, 13 Oct 2021, Zhou Yanjie wrote:

> > > can people, who provided performance numbers for v1 do the same for v2 ?
> > 
> > 
> > Sure, I will test the v2 in the next few days.
> 
> 
> Sorry for the delay, It took a lot of time to migrate the environment to my
> new computer, here is the results:
> 
> 
> Score Without Patches  Score With Patches  Performance Change SoC Model
>        105.9                102.1              -3.6%  JZ4775
>        132.4                124.1              -6.3%  JZ4780(SMP off)
>        170.2                155.7             -8.5%  JZ4780(SMP on)
>        101.3                 91.5              -9.7%  X1000E
>        187.1                179.4              -4.1%  X1830
>        324.9                314.3              -3.3%  X2000(SMT off)
>        394.6                373.9              -5.2%  X2000(SMT off)
> 
> 
> Compared with the V1 version, there are some improvements, but the performance
> loss is still a bit obvious

 The MIPS port of Linux has always had the pride of having a particularly 
low syscall overhead and I'd rather we didn't lose this quality.

 FWIW,

  Maciej
Feiyang Chen Oct. 19, 2021, 2:22 a.m. UTC | #8
On Tue, 19 Oct 2021 at 03:32, Maciej W. Rozycki <macro@orcam.me.uk> wrote:
>
> On Wed, 13 Oct 2021, Zhou Yanjie wrote:
>
> > > > can people, who provided performance numbers for v1 do the same for v2 ?
> > >
> > >
> > > Sure, I will test the v2 in the next few days.
> >
> >
> > Sorry for the delay, It took a lot of time to migrate the environment to my
> > new computer, here is the results:
> >
> >
> > Score Without Patches  Score With Patches  Performance Change SoC Model
> >        105.9                102.1              -3.6%  JZ4775
> >        132.4                124.1              -6.3%  JZ4780(SMP off)
> >        170.2                155.7             -8.5%  JZ4780(SMP on)
> >        101.3                 91.5              -9.7%  X1000E
> >        187.1                179.4              -4.1%  X1830
> >        324.9                314.3              -3.3%  X2000(SMT off)
> >        394.6                373.9              -5.2%  X2000(SMT off)
> >
> >
> > Compared with the V1 version, there are some improvements, but the performance
> > loss is still a bit obvious
>
>  The MIPS port of Linux has always had the pride of having a particularly
> low syscall overhead and I'd rather we didn't lose this quality.

Hi, Maciej,

1. The current trend is to use generic code, so I think this work is
worth it, even if there is some performance loss.
2. We tested the performance on 5.15-rc1~rc5 and the performance
loss on JZ4780 (SMP off) is not so obvious (about -3%).
3. Yanjie, is there any problem with the code base you tested?
Could you help to test patch v3 on the latest mainline kernel?

Thanks,
Feiyang

>
>  FWIW,
>
>   Maciej
Maciej W. Rozycki Oct. 19, 2021, 8:33 a.m. UTC | #9
On Tue, 19 Oct 2021, Feiyang Chen wrote:

> > > Score Without Patches  Score With Patches  Performance Change SoC Model
> > >        105.9                102.1              -3.6%  JZ4775
> > >        132.4                124.1              -6.3%  JZ4780(SMP off)
> > >        170.2                155.7             -8.5%  JZ4780(SMP on)
> > >        101.3                 91.5              -9.7%  X1000E
> > >        187.1                179.4              -4.1%  X1830
> > >        324.9                314.3              -3.3%  X2000(SMT off)
> > >        394.6                373.9              -5.2%  X2000(SMT off)
> > >
> > >
> > > Compared with the V1 version, there are some improvements, but the performance
> > > loss is still a bit obvious
> >
> >  The MIPS port of Linux has always had the pride of having a particularly
> > low syscall overhead and I'd rather we didn't lose this quality.
> 
> Hi, Maciej,
> 
> 1. The current trend is to use generic code, so I think this work is
> worth it, even if there is some performance loss.

 Well, a trend is not a proper justification on its own for existing code, 
and mature one for that matter, that works.  Surely it might be for an 
entirely new port, but the MIPS port is not exactly one.

> 2. We tested the performance on 5.15-rc1~rc5 and the performance
> loss on JZ4780 (SMP off) is not so obvious (about -3%).

 I've seen teams work hard to improve performance by less than 3%, so 
depending on how you look at it the loss is not necessarily small, even if 
not abysmal.  And I find the figure of almost 10% cited for another system 
even more worrisome.  Also you've written the figures are from UnixBench, 
which I suppose measures some kind of an average across various workloads.  
Can you elaborate on the methodology used by that benchmark?

 Can you tell me what the performance loss is for a cheap syscall such as 
`getuid'?  That would indicate how much is actually lost in the invocation 
overhead.

 With that amount known, would you be able to indicate where exactly the 
performance is getting lost in generic code?  Can it be improved?

  Maciej
Feiyang Chen Oct. 22, 2021, 2:19 a.m. UTC | #10
On Tue, 19 Oct 2021 at 16:33, Maciej W. Rozycki <macro@orcam.me.uk> wrote:
>
> On Tue, 19 Oct 2021, Feiyang Chen wrote:
>
> > > > Score Without Patches  Score With Patches  Performance Change SoC Model
> > > >        105.9                102.1              -3.6%  JZ4775
> > > >        132.4                124.1              -6.3%  JZ4780(SMP off)
> > > >        170.2                155.7             -8.5%  JZ4780(SMP on)
> > > >        101.3                 91.5              -9.7%  X1000E
> > > >        187.1                179.4              -4.1%  X1830
> > > >        324.9                314.3              -3.3%  X2000(SMT off)
> > > >        394.6                373.9              -5.2%  X2000(SMT off)
> > > >
> > > >
> > > > Compared with the V1 version, there are some improvements, but the performance
> > > > loss is still a bit obvious
> > >
> > >  The MIPS port of Linux has always had the pride of having a particularly
> > > low syscall overhead and I'd rather we didn't lose this quality.
> >
> > Hi, Maciej,
> >
> > 1. The current trend is to use generic code, so I think this work is
> > worth it, even if there is some performance loss.
>
>  Well, a trend is not a proper justification on its own for existing code,
> and mature one for that matter, that works.  Surely it might be for an
> entirely new port, but the MIPS port is not exactly one.
>
> > 2. We tested the performance on 5.15-rc1~rc5 and the performance
> > loss on JZ4780 (SMP off) is not so obvious (about -3%).
>
>  I've seen teams work hard to improve performance by less than 3%, so
> depending on how you look at it the loss is not necessarily small, even if
> not abysmal.  And I find the figure of almost 10% cited for another system
> even more worrisome.  Also you've written the figures are from UnixBench,
> which I suppose measures some kind of an average across various workloads.
> Can you elaborate on the methodology used by that benchmark?

Hi, Maciej,

UnixBench uses multiple tests to test various aspects of the system's
performance:

- Dhrystone test measures the speed and efficiency of non-floating-point
  operations.
- Whetstone test measures the speed and efficiency of floating-point
  operations.
- execl Throughput test measures the number of execl() calls that can be
  performed per second.
- File Copy test measures the rate at which data can be transferred from one
  file to another, using various buffer sizes.
- Pipe Throughput test measures the number of times (per second) a process
  can write 512 bytes to a pipe and read them back.
- Pipe-based Context Switching test measures the number of times two
  processes can exchange an increasing integer through a pipe.
- Process Creation test measures the number of times a process can fork and
  reap a child that immediately exits.
- Shell Scripts test measures the number of times per minute a process can
  start and reap a set of one, two, four and eight concurrent copies of a
  shell script where the shell script applies a series of transformations
  to a data file.
- System Call Overhead test measures the cost of entering and leaving the
  operating system kernel.

Of the tests above, the System Call Overhead test is the most affected,
so I'll go into more detail about how it is measured.

The System Call Overhead test counts the sets of system calls that are
completed within the specified time (usually 10 seconds). By default, a set
of system calls contains close(), getpid(), getuid(), and umask(). We call
the test score "index". Specifically, the score for this test is calculated
as follows:

product = log(count) - log(time / timebase)
result = exp(product / iterations)
index = result / baseline * 10

"timebase" and "baseline" are fixed values that are different for each test.
Scores for other tests are calculated in a similar way. The final total
score is calculated as follows (The total number of tests is "N"):

index = exp((log(result1) + log(result2) + ... + log(resultN)) / N) * 10

>
>  Can you tell me what the performance loss is for a cheap syscall such as
> `getuid'?  That would indicate how much is actually lost in the invocation
> overhead.

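We used perf to measure the sys time of the following program on a
Loongson 3A4000:

#include <unistd.h>

/* Issue 10 million getuid() calls so that sys time is dominated by
 * syscall entry/exit overhead. */
int main(void)
{
    for (int i = 0; i < 10000000; i++)
        getuid();
    return 0;
}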

The program will take about 1.2 seconds of sys time before the kernel is
patched, and about 1.3 seconds after the kernel is patched.

>
>  With that amount known, would you be able to indicate where exactly the
> performance is getting lost in generic code?  Can it be improved?

Sorry, we tried to use perf to analyze where the extra time goes, but we
have not found the cause yet, since most of the code is located in
__noinstr_text_start.

Thanks,
Feiyang

>
>   Maciej