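Compiling Python Scripts with Cython and Nuitka

0x00 Preface

In Python, tools such as py2exe and PyInstaller can package a Python script into a binary executable, which improves portability and, to some extent, performance. However, these tools only compile the .py files to .pyc or .pyo, so the protection they offer is weak: the bytecode can be decompiled.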
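To make the decompilation risk concrete, the following is a minimal sketch of how much information compiled bytecode retains. The `check_license` function and its string literal are hypothetical, made up purely for illustration; the sketch marshals a code object roughly the way a .pyc file stores it and then reads the details back.

```python
import dis
import marshal

# Hypothetical source that someone might hope to hide by shipping only bytecode.
SOURCE = '''
def check_license(key):
    secret = "s3cr3t-key"
    return key == secret
'''


def recover_function_code(source):
    """Compile source to bytecode, round-trip it through marshal, return the function's code object."""
    module_code = compile(source, "demo.py", "exec")
    # A .pyc file is essentially a small header followed by this marshalled blob.
    blob = marshal.dumps(module_code)
    restored = marshal.loads(blob)
    # The function body is stored as a nested code object among the module constants.
    return next(c for c in restored.co_consts if hasattr(c, "co_code"))


if __name__ == "__main__":
    func_code = recover_function_code(SOURCE)
    print(func_code.co_name)      # the function name survives
    print(func_code.co_varnames)  # argument and local variable names survive
    print(func_code.co_consts)    # string literals survive
    dis.dis(func_code)            # readable instruction listing
```

Names, constants and control flow are all still present, which is what decompilers rely on.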

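To compare the performance of the different compilation approaches, every test runs the same script: test/pystone.py as shipped with Python 2.7. That script does not support Python 3 out of the box, so it has been adapted to run on Python 3 as well. The complete test code follows: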

    #! /usr/bin/env python

    """
    "PYSTONE" Benchmark Program

    Version: Python/1.1 (corresponds to C/1.1 plus 2 Pystone fixes)

    Author: Reinhold P. Weicker, CACM Vol 27, No 10, 10/84 pg. 1013.

    Translated from ADA to C by Rick Richardson.
    Every method to preserve ADA-likeness has been used,
    at the expense of C-ness.

    Translated from C to Python by Guido van Rossum.

    Version History:

    Version 1.1 corrects two bugs in version 1.0:

    First, it leaked memory: in Proc1(), NextRecord ends
    up having a pointer to itself. I have corrected this
    by zapping NextRecord.PtrComp at the end of Proc1().

    Second, Proc3() used the operator != to compare a
    record to None. This is rather inefficient and not
    true to the intention of the original benchmark (where
    a pointer comparison to None is intended; the !=
    operator attempts to find a method __cmp__ to do value
    comparison of the record). Version 1.1 runs 5-10
    percent faster than version 1.0, so benchmark figures
    of different versions can't be compared directly.
    """

    LOOPS = 50000

    # time.clock() was removed in Python 3.8; prefer perf_counter when available.
    try:
        from time import perf_counter as clock
    except ImportError:
        from time import clock

    __version__ = "1.1"

    [Ident1, Ident2, Ident3, Ident4, Ident5] = range(1, 6)

    class Record:

        def __init__(self, PtrComp = None, Discr = 0, EnumComp = 0,
                     IntComp = 0, StringComp = 0):
            self.PtrComp = PtrComp
            self.Discr = Discr
            self.EnumComp = EnumComp
            self.IntComp = IntComp
            self.StringComp = StringComp

        def copy(self):
            return Record(self.PtrComp, self.Discr, self.EnumComp,
                          self.IntComp, self.StringComp)

    TRUE = 1
    FALSE = 0

    def main(loops=LOOPS):
        benchtime, stones = pystones(loops)
        print("Pystone(%s) time for %d passes = %g" % \
              (__version__, loops, benchtime))
        print("This machine benchmarks at %g pystones/second" % stones)

    def pystones(loops=LOOPS):
        return Proc0(loops)

    IntGlob = 0
    BoolGlob = FALSE
    Char1Glob = '\0'
    Char2Glob = '\0'
    Array1Glob = [0]*51
    # map() returns an iterator on Python 3, so wrap it in list().
    Array2Glob = list(map(lambda x: x[:], [Array1Glob]*51))
    PtrGlb = None
    PtrGlbNext = None

    def Proc0(loops=LOOPS):
        global IntGlob
        global BoolGlob
        global Char1Glob
        global Char2Glob
        global Array1Glob
        global Array2Glob
        global PtrGlb
        global PtrGlbNext

        # Time an empty loop so its overhead can be subtracted later.
        starttime = clock()
        for i in range(loops):
            pass
        nulltime = clock() - starttime

        PtrGlbNext = Record()
        PtrGlb = Record()
        PtrGlb.PtrComp = PtrGlbNext
        PtrGlb.Discr = Ident1
        PtrGlb.EnumComp = Ident3
        PtrGlb.IntComp = 40
        PtrGlb.StringComp = "DHRYSTONE PROGRAM, SOME STRING"
        String1Loc = "DHRYSTONE PROGRAM, 1'ST STRING"
        Array2Glob[8][7] = 10

        starttime = clock()

        for i in range(loops):
            Proc5()
            Proc4()
            IntLoc1 = 2
            IntLoc2 = 3
            String2Loc = "DHRYSTONE PROGRAM, 2'ND STRING"
            EnumLoc = Ident2
            BoolGlob = not Func2(String1Loc, String2Loc)
            while IntLoc1 < IntLoc2:
                IntLoc3 = 5 * IntLoc1 - IntLoc2
                IntLoc3 = Proc7(IntLoc1, IntLoc2)
                IntLoc1 = IntLoc1 + 1
            Proc8(Array1Glob, Array2Glob, IntLoc1, IntLoc3)
            PtrGlb = Proc1(PtrGlb)
            CharIndex = 'A'
            while CharIndex <= Char2Glob:
                if EnumLoc == Func1(CharIndex, 'C'):
                    EnumLoc = Proc6(Ident1)
                CharIndex = chr(ord(CharIndex)+1)
            IntLoc3 = IntLoc2 * IntLoc1
            # Note: "/" is true division on Python 3 and yields a float here.
            IntLoc2 = IntLoc3 / IntLoc1
            IntLoc2 = 7 * (IntLoc3 - IntLoc2) - IntLoc1
            IntLoc1 = Proc2(IntLoc1)

        benchtime = clock() - starttime - nulltime
        if benchtime == 0.0:
            loopsPerBenchtime = 0.0
        else:
            loopsPerBenchtime = (loops / benchtime)
        return benchtime, loopsPerBenchtime

    def Proc1(PtrParIn):
        PtrParIn.PtrComp = NextRecord = PtrGlb.copy()
        PtrParIn.IntComp = 5
        NextRecord.IntComp = PtrParIn.IntComp
        NextRecord.PtrComp = PtrParIn.PtrComp
        NextRecord.PtrComp = Proc3(NextRecord.PtrComp)
        if NextRecord.Discr == Ident1:
            NextRecord.IntComp = 6
            NextRecord.EnumComp = Proc6(PtrParIn.EnumComp)
            NextRecord.PtrComp = PtrGlb.PtrComp
            NextRecord.IntComp = Proc7(NextRecord.IntComp, 10)
        else:
            PtrParIn = NextRecord.copy()
        NextRecord.PtrComp = None
        return PtrParIn

    def Proc2(IntParIO):
        IntLoc = IntParIO + 10
        while 1:
            if Char1Glob == 'A':
                IntLoc = IntLoc - 1
                IntParIO = IntLoc - IntGlob
                EnumLoc = Ident1
            if EnumLoc == Ident1:
                break
        return IntParIO

    def Proc3(PtrParOut):
        global IntGlob

        if PtrGlb is not None:
            PtrParOut = PtrGlb.PtrComp
        else:
            IntGlob = 100
        PtrGlb.IntComp = Proc7(10, IntGlob)
        return PtrParOut

    def Proc4():
        global Char2Glob

        BoolLoc = Char1Glob == 'A'
        BoolLoc = BoolLoc or BoolGlob
        Char2Glob = 'B'

    def Proc5():
        global Char1Glob
        global BoolGlob

        Char1Glob = 'A'
        BoolGlob = FALSE

    def Proc6(EnumParIn):
        EnumParOut = EnumParIn
        if not Func3(EnumParIn):
            EnumParOut = Ident4
        if EnumParIn == Ident1:
            EnumParOut = Ident1
        elif EnumParIn == Ident2:
            if IntGlob > 100:
                EnumParOut = Ident1
            else:
                EnumParOut = Ident4
        elif EnumParIn == Ident3:
            EnumParOut = Ident2
        elif EnumParIn == Ident4:
            pass
        elif EnumParIn == Ident5:
            EnumParOut = Ident3
        return EnumParOut

    def Proc7(IntParI1, IntParI2):
        IntLoc = IntParI1 + 2
        IntParOut = IntParI2 + IntLoc
        return IntParOut

    def Proc8(Array1Par, Array2Par, IntParI1, IntParI2):
        global IntGlob

        IntLoc = IntParI1 + 5
        Array1Par[IntLoc] = IntParI2
        Array1Par[IntLoc+1] = Array1Par[IntLoc]
        Array1Par[IntLoc+30] = IntLoc
        for IntIndex in range(IntLoc, IntLoc+2):
            Array2Par[IntLoc][IntIndex] = IntLoc
        Array2Par[IntLoc][IntLoc-1] = Array2Par[IntLoc][IntLoc-1] + 1
        Array2Par[IntLoc+20][IntLoc] = Array1Par[IntLoc]
        IntGlob = 5

    def Func1(CharPar1, CharPar2):
        CharLoc1 = CharPar1
        CharLoc2 = CharLoc1
        if CharLoc2 != CharPar2:
            return Ident1
        else:
            return Ident2

    def Func2(StrParI1, StrParI2):
        IntLoc = 1
        while IntLoc <= 1:
            if Func1(StrParI1[IntLoc], StrParI2[IntLoc+1]) == Ident1:
                CharLoc = 'A'
                IntLoc = IntLoc + 1
        if CharLoc >= 'W' and CharLoc <= 'Z':
            IntLoc = 7
        if CharLoc == 'X':
            return TRUE
        else:
            if StrParI1 > StrParI2:
                IntLoc = IntLoc + 7
                return TRUE
            else:
                return FALSE

    def Func3(EnumParIn):
        EnumLoc = EnumParIn
        if EnumLoc == Ident3: return TRUE
        return FALSE

    if __name__ == '__main__':
        import sys
        def error(msg):
            # sys.stderr.write works on both Python 2 and Python 3,
            # unlike the Python 2 only "print >>sys.stderr" statement.
            sys.stderr.write("%s usage: %s [number_of_loops]\n" % (msg, sys.argv[0]))
            sys.exit(100)
        nargs = len(sys.argv) - 1
        if nargs > 1:
            error("%d arguments are too many;" % nargs)
        elif nargs == 1:
            try: loops = int(sys.argv[1])
            except ValueError:
                error("Invalid argument %r;" % sys.argv[1])
        else:
            loops = LOOPS
        main(loops)

Every figure below was obtained by running the test three times in a row and taking the maximum value.
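The best-of-three rule can be sketched as a small harness. This is only an illustration: it assumes that "maximum" refers to the highest score (pystones/second), and `toy_benchmark` is a hypothetical stand-in for the real pystone run.

```python
import time


def best_of(benchmark, runs=3):
    """Call benchmark() `runs` times back to back and return the highest score."""
    scores = [benchmark() for _ in range(runs)]
    return max(scores)


def toy_benchmark(loops=50000):
    """Hypothetical stand-in for pystone: returns loop iterations per second."""
    start = time.perf_counter()
    total = 0
    for i in range(loops):
        total += i
    elapsed = time.perf_counter() - start
    return loops / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    print("best of 3: %g loops/second" % best_of(toy_benchmark))
```

If the test script is importable as a module named `pystone`, something like `best_of(lambda: pystone.pystones()[1])` would apply the same rule to the real benchmark, since `pystones()` returns the pair `(benchtime, stones)`.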

0x01 Results across Python versions

  • Python 2.7

    Pystone(1.1) time for 50000 passes = 0.178948
    This machine benchmarks at 279411 pystones/second

  • Python 3.7

    Pystone(1.1) time for 50000 passes = 0.201795
    This machine benchmarks at 247777 pystones/second

  • Python 3.8

    Pystone(1.1) time for 50000 passes = 0.222014
    This machine benchmarks at 225211 pystones/second

  • Python 3.9

    Pystone(1.1) time for 50000 passes = 0.223407
    This machine benchmarks at 223807 pystones/second

  • Python 3.10

    Pystone(1.1) time for 50000 passes = 0.265725
    This machine benchmarks at 188164 pystones/second

  • Python 3.11

    Pystone(1.1) time for 50000 passes = 0.104691
    This machine benchmarks at 477596 pystones/second

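Python 3.11 shows a marked performance improvement, scoring about 2.5 times as high as Python 3.10 on this benchmark, which is consistent with the official announcements.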
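To compare the versions at a glance, the scores above can be normalised against a chosen baseline. The sketch below only reuses the pystones/second values reported in this section; the helper name `relative_to` is made up for illustration.

```python
# pystones/second, copied from the measurements in this section
RESULTS = {
    "2.7": 279411,
    "3.7": 247777,
    "3.8": 225211,
    "3.9": 223807,
    "3.10": 188164,
    "3.11": 477596,
}


def relative_to(baseline, results=RESULTS):
    """Return each version's score as a multiple of the baseline version's score."""
    base = results[baseline]
    return {version: score / base for version, score in results.items()}


if __name__ == "__main__":
    for version, ratio in relative_to("3.10").items():
        print("Python %-5s %.2fx" % (version, ratio))
```

Changing the baseline argument (for example to "2.7") shows how each Python 3 release compares with the last Python 2 release on the same machine.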

0x02 Compiling a Python script with Cython

    $ pip install cython
    $ cython -3 --embed pystone.py
    $ gcc -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python3.7 -l:libpython3.7m.so -o pystone pystone.c
    $ ls -l pystone
    -rwxrwxrwx 1 drunkdream drunkdream 178928 Sep 6 15:42 pystone
    $ readelf -d pystone

    Dynamic section at offset 0x1fd08 contains 26 entries:
      Tag                Type           Name/Value
     0x0000000000000001 (NEEDED)       Shared library: [libpython3.7m.so.1.0]
     0x0000000000000001 (NEEDED)       Shared library: [libpthread.so.0]
     0x0000000000000001 (NEEDED)       Shared library: [libc.so.6]
     0x000000000000000c (INIT)         0x403000
     0x000000000000000d (FINI)         0x41b514
     0x0000000000000019 (INIT_ARRAY)   0x420cf8
     0x000000000000001b (INIT_ARRAYSZ) 8 (bytes)
     0x000000000000001a (FINI_ARRAY)   0x420d00
     0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
     0x000000006ffffef5 (GNU_HASH)     0x400308
     0x0000000000000005 (STRTAB)       0x401078
     0x0000000000000006 (SYMTAB)       0x400328
     0x000000000000000a (STRSZ)        2404 (bytes)
     0x000000000000000b (SYMENT)       24 (bytes)
     0x0000000000000015 (DEBUG)        0x0
     0x0000000000000003 (PLTGOT)       0x421000
     0x0000000000000002 (PLTRELSZ)     2592 (bytes)
     0x0000000000000014 (PLTREL)       RELA
     0x0000000000000017 (JMPREL)       0x401e30
     0x0000000000000007 (RELA)         0x401b18
     0x0000000000000008 (RELASZ)       792 (bytes)
     0x0000000000000009 (RELAENT)      24 (bytes)
     0x000000006ffffffe (VERNEED)      0x401af8
     0x000000006fffffff (VERNEEDNUM)   1
     0x000000006ffffff0 (VERSYM)       0x4019dc
     0x0000000000000000 (NULL)         0x0
    $ ./pystone
    Pystone(1.1) time for 50000 passes = 0.171947
    This machine benchmarks at 290787 pystones/second
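The include path and library name in the gcc command above are specific to one machine and one Python version. As a rough sketch, they could instead be looked up from the running interpreter with the standard `sysconfig` module. The helper name `embed_build_command` and the trimmed flag set are assumptions made for illustration, not the exact command used above, and the lookup is only expected to be meaningful on POSIX builds of CPython.

```python
import sysconfig


def embed_build_command(c_file="pystone.c", output="pystone"):
    """Assemble a gcc command line for a C file produced by `cython --embed`."""
    include_dir = sysconfig.get_paths()["include"]
    lib_dir = sysconfig.get_config_var("LIBDIR")
    # LDVERSION is e.g. "3.7m"; fall back to "major.minor" if it is not defined.
    ld_version = (sysconfig.get_config_var("LDVERSION")
                  or sysconfig.get_python_version())
    cmd = ["gcc", "-O2", "-fPIC", "-fwrapv",
           "-I" + include_dir, "-o", output, c_file]
    if lib_dir:
        cmd.append("-L" + lib_dir)
    # Libraries go after the source file so the linker can resolve its symbols.
    cmd.append("-lpython" + ld_version)
    return cmd


if __name__ == "__main__":
    print(" ".join(embed_build_command()))
```

The printed command can be compared with the hand-written one before running it; whether a matching shared libpython is actually installed still has to be checked on the target system.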

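Compiled to a binary, the script runs slightly faster (290787 versus 247777 pystones/second for the Python 3.7 interpreter, roughly 17% more), and the binary needs libpython3.7m.so at run time. Here is part of the generated pystone.c: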

    /* "pystone.py":73
     *     return Proc0(loops)
     *
     * IntGlob = 0             # <<<<<<<<<<<<<<
     * BoolGlob = FALSE
     * Char1Glob = '\0'
     */
    if (PyDict_SetItem(__pyx_d, __pyx_n_s_IntGlob, __pyx_int_0) < 0) __PYX_ERR(0, 73, __pyx_L1_error)

    /* "pystone.py":74
     *
     * IntGlob = 0
     * BoolGlob = FALSE             # <<<<<<<<<<<<<<
     * Char1Glob = '\0'
     * Char2Glob = '\0'
     */
    __Pyx_GetModuleGlobalName(__pyx_t_7, __pyx_n_s_FALSE); if (unlikely(!__pyx_t_7)) __PYX_ERR(0, 74, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_7);
    if (PyDict_SetItem(__pyx_d, __pyx_n_s_BoolGlob, __pyx_t_7) < 0) __PYX_ERR(0, 74, __pyx_L1_error)
    __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;

    /* "pystone.py":75
     * IntGlob = 0
     * BoolGlob = FALSE
     * Char1Glob = '\0'             # <<<<<<<<<<<<<<
     * Char2Glob = '\0'
     * Array1Glob = [0]*51
     */
    if (PyDict_SetItem(__pyx_d, __pyx_n_s_Char1Glob, __pyx_kp_u__12) < 0) __PYX_ERR(0, 75, __pyx_L1_error)

    /* "pystone.py":76
     * BoolGlob = FALSE
     * Char1Glob = '\0'
     * Char2Glob = '\0'             # <<<<<<<<<<<<<<
     * Array1Glob = [0]*51
     * Array2Glob = list(map(lambda x: x[:], [Array1Glob]*51))
     */
    if (PyDict_SetItem(__pyx_d, __pyx_n_s_Char2Glob, __pyx_kp_u__12) < 0) __PYX_ERR(0, 76, __pyx_L1_error)

    /* "pystone.py":77
     * Char1Glob = '\0'
     * Char2Glob = '\0'
     * Array1Glob = [0]*51             # <<<<<<<<<<<<<<
     * Array2Glob = list(map(lambda x: x[:], [Array1Glob]*51))
     * PtrGlb = None
     */
    __pyx_t_7 = PyList_New(1 * 51); if (unlikely(!__pyx_t_7)) __PYX_ERR(0, 77, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_7);
    { Py_ssize_t __pyx_temp;
      for (__pyx_temp=0; __pyx_temp < 51; __pyx_temp++) {
        __Pyx_INCREF(__pyx_int_0);
        __Pyx_GIVEREF(__pyx_int_0);
        PyList_SET_ITEM(__pyx_t_7, __pyx_temp, __pyx_int_0);
      }
    }
    if (PyDict_SetItem(__pyx_d, __pyx_n_s_Array1Glob, __pyx_t_7) < 0) __PYX_ERR(0, 77, __pyx_L1_error)
    __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;

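The generated C code is already hard to read on its own, and recovering the original Python source from the compiled binary is practically impossible.

However, this approach currently has a limitation: it compiles only a single file, whereas we often want to build several packages into one standalone executable.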

0x03 Compiling a Python script with Nuitka

    $ pip install nuitka
    $ nuitka pystone.py
    Nuitka-Options:INFO: Used command line options: pystone.py
    Nuitka-Options:WARNING: You did not specify to follow or include anything but main program. Check options
    Nuitka-Options:WARNING: and make sure that is intended.
    Nuitka:WARNING: Using very slow fallback for ordered sets, please install 'orderedset' PyPI package for best
    Nuitka:WARNING: Python compile time performance.
    Nuitka:INFO: Starting Python compilation with Nuitka '1.1.8' on Python '3.7' commercial grade 'not installed'.
    Nuitka:INFO: Completed Python level compilation and optimization.
    Nuitka:INFO: Generating source code for C backend compiler.
    Nuitka:INFO: Running data composer tool for optimal constant value handling.
    Nuitka:INFO: Running C compilation via Scons.
    Nuitka-Scons:INFO: Backend C compiler: gcc (gcc).
    Nuitka-Scons:INFO: Backend linking program with 9 files (no progress information available).
    Nuitka-Scons:WARNING: You are not using ccache.
    Nuitka:INFO: Keeping build directory 'pystone.build'.
    Nuitka:INFO: Successfully created 'pystone.bin'.
    $ ls -l pystone.bin
    -rwxrwxrwx 1 drunkdream drunkdream 268440 Sep 6 20:57 pystone.bin
    $ ./pystone.bin
    Pystone(1.1) time for 50000 passes = 0.12965
    This machine benchmarks at 385654 pystones/second

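With Nuitka, performance is clearly higher than with the native interpreter: 385654 versus 247777 pystones/second on Python 3.7, roughly 56% more.

I had intended to run the same test under Python 3.11, but the latest Nuitka release at the time of writing had not yet been adapted to Python 3.11, and compilation failed with errors.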

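Nuitka also has a number of optional parameters. The more important ones are:

  • -o FILENAME: name of the file to generate
  • --standalone: bundle the dependencies into the build output; the shared libraries it depends on still ship as separate files
  • --onefile: addresses the multi-file output of --standalone and guarantees that the final result is a single executable
  • --nofollow-imports: do not compile imported third-party libraries
  • --clang: force clang as the compiler backend
  • --static-libpython=yes: link libpython statically
  • --show-scons: show the detailed log of the C compilation step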
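When several of these options are combined regularly, it can help to assemble the command in a small helper. The sketch below is an illustration only: the function name `nuitka_command` and its keyword arguments are made up, and it uses nothing beyond the flags listed above plus the `python -m nuitka` way of invoking the compiler.

```python
import sys


def nuitka_command(script, output=None, standalone=False, onefile=False,
                   follow_imports=True, static_libpython=False):
    """Build an argument list that runs Nuitka on `script` with the chosen options."""
    cmd = [sys.executable, "-m", "nuitka"]
    if output:
        cmd += ["-o", output]
    # Only one of the two packaging modes is passed; --onefile takes precedence.
    if onefile:
        cmd.append("--onefile")
    elif standalone:
        cmd.append("--standalone")
    if not follow_imports:
        cmd.append("--nofollow-imports")
    if static_libpython:
        cmd.append("--static-libpython=yes")
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    print(" ".join(nuitka_command("pystone.py", output="pystone.bin", onefile=True)))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` on a machine where Nuitka and a C compiler are installed.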

Looking at the build output, Nuitka also works by converting the Python code to C and then compiling that into the final executable. With --static-libpython=yes, libpython is linked statically, and the link command it runs looks like this:

    $ gcc -o pystone.bin -fuse-linker-plugin -flto=8 -fpartial-inlining -freorder-functions -O2 -s -z noexecstack -Wl,-R,'/usr/lib' -Wl,--disable-new-dtags -Wl,-b -Wl,binary -Wl,./__constants.bin -Wl,-b -Wl,elf64-x86-64 -Wl,-defsym -Wl,constant_bin_data=_binary_____constants_bin_start @"./@link_input.txt" -L/usr/lib -ldl -lm /usr/lib/libpython3.7m.a

In practice, however, this command fails, because it does not include flags such as -lz -lpthread -lexpat -lutil. There is a dedicated issue tracking this problem.
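One way to see which extra libraries a static libpython might need is to ask the interpreter what it recorded at build time. The sketch below reads the `LIBS`, `SYSLIBS` and `MODLIBS` build variables through `sysconfig`. Whether these variables are populated, and whether they cover every missing library, depends on how that particular Python was built, so treat the result as a hint rather than a complete answer. The helper name `extra_link_flags` is hypothetical.

```python
import shlex
import sysconfig


def extra_link_flags():
    """Collect the linker flags recorded when this interpreter was built."""
    flags = []
    for name in ("LIBS", "SYSLIBS", "MODLIBS"):
        value = sysconfig.get_config_var(name)  # may be None or empty
        if not value:
            continue
        for flag in shlex.split(value):
            if flag not in flags:  # keep first occurrence, drop duplicates
                flags.append(flag)
    return flags


if __name__ == "__main__":
    print(" ".join(extra_link_flags()))
```

Appending the printed flags to the failing link command is a quick experiment to check whether the undefined symbols go away.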

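0x04 Conclusion

Compared with py2exe, PyInstaller and similar tools, Cython and Nuitka first generate C code and then compile it, which makes them comparatively better than those two in both protection against decompilation and performance.

Compared with Cython, Nuitka can compile multiple Python scripts together, is more capable, and delivers a considerably larger speedup. However, an executable produced by Nuitka with --onefile is far larger than one produced by PyInstaller.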
