July 16, 2017

Comparing different ways to add ints II

Background

As outlined in part 1, we are comparing recursive and iterative implementations of long arithmetic on two arrays. As seen in the chart at the bottom of that post, there is a sudden jump in the execution time of the recursive function when the array length reaches ~900 ints. This post investigates that jump using the cProfile and line_profiler modules. First, cProfile was used to record and examine the execution times for recur_sum and iter_sum. After the bottleneck was located, line_profiler was used to profile each function with higher granularity. To keep the comparison consistent, both implementations are benchmarked with line profiling enabled.

Refactoring

The script from part 1 had to be changed to profile the relevant function calls and save the results. It now wraps the function calls with cProfile, dumps the results to disk, and then plots them.

Plot results

Instead of a scatter plot, a regular line plot was used, which makes the increase in gradient more obvious. The function also takes the right and left limits for the x values (the array lengths); setting xlim focuses the picture. We will not exceed the recursion limit in this example, so the last point of the recursive times does not need to be marked.

import matplotlib.pyplot as plt
import numpy as np


def plot_results_for_cprofile(xs, ys, r_limit, l_limit):
    plt.xlim(r_limit, l_limit)
    plt.xlabel('Size of input arrays')
    plt.ylabel('Time to calculate sum (ms)')
    plt.title('Comparing iterative vs recursive sum methods')

    iter_arr_lengths, recur_arr_lengths = xs
    iters, recurs = ys

    # plot iterative times
    plt.plot(np.array(iter_arr_lengths), np.array(iters), c="red")

    # plot recursive times
    plt.plot(np.array(recur_arr_lengths), np.array(recurs), c="green")

    fname = "plot_sum_ints.png"
    plt.savefig(fname, dpi=1200)
    print("Saved plot as {}".format(fname))

Profile run

The profile_run method instantiates two cProfile.Profile() objects, one for profiling the iterative solution and one for the recursive solution. Looping over different array lengths, the iter_sum and recur_sum solutions are profiled. The profiling information is saved to disk using the dump_stats() method, which takes a filename string as its argument.

def profile_run(r_limit, l_limit):
    import cProfile
    iter_arr_lengths = []
    recur_arr_lengths = []
    iter_times = []
    recur_times = []

    for arr_length in range(r_limit, l_limit, 1):
        arr1 = [9 for _ in range(arr_length)]
        arr2 = [1 for _ in range(arr_length)]

        pr = cProfile.Profile()
        pr2 = cProfile.Profile()

        pr.enable()
        res_iter, time_iter = iter_sum(arr1, arr2)
        pr.disable()
        pr.dump_stats("iter_{}".format(str(arr_length)))
        iter_arr_lengths.append(arr_length)
        iter_times.append(time_iter)

        pr2.enable()
        res_recur, time_recur = recur_sum(arr1, arr2)
        pr2.disable()
        pr2.dump_stats("recur_{}".format(str(arr_length)))
        recur_arr_lengths.append(arr_length)
        recur_times.append(time_recur)

    plot_results_for_cprofile([iter_arr_lengths, recur_arr_lengths],
                              [iter_times, recur_times], r_limit, l_limit)

Dumping stats

The length of the input arrays is used as the id of each profiler run. Looking at the source of cProfile.Profile.dump_stats():

    def dump_stats(self, file):
        import marshal
        with open(file, 'wb') as f:
            self.create_stats()
            marshal.dump(self.stats, f)

We see that it uses the marshal module. From the documentation, we know that:

Details of the format are undocumented on purpose; it may change between Python versions (although it rarely does).

This is not a general “persistence” module. For general persistence and transfer of Python objects through RPC calls, see the modules pickle and shelve. The marshal module exists mainly to support reading and writing the “pseudo-compiled” code for Python modules of .pyc files. Therefore, the Python maintainers reserve the right to modify the marshal format in backward incompatible ways should the need arise. If you’re serializing and de-serializing Python objects, use the pickle module instead – the performance is comparable, version independence is guaranteed, and pickle supports a substantially wider range of objects than marshal.

The marshal module is not intended to be secure against erroneous or maliciously constructed data. Never unmarshal data received from an untrusted or unauthenticated source.

What joy! cProfile (the built-in Python profiler) uses a deliberately under-documented, backwards-incompatible, insecure module with incomplete support for Python types.
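
As a quick illustration of that last point (the Point class below is made up for the example, it is not part of the script): pickle happily serialises an instance of a user-defined class, while marshal refuses.

import marshal
import pickle


class Point:
    # hypothetical user-defined class, only used to illustrate the difference
    def __init__(self, x, y):
        self.x, self.y = x, y


p = Point(1, 2)
print(len(pickle.dumps(p)))   # pickle handles instances of user-defined classes

try:
    marshal.dumps(p)          # marshal only supports a fixed set of builtin types
except ValueError as exc:
    print("marshal refused:", exc)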

Line profiling

Line profiling was implemented by changing the timeit decorator function. This keeps repeat measurements consistent between the recursive and iterative methods. Assuming that the overhead of line profiling the recursive and the iterative solution is of the same order of magnitude, wrapping both recur_sum and iter_sum was not expected to change the difference between them much. recur_sum_helper and iter_sum_helper had to be moved above timeit in the source code (reasons below).

import time

from line_profiler import LineProfiler


def timeit(func):
    str_func_to_profile = func.__code__.co_names[0]
    func_to_prof = globals()[str_func_to_profile]

    def newfunc(*args, **kwargs):
        # locals() returns a dictionary, where 'args' is key for local vars
        # ASSUMPTION: both input arrays have the same length, use either
        arr_length = len(locals()['args'][0])
        # make a new instance of LineProfiler for each time iter_sum() is
        # called with new input arrays
        line_prof = LineProfiler()
        func_prof = line_prof(func)
        line_prof.add_function(func_to_prof)

        startTime = time.time()
        res = func_prof(*args, **kwargs)
        elapsedTime = time.time() - startTime
        line_prof.dump_stats("ll_{}_{}".format(
            str_func_to_profile, str(arr_length)))
        time_as_string = '{:.6f}'.format(elapsedTime * 1000)
        return (res, time_as_string)

    return newfunc

Below is a quick introduction to Python decorators and how the timeit function had to change to include line profiling.

Decorators are evaluated at runtime

When a Python module is loaded or run, decorators are evaluated as soon as they are encountered. Regardless of whether the wrapped function is ever called anywhere in the module, the decorator processes and returns the new function as soon as it is given a function to wrap. By the time the wrapped function is called, it has already been modified, so the function being executed is the one the decorator returned when it was evaluated earlier. To prove that decorators work at wrap-time, not call-time, set breakpoints with pdb.set_trace() as below.

import pdb
pdb.set_trace() # breakpoint 1

@timeit
def iter_sum(arr1, arr2):
    r_arr1 = arr1[::-1]
    r_arr2 = arr2[::-1]
    return iter_sum_helper(r_arr1, r_arr2)[::-1]

pdb.set_trace() # breakpoint 2

When breakpoint 1 is hit, iter_sum has not been defined yet, so accessing it in the debugger raises a NameError.

$$$$ python3 sum_ints_as_arrays.py 
> /home/petr_tik/Coding/misc/misc/sum_ints_profile/sum_ints_as_arrays.py(115)<module>()
-> @timeit
(Pdb) iter_sum.__name__
*** NameError: name 'iter_sum' is not defined

(c)ontinuing to breakpoint 2, iter_sum has now been defined and wrapped. newfunc is the name of the inner function returned from timeit’s closure. This proves that iter_sum has already been wrapped by timeit before it is first called.

(Pdb) c
> /home/petr_tik/Coding/misc/misc/sum_ints_profile/sum_ints_as_arrays.py(124)<module>()
-> @timeit
(Pdb) iter_sum.__name__
'newfunc'

Closures

Decorators in Python rely on the concept of closures. At the point where newfunc is defined inside timeit, three names from the enclosing scope are available to it: str_func_to_profile, func_to_prof and func (the wrapped function) itself. Continuing the same pdb session, the free variables inside the code object of iter_sum were examined.

(Pdb) iter_sum.__code__.co_freevars
('func', 'func_to_prof', 'str_func_to_profile')
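
The same mechanism is easier to see on a toy closure (make_adder is a made-up example, not part of the script): the inner function keeps a reference to n from the enclosing scope.

def make_adder(n):
    def add(x):
        return x + n  # n is a free variable, captured from make_adder's scope
    return add


add_five = make_adder(5)
print(add_five.__code__.co_freevars)          # ('n',)
print(add_five.__closure__[0].cell_contents)  # 5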

Adding line-profiling to the decorator

Using the above and some introspection, we access the code object of func through __code__. co_names holds the non-local names referenced by the function’s bytecode, not its local variables (those live in co_varnames). Both wrapped functions, iter_sum and recur_sum, reference exactly one such name: their helper function. iter_sum_helper and recur_sum_helper sit above this wrapper function in the source code, which guarantees that they will be found when globals() is called inside the decorator. Using the name string as the key, we retrieve the helper function object from the globals() dictionary.
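
A minimal sketch of the distinction, using hypothetical names (demo_sum and demo_sum_helper stand in for the real functions; demo_sum is never called here, so demo_sum_helper does not need to exist):

def demo_sum(arr1, arr2):
    # r_arr1 and r_arr2 are local variables, so they end up in co_varnames
    r_arr1 = arr1[::-1]
    r_arr2 = arr2[::-1]
    # demo_sum_helper is looked up as a global name, so it ends up in co_names
    return demo_sum_helper(r_arr1, r_arr2)[::-1]


print(demo_sum.__code__.co_names)     # ('demo_sum_helper',)
print(demo_sum.__code__.co_varnames)  # ('arr1', 'arr2', 'r_arr1', 'r_arr2')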

Originally, I made a mistake by instantiating the LineProfiler class inside timeit but before newfunc was defined. This made the same LineProfiler instance available to every call of iter_sum or recur_sum respectively, so profiling results accumulated across calls and were simply dumped under a new filename each time. The dump for recur_sum_919 therefore included the profiling data of all previous runs, which made it incorrect. Instead, a new LineProfiler instance is now created each time the wrapped function (iter_sum or recur_sum) is called.
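
A stripped-down sketch of the difference (timing and dump_stats are omitted; only the placement of the LineProfiler instantiation changes):

from line_profiler import LineProfiler


def timeit_buggy(func):
    line_prof = LineProfiler()  # one shared instance, created at decoration time
    def newfunc(*args, **kwargs):
        # every call reuses the same profiler, so results accumulate across calls
        return line_prof(func)(*args, **kwargs)
    return newfunc


def timeit_fixed(func):
    def newfunc(*args, **kwargs):
        line_prof = LineProfiler()  # fresh instance, scoped to this call only
        return line_prof(func)(*args, **kwargs)
    return newfunc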

Overall running times are still collected and returned; they are expected to increase because of the line profiling overhead. The line profiler dumps its stats into a file named after the profiled function and the array length. The result and elapsed time are returned and used by profile_run to collect and plot the time for each function call.

Results

Plot

[Plot: plot_sum_ints.png – iterative (red) vs recursive (green) times with line profiling enabled]

In the graph, green is still the recursive and red the iterative times. There are several notable differences: times are higher overall, so profiling has a noticeable overhead. The difference between iterative and recursive is still present, and the shape of the ups and downs is similar in both lines. There is a huge spike for the recursive solution when the input arrays are of size 918, which gives us a clear target to investigate.

Reading the cProfile dump

Read function

Each cProfile instance dump_stats its results for a given array length into a binary file called recur_{arr_length} or iter_{arr_length}. read_dump.py prints a dump to the terminal so it can be read and grepped.

#! /usr/bin/env python3

import pstats
import sys


def main():
    if len(sys.argv) != 2:
        print("Supply a filename")
        return
    fname = sys.argv[1]

    stats = pstats.Stats(fname)
    stats.sort_stats("time").print_stats(1.0)

if __name__ == "__main__":
    main()

Results

Using this script, we examine the dumps of recursive calls with arrays of lengths 917, 918 and 919 (before, at and after the peak).

$$$$ ./read_dump.py recur_917 | grep seconds
         5542 function calls (4625 primitive calls) in 0.011 seconds
$$$$ ./read_dump.py recur_918 | grep seconds
         5548 function calls (4630 primitive calls) in 0.039 seconds
$$$$ ./read_dump.py recur_919 | grep seconds
         5554 function calls (4635 primitive calls) in 0.011 seconds

At the peak, the number of function calls barely changes (6 more calls, 5 more primitive calls), but the execution time more than triples.

petr_tik@merluza:~/Coding/misc/misc/sum_ints_profile$ ./read_dump.py recur_917 && ./read_dump.py recur_918 && ./read_dump.py recur_919
Sun Jul 23 19:18:39 2017    recur_917
         5542 function calls (4625 primitive calls) in 0.011 seconds
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    918/1    0.010    0.000    0.011    0.011 sum_ints_as_arrays.py:52(recur_sum_helper)

Sun Jul 23 19:18:39 2017    recur_918
         5548 function calls (4630 primitive calls) in 0.039 seconds
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    919/1    0.038    0.000    0.038    0.038 sum_ints_as_arrays.py:52(recur_sum_helper)

Sun Jul 23 19:18:39 2017    recur_919
         5554 function calls (4635 primitive calls) in 0.011 seconds
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    920/1    0.010    0.000    0.011    0.011 sum_ints_as_arrays.py:52(recur_sum_helper)

In all three instances of recur_sum (and, we can assume, the others), nearly 100% of the time is spent in the recursive calls of recur_sum_helper. We need line profiling to get more detail about the increase in execution time.

Reading line profiling dump

Read function

The line_profiler module has separate functions, load_stats and show_text, for loading the saved stats and rendering them as text. For serialisation, line_profiler uses the builtin pickle module, which offers wider type coverage and guaranteed version independence.

#! /usr/bin/env python3

import sys

from line_profiler import load_stats, show_text

def main():
    if len(sys.argv) != 2:
        print("Supply a filename")
        return
    fname = sys.argv[1]

    stats = load_stats(fname)
    show_text(stats.timings, stats.unit)

if __name__ == "__main__":
    main()

Results

$$$$ ./read_ll.py ll_recur_sum_helper_917
Timer unit: 1e-06 s

Total time: 0.006587 s
File: sum_ints_as_arrays.py
Function: recur_sum_helper at line 52

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    52                                           def recur_sum_helper(arr1, arr2, idx=0, res=[], carry=0):
    53       918          481      0.5      7.3      if not res:
    54         1            2      2.0      0.0          res = list()
    55       918          926      1.0     14.1      if idx >= len(arr1) and idx >= len(arr2):
    56         1            1      1.0      0.0          if carry == 1:
    57         1            1      1.0      0.0              res.append(1)
    58         1            1      1.0      0.0          return res
    59       917          797      0.9     12.1      if idx >= len(arr1):
    60                                                   carry, item_to_add = divmod(carry + arr2[idx], 10)
    61                                                   res.append(item_to_add)
    62                                                   idx += 1
    63                                                   return recur_sum_helper(arr1, arr2, idx, res, carry)
    64       917          782      0.9     11.9      if idx >= len(arr2):
    65                                                   carry, item_to_add = divmod(carry + arr1[idx], 10)
    66                                                   res.append(item_to_add)
    67                                                   idx += 1
    68                                                   return recur_sum_helper(arr1, arr2, idx, res, carry)
    69                                           
    70       917         1109      1.2     16.8      carry, item_to_add = divmod(carry + arr1[idx] + arr2[idx], 10)
    71       917          842      0.9     12.8      res.append(item_to_add)
    72       917          564      0.6      8.6      idx += 1
    73       917         1081      1.2     16.4      return recur_sum_helper(arr1, arr2, idx, res, carry)

Total time: 0.010676 s
File: sum_ints_as_arrays.py
Function: recur_sum at line 119

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   119                                           @timeit
   120                                           def recur_sum(arr1, arr2):
   121         1           11     11.0      0.1      r_arr1 = arr1[::-1]
   122         1            7      7.0      0.1      r_arr2 = arr2[::-1]
   123         1        10658  10658.0     99.8      return recur_sum_helper(r_arr1, r_arr2)[::-1]

$$$$ ./read_ll.py ll_recur_sum_helper_918
Timer unit: 1e-06 s

Total time: 0.034118 s
File: sum_ints_as_arrays.py
Function: recur_sum_helper at line 52

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    52                                           def recur_sum_helper(arr1, arr2, idx=0, res=[], carry=0):
    53       919          472      0.5      1.4      if not res:
    54         1            3      3.0      0.0          res = list()
    55       919          892      1.0      2.6      if idx >= len(arr1) and idx >= len(arr2):
    56         1            0      0.0      0.0          if carry == 1:
    57         1            1      1.0      0.0              res.append(1)
    58         1            1      1.0      0.0          return res
    59       918          763      0.8      2.2      if idx >= len(arr1):
    60                                                   carry, item_to_add = divmod(carry + arr2[idx], 10)
    61                                                   res.append(item_to_add)
    62                                                   idx += 1
    63                                                   return recur_sum_helper(arr1, arr2, idx, res, carry)
    64       918          776      0.8      2.3      if idx >= len(arr2):
    65                                                   carry, item_to_add = divmod(carry + arr1[idx], 10)
    66                                                   res.append(item_to_add)
    67                                                   idx += 1
    68                                                   return recur_sum_helper(arr1, arr2, idx, res, carry)
    69                                           
    70       918         1062      1.2      3.1      carry, item_to_add = divmod(carry + arr1[idx] + arr2[idx], 10)
    71       918          814      0.9      2.4      res.append(item_to_add)
    72       918          542      0.6      1.6      idx += 1
    73       918        28792     31.4     84.4      return recur_sum_helper(arr1, arr2, idx, res, carry)

Total time: 0.038265 s
File: sum_ints_as_arrays.py
Function: recur_sum at line 119

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   119                                           @timeit
   120                                           def recur_sum(arr1, arr2):
   121         1           11     11.0      0.0      r_arr1 = arr1[::-1]
   122         1            6      6.0      0.0      r_arr2 = arr2[::-1]
   123         1        38248  38248.0    100.0      return recur_sum_helper(r_arr1, r_arr2)[::-1]

$$$$ ./read_ll.py ll_recur_sum_helper_919
Timer unit: 1e-06 s

Total time: 0.006544 s
File: sum_ints_as_arrays.py
Function: recur_sum_helper at line 52

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    52                                           def recur_sum_helper(arr1, arr2, idx=0, res=[], carry=0):
    53       920          495      0.5      7.6      if not res:
    54         1            2      2.0      0.0          res = list()
    55       920          936      1.0     14.3      if idx >= len(arr1) and idx >= len(arr2):
    56         1            0      0.0      0.0          if carry == 1:
    57         1            1      1.0      0.0              res.append(1)
    58         1            0      0.0      0.0          return res
    59       919          788      0.9     12.0      if idx >= len(arr1):
    60                                                   carry, item_to_add = divmod(carry + arr2[idx], 10)
    61                                                   res.append(item_to_add)
    62                                                   idx += 1
    63                                                   return recur_sum_helper(arr1, arr2, idx, res, carry)
    64       919          785      0.9     12.0      if idx >= len(arr2):
    65                                                   carry, item_to_add = divmod(carry + arr1[idx], 10)
    66                                                   res.append(item_to_add)
    67                                                   idx += 1
    68                                                   return recur_sum_helper(arr1, arr2, idx, res, carry)
    69                                           
    70       919         1107      1.2     16.9      carry, item_to_add = divmod(carry + arr1[idx] + arr2[idx], 10)
    71       919          827      0.9     12.6      res.append(item_to_add)
    72       919          532      0.6      8.1      idx += 1
    73       919         1071      1.2     16.4      return recur_sum_helper(arr1, arr2, idx, res, carry)

Total time: 0.010754 s
File: sum_ints_as_arrays.py
Function: recur_sum at line 119

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   119                                           @timeit
   120                                           def recur_sum(arr1, arr2):
   121         1           11     11.0      0.1      r_arr1 = arr1[::-1]
   122         1            6      6.0      0.1      r_arr2 = arr2[::-1]
   123         1        10737  10737.0     99.8      return recur_sum_helper(r_arr1, r_arr2)[::-1]

Conclusion

This post showed how the cProfile and line_profiler modules can be wrapped around functions to collect and review profiling information. Decorators, which rely on closures and runtime code inspection, were explained and used to implement the profiling. The spike in the execution time of the recursive solution was pinned down with line profiling: for arrays of length 918, the line making the recursive call spends roughly 27x more time per hit (31.4 µs vs ~1.2 µs) than in the runs immediately before and after, for reasons that remain unexplained.

Future work

At this point, it is best to implement tracing, either inside the Python interpreter or from outside the process. By tracing and logging each recursively created and executed stack frame inside recur_sum_helper, we can gather more information about the slowdown. Another avenue for investigation is the CPython source code itself: methods like list.append() can trigger a reallocation, which would slow down execution in some stack frames. The line profiling results suggest that is unlikely, unless the reallocation cost is not captured by the per-line timings.
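
A rough sketch of the in-process option, using the standard sys.settrace hook to log every stack frame created for recur_sum_helper (arr1, arr2 and the functions are assumed to come from sum_ints_as_arrays.py):

import sys


def trace_calls(frame, event, arg):
    # the hook fires once per new stack frame; only log frames of recur_sum_helper
    if event == "call" and frame.f_code.co_name == "recur_sum_helper":
        print("frame: idx={}, len(res)={}, carry={}".format(
            frame.f_locals.get("idx"),
            len(frame.f_locals.get("res") or []),
            frame.f_locals.get("carry")))
    return None  # returning None disables line-level tracing inside the frame


sys.settrace(trace_calls)
recur_sum(arr1, arr2)  # assumed to exist in the main script
sys.settrace(None)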