The fastest way for Python to implement loops (speed comparison between for and while)

As we all know, Python is not a highly efficient language. In addition, loops are a very time-consuming operation in any language. If any simple one-step operation takes one unit of time, repeat the operation for tens of thousands of times, and the final time will increase by tens of thousands of times.

while and for are two commonly used keywords to implement loops in Python. In fact, there is a gap in their running efficiency. For example, the following test code:

import timeit


def while_loop(n=100_000_000):
    i = 0
    s = 0
    while i < n:
        s += i
        i += 1
    return s


def for_loop(n=100_000_000):
    s = 0
    for i in range(n):
        s += i
    return s


def main():
    print('while loop\t\t', timeit.timeit(while_loop, number=1))
    print('for loop\t\t', timeit.timeit(for_loop, number=1))


if __name__ == '__main__':
    main()
# => while loop               4.718853999860585
# => for loop                 3.211570399813354

This is a simple summation operation, which calculates the sum of all natural numbers from 1 to n. You can see that the for loop is 1.5 seconds faster than the while loop.

The main difference lies in the different mechanisms between the two.

In each loop, while actually performs two more steps than for: boundary checking and autoincrement of variable i. That is, for each cycle, while will perform a boundary check (while i < n) and auto increment calculation (i +=1). Both steps are explicit pure Python code.

The for loop does not need to perform boundary checking and self increment operations, and does not add explicit Python code (pure Python code is less efficient than the underlying C code). When the number of cycles is enough, there is a significant efficiency gap.

Two more functions can be added to the for loop, including unnecessary boundary checking and self increment calculation:

import timeit


def while_loop(n=100_000_000):
    i = 0
    s = 0
    while i < n:
        s += i
        i += 1
    return s


def for_loop(n=100_000_000):
    s = 0
    for i in range(n):
        s += i
    return s


def for_loop_with_inc(n=100_000_000):
    s = 0
    for i in range(n):
        s += i
        i += 1
    return s


def for_loop_with_test(n=100_000_000):
    s = 0
    for i in range(n):
        if i < n:
            pass
        s += i
    return s


def main():
    print('while loop\t\t', timeit.timeit(while_loop, number=1))
    print('for loop\t\t', timeit.timeit(for_loop, number=1))
    print('for loop with increment\t\t',
          timeit.timeit(for_loop_with_inc, number=1))
    print('for loop with test\t\t', timeit.timeit(for_loop_with_test, number=1))


if __name__ == '__main__':
    main()
# => while loop               4.718853999860585
# => for loop                 3.211570399813354
# => for loop with increment          4.602369500091299
# => for loop with test               4.18337869993411

It can be seen that the increased boundary checking and self increment operations do greatly affect the execution efficiency of the for loop.

As mentioned earlier, Python's underlying interpreter and built-in functions are implemented in C language. The execution efficiency of C language is much higher than that of Python.

For the above operation of summing the arithmetic sequence, with the help of Python's built-in sum function, the execution efficiency is much higher than that of the for or while loop.

import timeit


def while_loop(n=100_000_000):
    i = 0
    s = 0
    while i < n:
        s += i
        i += 1
    return s


def for_loop(n=100_000_000):
    s = 0
    for i in range(n):
        s += i
    return s


def sum_range(n=100_000_000):
    return sum(range(n))


def main():
    print('while loop\t\t', timeit.timeit(while_loop, number=1))
    print('for loop\t\t', timeit.timeit(for_loop, number=1))
    print('sum range\t\t', timeit.timeit(sum_range, number=1))


if __name__ == '__main__':
    main()
# => while loop               4.718853999860585
# => for loop                 3.211570399813354
# => sum range                0.8658821999561042

It can be seen that after using the built-in function sum to replace the loop, the execution efficiency of the code has doubled.

The accumulation operation of the built-in function sum is actually a loop, but it is implemented by C language, while the summation operation in the for loop is implemented by pure Python code s += i. C > Python.

Expand your thinking. As a child, I heard the story of Gauss skillfully calculating the sum of 1 to 100 in childhood. The sum of 1... 100 is equal to (1 + 100) * 50. This calculation method can also be applied to the above summation operation.

import timeit


def while_loop(n=100_000_000):
    i = 0
    s = 0
    while i < n:
        s += i
        i += 1
    return s


def for_loop(n=100_000_000):
    s = 0
    for i in range(n):
        s += i
    return s


def sum_range(n=100_000_000):
    return sum(range(n))


def math_sum(n=100_000_000):
    return (n * (n - 1)) // 2


def main():
    print('while loop\t\t', timeit.timeit(while_loop, number=1))
    print('for loop\t\t', timeit.timeit(for_loop, number=1))
    print('sum range\t\t', timeit.timeit(sum_range, number=1))
    print('math sum\t\t', timeit.timeit(math_sum, number=1))


if __name__ == '__main__':
    main()
# => while loop               4.718853999860585
# => for loop                 3.211570399813354
# => sum range                0.8658821999561042
# => math sum                 2.400018274784088e-06

Finally, the execution time of math sum is about 2.4e-6, which is shortened by millions of times. The idea here is that since the efficiency of the loop is low, a piece of code needs to be repeated hundreds of millions of times.

Simply do not cycle directly. Through mathematical formulas, turn hundreds of millions of cycle operations into only one-step operations. Efficiency has naturally been unprecedentedly strengthened.

Final conclusion (a little Riddler):

The fastest way to implement a loop is not to use a loop

For Python, use built-in functions as much as possible to minimize pure Python code in the loop.

Of course, built-in functions are not the fastest in some cases. For example, when creating a list, the speed of literal writing is faster.

Keywords: Python

Added by Bjom on Wed, 05 Jan 2022 11:05:24 +0200