'How to fuse multiple Python functions together
I have a number of operations I want to "fuse" together. Let's say there are 3 possible operations:
sq = lambda x: x**2
add = lambda x: x+3
mul = lambda x: x*5
I also have an array of operations:
ops = [add, sq, mul, sq]
I can then create a function from these operations:
def generateF(ops):
def inner(x):
for op in ops:
x = op(x)
return x
return inner
f = generateF(ops)
f(3) # returns 32400
fastF = lambda x: (5*(x+3)**2)**2
f and fastF does the same thing, but fastF is around 1.7-2 times faster than f on my benchmark, which makes sense. My question is, how can I write generateF function that returns a function that is as fast as fastF? The operations are restricted to basic operations like __add__, __mul__, __matmul__, __rrshift__, etc (essentially most numeric operations). generateF can take as long as you'd like, because it will be done before reaching hot code.
The context is that this is a part of my library, so I can define every legal operation, and thus know exactly what they are. The operation definitions are not given to us by the end user randomly (the user can only pick the order of the operations), so we can utilize every outside knowledge about them.
This might seem like premature optimization, but it is not, as f is hot code. Dropping to C is not an option, as the operations can be complex (think, PyTorch tensor multiply), and x can be of any type. Currently, I'm thinking about modifying python's bytecode, but that is very unpleasant, as bytecode specifications changes for every Python version, so I wanted to ask here first before diving into that solution.
Solution 1:[1]
Here's an alternative using chaining. This way, there's only function calls in your generated function calls, no iteration.
def makeF(ops):
f = ops[0]
for op in ops[1:]:
f = lambda x, op=op, f=f: op(f(x))
return f
Bad news: it replaces each function call with two, so it's actually slower than your iterative version. :/
Solution 2:[2]
As there seems to be no solution, this is what I've settled on. Knowing that most operations will be short (<4 operations total), I just hard code in to get rid of the for loop.
def generateF(ops):
l = len(ops)
if l == 1:
return ops[0]
if l == 2:
a, b = ops
return lambda x: b(a(x))
if l == 3:
a, b, c = ops
return lambda x: c(b(a(x)))
if l == 4:
a, b, c, d = ops
return lambda x: d(c(b(a(x))))
def inner(x):
for op in ops:
x = op(x)
return x
return inner
fastF = generateF(ops)
This is only 1.4x slower than fastF (originally 1.7-2x slower). If you have any other ideas, I will consider it.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | kindall |
| Solution 2 | 157 239n |
