'Python numpy arrays are trying to broadcast into eachother, but I need them to operate on a new axis instead

I'm writing this code for a class and am pretty new to python, so I wont be surprised if I'm doing a lot of things wrong. I'm trying to compute the intensity of diffraction pattern of light coming through an aperture, which is given by a Bessel function. I've got the Bessel function itself computed relatively efficiently using numpy arrays, but I've run into a bit of a snag when I try to extend the Bessel function on top of a grid for computing the intensity.

Here's my code:

import matplotlib.pyplot as plt
numDivs = 100

rmax = 1e-6
num = 400
lmda = 500e-9
k = 2 * np.pi / lmda

#This function is designed to compute the integral using
# Simpson's Rule. The value N passed in is the number of
# polynomial curve fits to be created. The curve is sliced into
# 2N sections. Designed to work with numpy.
def integrateSimpson(func, lower, upper, N):
    delta = (upper - lower) / (2 * N)
    THETA = np.linspace(lower, upper, 2 * N + 1)
    Y = func(THETA)
    Y = Y * (delta / 3)
    
    return Y[0] + Y[2*N] + 4 * np.sum(Y[1:2 * N:2]) + 2 * np.sum(Y[2:2 * N:2])

#this is the formula for the bessel function
J = lambda m, x: integrateSimpson(lambda theta: np.cos(m * theta - x * np.sin(theta)), 0, np.pi, numDivs)

#this is the formula for the intensity using the above bessel function definition
Intensity = lambda r: (J(1, k * r) / (k * r))**2

#setting up the grids for computation.
xSpace = np.linspace(-rmax, rmax, num)
ySpace = np.linspace(-rmax, rmax, num)

xx, yy = np.meshgrid(xSpace, ySpace)
grid2 = np.zeros((num, num))

#This is where my problem begins
grid2 = Intensity(np.hypot(xx, yy))

#Set up the image and show it
plt.figure(figsize=(14, 14), dpi=80)
plt.set_cmap('plasma')
plt.imshow(grid2, extent = [-rmax , rmax, -rmax, rmax])
plt.colorbar()
plt.show()

Here is the error code I get:

    ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-42-f3614ffc9470> in <module>
     21 grid2 = np.zeros((num, num))
     22 
---> 23 grid2 = Intensity(np.hypot(xx, yy))
     24 
     25 #Set up the image and show it

<ipython-input-42-f3614ffc9470> in <lambda>(r)
     11 
     12 #this is the formula for the intensity using the above bessel function definition
---> 13 Intensity = lambda r: (J(1, k * r) / (k * r))**2
     14 
     15 #setting up the grid for computation.

<ipython-input-42-f3614ffc9470> in <lambda>(m, x)
      8 
      9 #this is the formula for the bessel function
---> 10 J = lambda m, x: integrateSimpson(lambda theta: np.cos(m * theta - x * np.sin(theta)), 0, np.pi, numDivs)
     11 
     12 #this is the formula for the intensity using the above bessel function definition

<ipython-input-8-1c73b24d5dd3> in integrateSimpson(func, lower, upper, N)
     19     delta = (upper - lower) / (2 * N)
     20     THETA = np.linspace(lower, upper, 2 * N + 1)
---> 21     Y = func(THETA)
     22     Y = Y * (delta / 3)
     23 

<ipython-input-42-f3614ffc9470> in <lambda>(theta)
      8 
      9 #this is the formula for the bessel function
---> 10 J = lambda m, x: integrateSimpson(lambda theta: np.cos(m * theta - x * np.sin(theta)), 0, np.pi, numDivs)
     11 
     12 #this is the formula for the intensity using the above bessel function definition

ValueError: operands could not be broadcast together with shapes (400,400) (201,)

It looks like the error is that the theta array and the x array are trying to broadcast into each other, but I really don't want them to. Is there a way to set this up which avoid loops but still accomplishes the task? I've fixed the problem for now by backing out of numpy array operations and just using a couple for loops.

The working code which accomplishes what I'm trying to do follows here:

#Type your code here
import matplotlib.pyplot as plt

numDivs = 100

rmax = 1e-6
num = 400
lmda = 500e-9
k = 2 * np.pi / lmda

#This function is designed to compute the integral using
# Simpson's Rule. The value N passed in is the number of
# polynomial curve fits to be created. The curve is sliced into
# 2N sections. Designed to work with numpy.
def integrateSimpson(func, lower, upper, N):
    delta = (upper - lower) / (2 * N)
    THETA = np.linspace(lower, upper, 2 * N + 1)
    Y = func(THETA)
    Y = Y * (delta / 3)
    
    return Y[0] + Y[2 * N] + 4 * np.sum(Y[1:2 * N:2]) + 2 * np.sum(Y[2:2 * N:2])

#this is the formula for the bessel function
J = lambda m, x: integrateSimpson(lambda theta: np.cos(m * theta - x * np.sin(theta)), 0, np.pi, numDivs)

#this is the formula for the intensity using the above bessel function definition
Intensity = lambda r: (J(1, k * r) / (k * r))**2

#setting up the grid for computation.
xSpace = np.linspace(-rmax, rmax, num)
ySpace = np.linspace(-rmax, rmax, num)
grid2 = np.zeros((num, num))

#calculate the diffraction pattern. This is an inefficient way to do it for several reasons
# First I'm not utilizing the efficiency of numpy arrays. I ran into a problem where the
# numpy arrays are trying to broadcast into eachother down the line. Not sure how to fix it so
# I reverted to loops. There's also a level of symmetry where I should only have to compute
# one line along r and then rotate it around, instead of computing the entire grid. 
# computing the grid takes a few seconds at this density.
for i in range(num):
    for j in range(num):
        grid2[i, j] = Intensity(np.hypot(xSpace[i], ySpace[j]))

#Set up the image and show it
plt.figure(figsize=(14, 14), dpi=80)
plt.set_cmap('plasma')
plt.imshow(grid2, extent = [-rmax , rmax, -rmax , rmax])
plt.colorbar()
plt.show()

The produced image for the working code:

$diffraction through aperture$

python numpy

Solution 1:^[1]

Broadcasting is actually your friend here, and you can get rid of a lot of temporary arrays by using it. For example, you can set up the grid like this:

x = np.linspace(-rmax, rmax, num)
y = np.linspace(-rmax, rmax, num)
r = np.hypot(x, y[:, None])
image = intensity(r)

For the sake of clarity, and to avoid recomputing unnecessary quantities, I would recommend not using lambdas in this code. Lambdas have a time and a place, but sacrificing legibility for a couple of lines of code is never worth it. In the end, lambdas are just like normal functions in every way that matters except they don't have a meaningful __name__ attribute:

def intensity(r):
    r_scaled = k * r
    b = bessel(1, r_scaled)
    m = (r_scaled == 0)
    r_scaled[m] = b[m]
    return (b / r_scaled)**2

You want to get an intensity of one when r == 0 to achieve that, I masked out the zeros in the divisor and replaced them with the value of the dividend.

Now to address your actual problem. You have an array r, or in this case r_scaled, that has shape (400, 400). You want to integrate a Bessel function for each element. You are using np.sin and np.cos to generate the elements to integrate. But the problem is that summation is a reduction operation. What dimension are you dropping? The answer is that you need to expand the dimensions of your array. It should become a (400, 400, 201) array, which you then sum back down to (400, 400). Here is an example that uses nested functions to make it a bit easier to read than the mess of lambdas you have.

def bessel(m, x):  # M = 1 (scalar), X = r (400, 400)
    def func(theta):
        return np.cos(m * theta - x[..., None] * np.sin(theta))
    return integrate_simpson(func, 0, np.pi, num_divs)

def integrate_simpson(func, lower, upper, n):
    n *= 2
    delta = (upper - lower) / n
    theta = np.linspace(lower, upper, n + 1)
    y = func(theta)
    y *= delta / 3

    return y[..., 0] + y[..., -1] + 4 * y[..., 1:-1:2].sum(axis=-1) + 2 * y[..., 2:-1:2].sum(axis=-1)

The major change is that bessel now does math with x[..., None] (i.e. r_scaled[..., None]). ... means "take all existing dimensions, no matter how many there are", and None means np.newaxis: add a unit dimension. The result is a (400, 400, 1) view (no data is copied), which you can then broadcast with your 201-element theta to get a (400, 400, 201) array.

Inside integrate_simpson, you use the broadcasted shape to sum along the last axis. y[..., 0] means "take the first image plane". The ... refers to the leading two indices, 0 is the first element for each pixel, which is exactly what you want. y[..., -1] does the same thing. Numpy, like normal python, supports negative indices. If you want the last element, just use -1. It saves you some calculation, and is more intuitive for a python-savvy reader, which will be you in the future. Hopefully by this point, you can extrapolate the concepts to the remaining two terms in the sum. y[..., 1:-1:2].sum(axis=-1) sums along the last axis: the 201 one that you made using broadcasting. The nice thing about using -1 and ... here is that you don't even need to know how many axes you have. This will work perfectly with no modifications required for next Monday, when you decide you really meant to integrate over 5-dimensional images.

A couple of minor nitpicks:

I would make num = 401 to see much nicer steps, or set x = np.linspace(-rmax, rmax, num, endpoint=False) (same for y) for a slightly asymmetrical image with the same nice steps.
Python naming convention is CAPS for constants, CamelCase for classes, snake_case for pretty much everything else, including variables, functions, and methods.

Solution 2:^[2]

Sorry, I just couldn't resist. Let's say you did want to take advantage of the obvious radial symmetry of the problem. That means that you only need to generate the Bessel function for a set of possible radii, and then assign those values to the output image. First find your maximum radius:

r_max_xy = np.hypot(rmax, rmax)

Now generate the integral for just a couple of hundred points across the image. In fact, since you're not doing it for 400 * 400 points, let's splurge and do it for 1000 points:

r_radial = np.linspace(0, r_max_xy, 1000)
image_radial = intensity(r_radial)

Because of how the broadcasting was done in the other answer, intensity(r_radial) will work out of the box. Now you can interpolate the values all over the image. For a first-order approximation, let's just do nearest-neighbor interpolation.

The goal is to compute the index in image_radial to take the image value from. Since r_radial is evenly spaced, you can use simple rounding arithmetic:

r_index = (r / (r_radial.ptp() / (r_radial.size - 1)) + 0.5).astype(int)
image = image_radial[r_index]

For a smoother/more accurate interpolation, (a) use more points, and/or (b) add use a better interpolator. Here is an example of the latter:

from scipy.interpolate import Akima1DInterpolator

interp = Akima1DInterpolator(r_radial, image_radial)
image = interp(r)

Solution 3:^[3]

One more totally distinct approach that is a hybrid of interpolation and brute force is to sample one octant of the image, which will contain all the possible radii for the entire image. This is not interpolation, but it is reducing the number of computed integrals to the bare minimum. You can combine it with interpolation to avoid calling the interpolator 7/8 times.

You can do this by taking a quarter of the field, and applying np.triu_indices to it. The best part is that you don't need to generate the image to get the radii. You will need to process even and odd num a little bit differently for a robust solution.

Let's start with num even:

xy = np.linspace(-rmax, -rmax / (num - 1), num // 2)
row, col = np.triu_indices(xy.size)
r_octant = np.hypot(xy[row], xy[col])

The rest of the process is straightforward. The trick is getting the data back into the full image:

octant = intensity(r_octant)
image = np.empty((num, num), float)
# Reflect octant about diagonal
image[row, col] = image[col, row] = octant
# Reflect quadrant about vertical
image[:xy.size, xy.size:] = image[:xy.size, xy.size - 1::-1]
# Reflect half about horizontal
image[xy.size:] = image[xy.size - 1::-1]

Handling the odd size is slightly different, because there are a couple of off-by-one adjustments:

xy = np.linspace(-rmax, 0, (num + 1) // 2)
...
# Reflect quadrant about vertical
image[:xy.size, xy.size:] = image[:xy.size, xy.size - 2::-1]
# Reflect half about horizontal
image[xy.size:] = image[xy.size - 2::-1]

Benchmarks

The marginal value of new solutions is not much, given how many I've posted already, so let's take a look at some benchmarks. More than just speed, you will want to compare the tradeoff between speed and accuracy.

Here are the three solutions:

def full_vectorized(rmax, num):
    x = np.linspace(-rmax, rmax, num)
    y = np.linspace(-rmax, rmax, num)
    r = np.hypot(x, y[:, None])
    return intensity(r)

def octant_vectorized(rmax, num):
    if num % 2:
        n = (num + 1) // 2
        k = n - 2
        s = 0
    else:
        n = num // 2
        k = n - 1
        s = -rmax / (num - 1)
    xy = np.linspace(-rmax, s, n)
    row, col = np.triu_indices(xy.size)
    r_octant = np.hypot(xy[row], xy[col])
    octant = intensity(r_octant)
    image = np.empty((num, num), float)
    image[row, col] = image[col, row] = octant
    image[:xy.size, xy.size:] = image[:xy.size, k::-1]
    # Reflect half about horizontal
    image[xy.size:] = image[k::-1]
    return image

def nn_interpolated(rmax, num, n=100):
    x = np.linspace(-rmax, rmax, num)
    y = np.linspace(-rmax, rmax, num)
    r = np.hypot(x, y[:, None])
    r_max_xy = r[0, 0]
    r_radial = np.linspace(0, r_max_xy, n)
    image_radial = intensity(r_radial)
    r_index = ((r - r_radial[0]) / (r_radial.ptp() / (r_radial.size - 1)) + 0.5).astype(int)
    return image_radial[r_index]

def akima_interpolated(rmax, num, n=100):
    r_max_xy = np.hypot(rmax, rmax)
    r_radial = np.linspace(0.5 * rmax / num, r_max_xy, n)
    image_radial = intensity(r_radial)
    x = np.linspace(-rmax, rmax, num)
    y = np.linspace(-rmax, rmax, num)
    r = np.hypot(x, y[:, None])
    return Akima1DInterpolator(r_radial, image_radial)(r)

First a quick check that all the methods return similar results:

>>> img_fv = full_vectorized(rmax, num)
>>> img_ov = octant_vectorized(rmax, num)
>>> np.abs(img_fv - img_ov).max()
4.218847493575595e-15   ## This is OK

If you plt.imshow(img_fv - img_ov), you will see roundoff artifacts from the fact that np.linspace creates slightly different values on the positive half of the range than it does on the negative. It is totally safe to ignore these errors (see figure below, chart (a)).

Taking img_fv as the gold standard, we can compute the RMS of the other methods about it and plot the results (chart (b) below).

>>> from haggis.math import rms
>>> img_ni_100 = nn_interpolated(rmax, num, n=100)
>>> img_ni_1000 = nn_interpolated(rmax, num, n=1000)
>>> img_ai_100 = akima_interpolated(rmax, num, n=100)
>>> img_ai_1000 = akima_interpolated(rmax, num, n=1000)
>>> print('NN 100', rms(img_fv - img_ni_100))
>>> print('NN 1000', rms(img_fv - img_ni_1000))
>>> print('AI 100', rms(img_fv - img_ai_100))
>>> print('AI 1000', rms(img_fv - img_ai_1000))
NN 100 0.009473741009264758
NN 1000 0.0009418991442352894
AI 100 2.9400011847429186e-05
AI 1000 2.479026973749244e-08

As expected, better interpolation methods and more points yield better answers.

Now for the benchmarks:

%timeit full_vectorized(rmax, num)
343 ms ± 13.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit octant_vectorized(rmax, num)
40.4 ms ± 1.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit nn_interpolated(rmax, num, 100)
1.63 ms ± 14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit nn_interpolated(rmax, num, 1000)
2.38 ms ± 100 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit akima_interpolated(rmax, num, 100)
3.07 ms ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit akima_interpolated(rmax, num, 1000)
5.75 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

As you can see, any of the interpolated solutions is ~10x better than the octant solution and ~100x better than the full solution. Given that the RMS error given by the 1000-point Akima interpolator is on the order of roundoff error, you can in good conscience settle for that solution.

In fact, you can probably make it even faster:

def octant_akima_interpolated(rmax, num, n=100):
    if num % 2:
        m = (num + 1) // 2
        k = m - 2
        s = 0
    else:
        m = num // 2
        k = m - 1
        s = -rmax / (num - 1)
    xy = np.linspace(-rmax, s, m)
    row, col = np.triu_indices(xy.size)
    r_octant = np.hypot(xy[row], xy[col])
    r_max_xy = np.hypot(rmax, rmax)
    r_radial = np.linspace(0.5 * rmax / num, r_max_xy, n)
    image_radial = intensity(r_radial)
    octant = Akima1DInterpolator(r_radial, image_radial)(r_octant)
    image = np.empty((num, num), float)
    image[row, col] = image[col, row] = octant
    image[:xy.size, xy.size:] = image[:xy.size, k::-1]
    # Reflect half about horizontal
    image[xy.size:] = image[k::-1]
    return image

>>> img_oi_100 = octant_akima_interpolated(rmax, num, n=100)
>>> img_oi_1000 = octant_akima_interpolated(rmax, num, n=1000)
>>> print('OI 100', rms(img_fv - img_oi_100))
>>> print('OI 1000', rms(img_fv - img_oi_1000))
OI 100 2.940001184744199e-05
OI 1000 2.4790269736396652e-08

%timeit octant_akima_interpolated(rmax, num, 100)
936 µs ± 22.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit octant_akima_interpolated(rmax, num, 1000)
1.87 ms ± 55.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So there you go: the speeds of the fastest linear interpolator across the full array, with virtually no meaningful precision loss (you can always increase to 10000 points if 2.5e-8 bothers you). You can improve the timing of the accepted answer by ~200x, as advertised.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2
Solution 3	Mad Physicist

'Python numpy arrays are trying to broadcast into eachother, but I need them to operate on a new axis instead

Solution 1:[1]

Solution 2:[2]

Solution 3:[3]

Sources

Related Questions

Solution 1:^[1]

Solution 2:^[2]

Solution 3:^[3]