How can I improve the speed of my spaCy similarity calculation? [closed]

I currently have the following code, which calculates the similarity between a search query and a dictionary of candidates. It takes approximately 13 seconds to score 4000 candidates. From my research, this can be sped up by using nlp.pipe(), but I don't understand how to apply it here. Please advise. My Python code is below.

import os
import sys
from flask import Flask, request, jsonify

import spacy
nlp =  spacy.load("en_core_web_lg")
all_stopwords = nlp.Defaults.stop_words

app = Flask(__name__)

@app.route("/")
def index():
    return "Page does not exist"


@app.route('/calculate-matches', methods=['POST'])
def calculate_matches():
    data = request.get_json()
    candidates = data['candidates'] 
    cur_search = nlp('Looking for someone with experience in building vue frontend applications')

    tmp_search = ''
    for x in cur_search:
        if x.pos_ in ("NOUN", "PROPN", "PRON") or not x.is_stop:
            tmp_search += str(x) + ' '
    cur_search = nlp(tmp_search)

    for member in candidates:
        member_bio = nlp(member['bio']+ ' ' + member['education']+ ' ' + member['experience'])
        
        #calculate similarity
        member['match_score'] = ( cur_search.similarity(member_bio) * 100 )

    #sort candidates by match_score from high to low
    results = sorted(candidates, key=lambda k: k['match_score'], reverse=True)
    return jsonify(results)


if __name__ == "__main__":
    currentdir = os.path.dirname(os.path.realpath(__file__))
    if currentdir not in sys.path:
        sys.path.insert(0, currentdir)
    app.run(host='0.0.0.0', port=5000)


Solution 1:[1]

You can use linear algebra to compute this similarity in a broadcasted manner:

import numpy as np

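def cosine_similarity(v, A):
    # Cosine similarity between v and every row of A; argmax returns the index of the best match
    return np.argmax(np.dot(v, A.T) / (np.linalg.norm(v, ord=2) * np.linalg.norm(A, axis=1, ord=2)))

# Build one Doc per candidate; nlp.pipe processes the texts in batches
texts = [m['bio'] + ' ' + m['education'] + ' ' + m['experience'] for m in candidates]
A = np.stack([doc.vector for doc in nlp.pipe(texts)])
v = cur_search.vector
closest_idx = cosine_similarity(v, A)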
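
Since the question asks for a score per candidate and a sorted list rather than only the single best index, here is a minimal sketch of how the same idea might be combined with nlp.pipe(). The helper names cosine_scores and score_candidates and the batch_size value are illustrative assumptions, not part of spaCy or of the original code. It also assumes the default behaviour where Doc.similarity is the cosine similarity of the two Doc.vector arrays, so the scores should be comparable to the original loop.

```python
import numpy as np


def cosine_scores(v, A):
    """Cosine similarity between vector v and every row of matrix A."""
    v = np.asarray(v, dtype=float)
    A = np.asarray(A, dtype=float)
    norms = np.linalg.norm(A, axis=1) * np.linalg.norm(v)
    # Avoid dividing by zero when a text has no word vectors at all.
    safe = np.where(norms == 0, 1.0, norms)
    return np.where(norms == 0, 0.0, A @ v / safe)


def score_candidates(nlp, search_text, candidates, batch_size=64):
    """Hypothetical helper: score every candidate against search_text.

    nlp is a loaded spaCy pipeline with word vectors (en_core_web_lg
    in the question); batch_size=64 is an arbitrary assumption.
    """
    if not candidates:
        return []
    texts = [
        " ".join((m["bio"], m["education"], m["experience"]))
        for m in candidates
    ]
    # nlp.pipe streams the texts through the pipeline in batches
    # instead of calling nlp() once per candidate.
    docs = nlp.pipe(texts, batch_size=batch_size)
    A = np.stack([doc.vector for doc in docs])
    scores = cosine_scores(nlp(search_text).vector, A) * 100
    for member, score in zip(candidates, scores):
        member["match_score"] = float(score)
    # Highest match_score first, as in the original handler.
    return sorted(candidates, key=lambda m: m["match_score"], reverse=True)
```

In the Flask handler from the question, the loop over candidates could then be replaced by something like results = score_candidates(nlp, tmp_search, candidates). Disabling pipeline components that are not needed for the vectors may reduce the time further, but that is not measured here.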

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 erip