Show NER spaCy data in a dataframe
I am doing some web scraping to extract text from an HTML page and using spaCy's NER to identify information such as assets under management, addresses, and founding dates of companies. Once the information is extracted, I would like to place it in a dataframe.
I am working with the following script:
from bs4 import BeautifulSoup
import numpy as np
from time import sleep
from random import randint
from selenium import webdriver
import pandas as pd
import spacy
from spacy import displacy
import en_core_web_sm
import requests
import re
NER = spacy.load("en_core_web_sm")
url = "https://www.baincapital.com/"
driver = webdriver.Chrome("C:/Program Files/chromedriver.exe")
driver.get(url)
sleep(randint(5,15))
soup = BeautifulSoup(driver.page_source, 'html.parser')
body=soup.body.text
body
body= body.replace('\n', ' ')
body= body.replace('\t', ' ')
body= body.replace('\r', ' ')
body= body.replace('\xa0', ' ')
text3= NER(body)
displacy.render(text3,style="ent",jupyter=True)
The output renders the page text with each recognized entity highlighted and labeled.
And I would like to place it in the following rudimentary table:
| Entity | Identified |
|---|---|
| Money | $155 Billion |
| Date | 1984 |
| Org | Bain Capital |
| Org | Bain Capital Investor Portal Please |
| Cardinal | four |
| Cardinal | 24 |
| GPE | US |
Essentially, I want to take the highlighted entities and place them in a dataframe, each alongside its entity label.
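For illustration, here is a minimal sketch of one way this could be done. It relies on the spaCy `Doc` returned by `NER(body)` exposing its recognized spans through `.ents`, where each span carries `.label_` and `.text`. The helper names `entity_rows` and `entities_to_dataframe` are hypothetical, and the column names are simply copied from the table above.

```python
def entity_rows(ents):
    """Return one (label, text) pair per entity span.

    `ents` is assumed to be an iterable of spaCy spans (for example
    `doc.ents`); anything with `label_` and `text` attributes works.
    """
    return [(ent.label_, ent.text.strip()) for ent in ents]


def entities_to_dataframe(doc):
    """Build a two-column dataframe from the entities of a spaCy Doc."""
    # Imported here so entity_rows stays usable without pandas installed.
    import pandas as pd

    return pd.DataFrame(entity_rows(doc.ents), columns=["Entity", "Identified"])
```

With the script above, this would be called as `df = entities_to_dataframe(text3)`. Note that spaCy reports labels in upper case (`MONEY`, `DATE`, `ORG`), so matching the capitalization in the table would need an extra step such as `df["Entity"].str.title()`.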
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow