What is an efficient way to index a MongoDB collection: a compound index, or a custom index built from the compound index fields?
I have a notifications collection.
Before creating a new notification for a certain update, I need to check for a notification that the user hasn't seen yet, so that I can update the existing notification rather than creating a new one.
Schema -
const notificationSchema = new mongoose.Schema<Notification>(
  {
    __id: Number,
    user: {
      name: String,
      type: {
        type: String,
      },
      ref: mongoose.Types.ObjectId,
    },
    topic: String,
    original: Object,
    updated: Object,
    actionDate: {
      type: Date,
    },
    needsApproval: Boolean,
    status: {
      type: String,
      enum: ['pending', 'accepted', 'rejected', 'expired'],
    },
    seen: Boolean,
  },
  {timestamps: true},
)
So, a notification will be uniquely identified by three fields: topic, user.ref and seen.
- I can create a compound index on the above fields in the database.
- I can create a uid based on topic and user.ref, keep it on the notification, and index on it as follows:
uid: topic + user.ref
So which would be the better way?
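For reference, the two options above could be declared in mongoose roughly as follows. This is a minimal sketch using mongoose's schema.index() method; the uid format shown is only one possible choice, not something given in the question, and option 2 would also need a uid: String field added to the schema.

// Option 1: a compound index over the three fields used to identify a notification
notificationSchema.index({topic: 1, 'user.ref': 1, seen: 1})

// Option 2: a derived uid field stored on each notification and indexed on its own,
// e.g. uid = `${topic}_${user.ref}` (hypothetical format) set when the notification is created
notificationSchema.index({uid: 1})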
Solution 1:[1]
The problem here is that you are trying to access a NoneType variable: next_page = soup.select_one('a.next.ajax-page', href=True) returns nothing, so you can't access ['href'] on it.
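Put differently, the attribute can only be read after confirming the selection matched something. A minimal sketch of that guard (not the asker's original code; it assumes soup already holds the parsed page):

next_page = soup.select_one('a.next.ajax-page')
if next_page is not None:          # select_one returns None when nothing matches
    next_url = next_page['href']   # safe to read the attribute now
else:
    next_url = None                # no next-page link on this page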
Solution 2:[2]
What happens?
Your selection soup.find('a.next.ajax-page', href=True) does not find the element you are searching for because it mixes syntaxes (find() with a CSS selector) and will always return None, so you also won't be able to access the attribute value.
How to fix?
Change your line checking the next page element from:
if soup.find('a.next.ajax-page', href=True) == None:
to:
if soup.find('a',{'class':'next ajax-page'}) == None:
or
if soup.select_one('a.next.ajax-page') == None:
You should also be able to scrape all the basic information from the search results and store it in one step, instead of returning a list of URLs of the search pages:
def page_parse(url):
    data = []
    while True:
        page = requests.get(url)
        soup = BeautifulSoup(page.text, 'html.parser')
        for item in soup.select('div.result'):
            data.append({
                'title': item.h2.text,
                'url': f"{baseUrl}{item.a['href']}"
            })
        if (url := soup.select_one('a.next.ajax-page')):
            url = f"{baseUrl}{url['href']}"
        else:
            return data
Example
import requests
from bs4 import BeautifulSoup
baseUrl = 'http://www.yellowpages.com'
def page_parse(url):
    data = []
    while True:
        page = requests.get(url)
        soup = BeautifulSoup(page.text, 'html.parser')
        # collect title and detail-page url for every result on the current page
        for item in soup.select('div.result'):
            data.append({
                'title': item.h2.text,
                'url': f"{baseUrl}{item.a['href']}"
            })
        # follow the "next" link until there is none left
        if (url := soup.select_one('a.next.ajax-page')):
            url = f"{baseUrl}{url['href']}"
        else:
            return data
page_parse('http://www.yellowpages.com/omaha-ne/towing')
Output
[{'title': "1. Keith's BP",
'url': 'http://www.yellowpages.com/omaha-ne/mip/keiths-bp-460502890?lid=1002059325385'},
{'title': '2. Neff Towing Svc',
'url': 'http://www.yellowpages.com/omaha-ne/mip/neff-towing-svc-21969600?lid=1000282974083#gallery'},
{'title': '3. A & A Towing',
'url': 'http://www.yellowpages.com/omaha-ne/mip/a-a-towing-505777665?lid=1002056319136'},
{'title': '4. Cross Electronic Recycling',
'url': 'http://www.yellowpages.com/omaha-ne/mip/cross-electronic-recycling-473693798?lid=1000236876513'},
{'title': '5. 24 Hour Towing',
'url': 'http://www.yellowpages.com/omaha-ne/mip/24-hour-towing-521607477?lid=1001918028003'},
{'title': '6. A & A Towing Fast Friendly',
'url': 'http://www.yellowpages.com/omaha-ne/mip/a-a-towing-fast-friendly-478453697?lid=1000090213043'},
{'title': '7. Austin David Towing',
'url': 'http://www.yellowpages.com/omaha-ne/mip/austin-david-towing-465037110?lid=1001788338357'},...]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Tal Folkman |
| Solution 2 | |
