What is an efficient way to index a MongoDB collection: a compound index, or a custom index built from the compound index's fields?

I have a notifications collection.

Before creating a new notification related to a certain update, I need to check for a notification the user hasn't seen yet, so I can update the existing notification rather than create a new one.

Schema -

const notificationSchema = new mongoose.Schema<Notification>(
  {
    _id: Number,
    user: {
      name: String,
      type: {
        type: String,
      },
      ref: mongoose.Types.ObjectId,
    },
    topic: String,
    original: Object,
    updated: Object,
    actionDate: {
      type: Date,
    },
    needsApproval: Boolean,
    status: {
      type: String,
      enum: ['pending', 'accepted', 'rejected', 'expired'],
    },
    seen: Boolean,
  },
  {timestamps: true},
)

So, the notifications will be uniquely identified by three fields: topic, user.ref, and seen.

  1. I can create a compound index on the above fields in the database.
  2. I can create a uid based on topic and user.ref, store it on the notification, and index on it, as follows:
uid: topic + user.ref

So which would be the better way?
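For reference, option 1 can be declared directly on the mongoose schema. The field names below come from the schema above; the query shape is an assumption about how the "unseen notification" lookup would be written:

```javascript
// Sketch of option 1: a compound index on the three identifying fields.
notificationSchema.index({topic: 1, 'user.ref': 1, seen: 1})

// A lookup of this (assumed) shape can then be served by that index:
// Notification.findOne({topic, 'user.ref': userRef, seen: false})
```

One relevant fact from MongoDB's documentation: a compound index also supports equality queries on any prefix of its fields (here, topic alone, or topic plus user.ref), which is worth weighing against maintaining a separate concatenated uid field.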



Solution 1:[1]

The problem here is that you are trying to subscript a None value. next_page = soup.select_one('a.next.ajax-page', href=True) returned nothing, so you can't access ['href'] on it.
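A minimal stdlib-only sketch of this failure mode (the values are hypothetical stand-ins): subscripting None raises a TypeError, which is exactly why next_page['href'] crashes when the selector matches nothing.

```python
# Stand-in for soup.select_one(...) when no element matches: it returns None.
next_page = None

# Subscripting None raises TypeError: 'NoneType' object is not subscriptable.
try:
    next_page['href']
    failed = False
except TypeError:
    failed = True

print(failed)
```

Guarding with an explicit None check before subscripting avoids the crash.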

Solution 2:[2]

What happens?

Your selection soup.find('a.next.ajax-page', href=True) never finds the element you are searching for, because it mixes two syntaxes (find() with a CSS selector) and will always return None. So it also can't be used to access the attribute value.

How to fix?

Change your line checking the next page element from:

if soup.find('a.next.ajax-page', href=True) == None:

to:

if soup.find('a', {'class': 'next ajax-page'}) is None:

or

if soup.select_one('a.next.ajax-page') is None:

You should also be able to scrape all the basic information from the search results and store it in one step, instead of returning a list of URLs of search pages:

def page_parse(url):
    data = []
    while True:
        page = requests.get(url)
        soup = BeautifulSoup(page.text, 'html.parser')
        # Collect title and detail URL from every result on this page
        for item in soup.select('div.result'):
            data.append({
                'title': item.h2.text,
                'url': f"{baseUrl}{item.a['href']}"
            })

        # Follow the "next" link if present, otherwise return what we have
        if (url := soup.select_one('a.next.ajax-page')):
            url = f"{baseUrl}{url['href']}"
        else:
            return data

Example

import requests
from bs4 import BeautifulSoup

baseUrl = 'http://www.yellowpages.com'

def page_parse(url):
    data = []
    while True:
        page = requests.get(url)
        soup = BeautifulSoup(page.text, 'html.parser')
        # Collect title and detail URL from every result on this page
        for item in soup.select('div.result'):
            data.append({
                'title': item.h2.text,
                'url': f"{baseUrl}{item.a['href']}"
            })

        # Follow the "next" link if present, otherwise return what we have
        if (url := soup.select_one('a.next.ajax-page')):
            url = f"{baseUrl}{url['href']}"
        else:
            return data

page_parse('http://www.yellowpages.com/omaha-ne/towing')

Output

[{'title': "1. Keith's BP",
  'url': 'http://www.yellowpages.com/omaha-ne/mip/keiths-bp-460502890?lid=1002059325385'},
 {'title': '2. Neff Towing Svc',
  'url': 'http://www.yellowpages.com/omaha-ne/mip/neff-towing-svc-21969600?lid=1000282974083#gallery'},
 {'title': '3. A & A Towing',
  'url': 'http://www.yellowpages.com/omaha-ne/mip/a-a-towing-505777665?lid=1002056319136'},
 {'title': '4. Cross Electronic Recycling',
  'url': 'http://www.yellowpages.com/omaha-ne/mip/cross-electronic-recycling-473693798?lid=1000236876513'},
 {'title': '5. 24 Hour Towing',
  'url': 'http://www.yellowpages.com/omaha-ne/mip/24-hour-towing-521607477?lid=1001918028003'},
 {'title': '6. A & A Towing Fast Friendly',
  'url': 'http://www.yellowpages.com/omaha-ne/mip/a-a-towing-fast-friendly-478453697?lid=1000090213043'},
 {'title': '7. Austin David Towing',
  'url': 'http://www.yellowpages.com/omaha-ne/mip/austin-david-towing-465037110?lid=1001788338357'},...]

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution sources:
Solution 1: Tal Folkman
Solution 2: