Why does scraping a Persian website with a non-English URL generate errors?

I am attempting to scrape a Persian website with the following code:

import urlparse, urllib
parts = urlparse.urlsplit(u'http://fa.wikipedia.org/wiki/صفحهٔ_اصلی')
parts = parts._replace(path=urllib.quote(parts.path.encode('utf8')))
encoded_url = parts.geturl().encode('ascii')
# expected result: 'https://fa.wikipedia.org/wiki/%D8%B5%D9%81%D8%AD%D9%87%D9%94_%D8%A7%D8%B5%D9%84%DB%8C'

I get this error message in the prompt when I run my crawler:

ModuleNotFoundError: No module named urlparse

And in VS Code there are three underlined words. When I click on them, the following error messages are displayed:

  1. Unable to import 'scrapy'
  2. Unable to import 'urlparse'
  3. Module 'urllib' has no quote member

What is wrong with my code?



Solution 1:[1]

import urllib.parse

parts = urllib.parse.urlsplit('http://fa.wikipedia.org/wiki/صفحهٔ_اصلی')
parts = parts._replace(path=urllib.parse.quote(parts.path.encode('utf8')))
encoded_url = parts.geturl()  # the quoted URL is already plain ASCII

print(encoded_url)
# http://fa.wikipedia.org/wiki/%D8%B5%D9%81%D8%AD%D9%87%D9%94_%D8%A7%D8%B5%D9%84%DB%8C

This code runs under Python 3, where the old urlparse module was merged into urllib.parse.
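As a quick sanity check, the percent-encoding round-trips: urllib.parse.unquote recovers the original Persian path (a minimal sketch using the same Wikipedia path):

```python
from urllib.parse import quote, unquote

# Percent-encode the non-ASCII path: its UTF-8 bytes become %XX escapes.
path = '/wiki/صفحهٔ_اصلی'
encoded = quote(path)  # '/' is in the default safe set and is kept as-is
print(encoded)
# /wiki/%D8%B5%D9%81%D8%AD%D9%87%D9%94_%D8%A7%D8%B5%D9%84%DB%8C

# Decoding reverses the transformation, so no information is lost.
assert unquote(encoded) == path
```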

Solution 2:[2]

The error messages indicate that these modules are not available in your environment (or, in the case of urlparse, no longer exist under that name). Check each project's documentation for installation instructions:

1. urlparse was renamed in Python 3: it is now urllib.parse, not urlparse, so there is nothing to install, only the import to change.

2. Scrapy must be installed separately, for example with pip install scrapy.
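A quick way to see which of these imports will succeed before running the crawler is to probe them with importlib (a diagnostic sketch; it only reports availability, it does not install anything):

```python
import importlib.util

# True means the module can be imported under the running interpreter;
# under Python 3, urlparse is False and urllib.parse is True.
status = {
    name: importlib.util.find_spec(name) is not None
    for name in ('scrapy', 'urlparse', 'urllib.parse')
}
print(status)
```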

Solution 3:[3]

You only need to add this setting:

FEED_EXPORT_ENCODING='UTF-8'

in your settings.py file.
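This setting matters because Scrapy's feed exporters otherwise write JSON with ASCII escapes, making Persian text unreadable in the output file. The effect can be illustrated with the standard json module, which has an analogous ensure_ascii switch (a standalone illustration, not Scrapy itself):

```python
import json

item = {'title': 'صفحهٔ_اصلی'}

# Default behaviour: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(item)

# With ensure_ascii=False the Persian text is written verbatim, which is
# what FEED_EXPORT_ENCODING = 'utf-8' achieves for Scrapy feed exports.
readable = json.dumps(item, ensure_ascii=False)

print(escaped)
print(readable)
```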

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 SahilDesai
Solution 2 Gabriel Domene
Solution 3 Hassan Ebrahimi