'triyng run the crawler with docker image
I'm triyng run the crawler with docker image, but it's returning this error.
PS C:\Users\Santosgab\Desktop\documentation> docker run -it --env-file=.env -e "CONFIG=$(cat config.json | jq -r tostring)" algolia/docsearch-scraper
Traceback (most recent call last):
File "/root/src/config/config_loader.py", line 101, in _load_config
data = json.loads(config, object_pairs_hook=OrderedDict)
File "/usr/lib/python3.6/json/__init__.py", line 367, in loads
return cls(**kw).decode(s)
File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.6/json/decoder.py", line 355, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
i used the config provided by Algolia, just made some small changes to run in my project, my config.json has no problem.
{
"index_name": "Sequor",
"sitemap_urls": ["http://localhost:3000/sitemap.xml"],
"sitemap_alternate_links": true,
"stop_urls": ["/tests"],
"selectors": {
"lvl0": {
"selector": "(//ul[contains(@class,'menu__list')]//a[contains(@class, 'menu__link menu__link--sublist menu__link--active')]/text() | //nav[contains(@class, 'navbar')]//a[contains(@class, 'navbar__link--active')]/text())[last()]",
"type": "xpath",
"global": true,
"default_value": "Documentation"
},
"lvl1": "header h1",
"lvl2": "article h2",
"lvl3": "article h3",
"lvl4": "article h4",
"lvl5": "article h5, article td:first-child",
"lvl6": "article h6",
"text": "article p, article li, article td:last-child"
},
"strip_chars": " .,;:#",
"custom_settings": {
"separatorsToIndex": "_",
"attributesForFaceting": ["language", "version", "type", "docusaurus_tag"],
"attributesToRetrieve": [
"hierarchy",
"content",
"anchor",
"url",
"url_without_anchor",
"type"
]
},
"conversation_id": ["833762294"],
"nb_hits": 46250
}
Solution 1:[1]
Two things I notice:
There's no path on your config file. Can you try it with pathing (i.e.
./config.json)Your config is missing any
start_urlsas entry points to crawling.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Chuck Meyer |
