Need help scraping image from Craigslist
I've tried everything I can think of. I'm able to get postUrl, date, title, price and location. If you go to https://sandiego.craigslist.org/search/sss?query=surfboards and paste the code snippet below into the console, it returns all the images. But when I try to access them in my code, I get undefined. Any help on this would be greatly appreciated!
$('#search-results > li').each((index, element) => {
console.log( $(element).children().find('img').attr('src') )
})
import axios from 'axios'
import request from 'request-promise'
import cheerio from 'cheerio'
import express from 'express'
import path from 'path'
const __dirname = path.resolve();
const PORT = process.env.PORT || 8000;
const app = express();
app.get('', (req, res) => {
res.sendFile(__dirname + '/views/index.html')
});
const surfboards = [];
axios("https://sandiego.craigslist.org/search/sss?query=surfboards")
.then(res => {
const htmlData = res.data;
const $ = cheerio.load(htmlData);
$('#search-results > li').each((index, element) => {
const postUrl = $(element).children('a').attr('href');
const date = $(element).children('.result-info').children('.result-date').text();
const title = $(element).children('.result-info').children('.result-heading').text().trim();
const price = $(element).children('.result-info').children('.result-meta').children('.result-price').text();
const location = $(element).children('.result-info').children('.result-meta').children(".result-hood").text().trim();
// Why is this not working?!?!?!?!?!
const img = $(element).children().find('img').attr('src');
surfboards.push({
title,
postUrl,
date,
price,
location,
img
})
})
return surfboards
}).catch(err => console.error(err))
app.get('/api/surfboards', (req, res) => {
const usedboards = surfboards
return res.status(200).json({
results: usedboards
})
})
// Make App listen
app.listen(PORT, () => console.log(`Server is listening to port ${PORT}`))
Solution 1:[1]
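It looks like the page sets the images with JavaScript, so the HTML that axios fetches does not contain the actual image links. That is why the selector works in the browser console but returns undefined in your code.
There seems to be a workaround, though. You can generate the image links by concatenating https://images.craigslist.org with the values from the data-ids attribute of the parent a tag.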
You can get the data-ids like this:
var data_ids = $(element).children('a').attr('data-ids')
Then split it into an array on commas, strip the leading 3: prefix (the first two characters) from each ID, and build each URL like this:
`${img_base_url}/${ids}_${resolution_and_extension}`
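The split-and-concatenate steps above could be sketched roughly as follows. This is only an illustration: buildImageUrls is a hypothetical helper name, and the 300x300.jpg suffix is an assumption (the answer does not say which resolution and extension to use), so check it against a real image URL on the page before relying on it.

```javascript
// Base URL taken from the answer above.
const IMG_BASE_URL = 'https://images.craigslist.org';

// ASSUMPTION: the answer does not give the resolution/extension suffix;
// '300x300.jpg' is a placeholder to replace after inspecting a real image URL.
const DEFAULT_SUFFIX = '300x300.jpg';

// Hypothetical helper: turns a data-ids string such as "3:abc,3:def"
// into an array of image URLs. Returns [] when the attribute is missing.
function buildImageUrls(dataIds, suffix = DEFAULT_SUFFIX) {
  if (!dataIds) return [];
  return dataIds
    .split(',')
    .map(id => id.trim())
    .filter(id => id.length > 0)
    // Strip the leading "3:" prefix from each ID, if present.
    .map(id => (id.startsWith('3:') ? id.slice(2) : id))
    .map(id => `${IMG_BASE_URL}/${id}_${suffix}`);
}
```

Inside the each() loop, the result of $(element).children('a').attr('data-ids') would be passed straight to this function.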
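But if you only need the URL of the first image, there is no need to create a new array each time. Use substring instead (note that some li elements don't have an image at all, in which case first_id stays undefined):
let first_id;
if (data_ids && data_ids.includes(',')) {
// Several images: take the ID between the "3:" prefix and the first comma
first_id = data_ids.substring(data_ids.indexOf('3:') + 2, data_ids.indexOf(','));
} else if (data_ids) {
// Single image: take everything after the "3:" prefix
first_id = data_ids.substring(data_ids.indexOf('3:') + 2);
}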
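To show how the first-image case might plug into the scraper, here is a minimal sketch that wraps the substring logic in a function and returns a complete URL. The name firstImageUrl is hypothetical, and the 300x300.jpg suffix is again an assumption that the answer does not confirm.

```javascript
// Base URL taken from the answer above.
const IMG_BASE_URL = 'https://images.craigslist.org';

// ASSUMPTION: placeholder resolution/extension suffix, not stated in the answer.
const IMG_SUFFIX = '300x300.jpg';

// Hypothetical helper: returns the URL of the first image in a data-ids
// string, or null when the listing has no images.
function firstImageUrl(dataIds) {
  if (!dataIds) return null; // the li has no data-ids attribute

  const start = dataIds.indexOf('3:');
  if (start === -1) return null; // unexpected format, no "3:" prefix found

  // The first ID ends at the first comma, or at the end of the string
  // when there is only one image.
  const comma = dataIds.indexOf(',', start);
  const end = comma === -1 ? dataIds.length : comma;

  const id = dataIds.substring(start + 2, end);
  return `${IMG_BASE_URL}/${id}_${IMG_SUFFIX}`;
}
```

In the question's loop, the img line could then become const img = firstImageUrl($(element).children('a').attr('data-ids')), which yields null rather than undefined for listings without a picture.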
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |

