NodeJS Async Programming

I'm totally new to programming with async functions, and also new to Node.js, which may be adding to my problem. I've read a lot and keep running into similar problems: some portions of my async code seem to work almost by chance, while others don't. Here is a simplified version of what I have:

Essentially I'm searching a site for music, scraping all the results (scraper_start.js), and then each result is sent to scrape_individual.js to gather data. It is currently able to get all the data, but when it downloads the album art, the image comes in "too late".

The image does get logged to the console, but only after info gets returned. Also, if you have any good resources for learning async programming, please share them - I haven't been able to find a clean website with examples big enough to be realistic (such as multiple async functions working at once and sometimes depending on each other). Please critique my code as well - I am trying to learn!

File scraper_start.js:

const rp = require('request-promise');
const cheerio = require('cheerio');
const scrape = require('./scrape_individual.js');
const base_url = 'https://www.test.ca';
const url = 'https://www.test.ca/search?mysearchstring';

rp(url)
    .then(function(html)
    {
        const $ = cheerio.load(html);
        var results = []
        
        var hits = $('h3 > a').length;
        console.log("TOTAL HITS: " + hits);

        results = $('h3 > a').map(function(i,v){ return $(v).attr('href'); }).get()
        
        return Promise.all(
            results.map(function(url) 
            {
                return scrape(base_url + url);
            })
        );
    })
    .then(function(my_data) 
    {
        console.log(my_data);

    });

File scrape_individual.js:

const rp = require('request-promise');
const cheerio = require('cheerio');
var info = {}


const scrape = function(url)
    {
        return rp(url)
        .then(function(html)
        {
            const $ = cheerio.load(html);
            if (!html.includes('contentType = "Podcast"'))
            {
                info = {
                    title: $('h2.bc-heading:first').text(),
                    img3: null};
                
                img_data($('.bc-image-inset-border').attr('src'))
                    .then(function(v) 
                    { 
                        console.log(v);
                        info.img3 = v; // Store the value once it is resolved
                    })
                    .catch(function(v) {
                    
                    });
                
                return info;
            }
        })
    };
    
function img_data(src) 
{
    return new Promise(function(resolve, reject)
    {
        const { createCanvas, loadImage } = require('canvas');
        
        loadImage(src).then((image) => 
        {
            const canvas = createCanvas(image.width, image.height);
            const ctx = canvas.getContext('2d');
            ctx.drawImage(image, 0, 0);

            resolve(canvas.toDataURL());
        });
    });
}

module.exports = scrape;
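
For reference, the album art comes in "too late" above because scrape returns info before the img_data(...) promise has resolved; the .then that fills in info.img3 is not part of the chain that scrape returns. Below is a minimal sketch (not the original code) of one way to keep it in the chain, using a hypothetical scrape_chained function and reusing img_data from this file:

const rp = require('request-promise');
const cheerio = require('cheerio');

// Hypothetical scrape_chained(): same idea as scrape() above, but the
// img_data() promise is returned as part of the chain.
const scrape_chained = function(url)
{
    return rp(url)
        .then(function(html)
        {
            const $ = cheerio.load(html);
            if (!html.includes('contentType = "Podcast"'))
            {
                var info = {
                    title: $('h2.bc-heading:first').text(),
                    img3: null};

                // Returning this promise keeps it in the chain, so the
                // outer promise resolves with info only after img3 is set.
                return img_data($('.bc-image-inset-border').attr('src'))
                    .then(function(v)
                    {
                        info.img3 = v;
                        return info;
                    });
            }
        });
};

This is essentially what the async/await version below does with await.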

UPDATE: New Code with ASYNC / AWAIT

scraper_start.js:

const rp = require('request-promise');
const cheerio = require('cheerio');
const scrape = require('./scrape_individual.js');
const base_url = 'https://www.test.ca';
const url = 'https://www.test.ca/search?mysearchstring';
var data = [];

async function get_links(url)
{
    let html = await rp(url);
    const $ = cheerio.load(html);
    var results = [];
    
    var hits = $('h3 > a').length;
    console.log("TOTAL HITS: " + hits);
    
    let hrefs = $('h3 > a').map(function(i,v){ return $(v).attr('href'); }).get();
    
    await Promise.all(hrefs.map(async (href) =>
        {
            let data_single = await scrape.scrape_individual(base_url + href);
            data.push(data_single);
        }));
    
    //QUESTION AREA 1: This data works great with all info.
    console.log(data);
    return data
}

get_links(url);
//QUESTION AREA 2: This data gets printed before getting the actual data returned.
console.log(data);

scrape_individual.js:

const rp = require('request-promise');
const cheerio = require('cheerio');
var info = {}

//scrape2(url)
module.exports.scrape_individual = scrape2;

async function scrape2(url)
{
    let html = await rp(url);
    const $ = cheerio.load(html);
    
    if (!html.includes('contentType = "Podcast"'))
    {
        let my_image = await img_data($('.bc-image-inset-border').attr('src'));
        
        info = {title: $('h2.bc-heading:first').text(),
                img3: my_image};
                
        //console.log(info);
        return info;
    }
}

async function img_data(src) 
{
    const { createCanvas, loadImage } = require('canvas');
    let image = await loadImage(src);
    const canvas = createCanvas(image.width, image.height);
    const ctx = canvas.getContext('2d');
    ctx.drawImage(image, 0, 0);
    //console.log(canvas.toDataURL());
    return canvas.toDataURL();

}

This code works great now and is also easier to understand. Please feel free to critique it, as I am trying to master this. My question now is more of a general coding question.

Within scraper_start.js, where the end result (data) ends up, I marked two comments: "QUESTION AREA 1" and "QUESTION AREA 2".

QUESTION AREA 1 works entirely fine, which I assume is because it is inside the async function. QUESTION AREA 2, outside the async function, does not have the returned object yet because nothing tells it to await. Is there a way to make it wait?

My question is pretty loaded. From my understanding I can't use await there, since it's not inside an async function. Does this mean all my code needs to be in functions if I want to maintain a specific order? What is best practice? Why not declare every function as async?

edit: Fixed typos
edit2: Added ASYNC / AWAIT modifications



Solution 1:[1]

Question Area 2 prints before the data is captured because you are telling JS that the final console.log is not waiting on anything, so it runs synchronously. But in truth it is waiting on something: it is waiting for get_links().

So one way to print the data would be:

    async function printer() {
        const returnedData = await get_links(url);
        console.log(returnedData);
    }
    printer();

If you want to use the data returned from an async function, you need to call it with await, so JS knows it has to wait for the promise to resolve or reject before going on; otherwise you just get a Promise { <pending> }. And every await needs to be inside an async function. At first this sounds like an endless circle, but it really is not.

For instance, in our example you don't need another async function to call printer(), because no other function depends on its result. I hope this makes sense; promises take some time to digest before you really understand them. In my opinion async/await is a blessing for understanding promises: once you get your head around it and how it works, you are better prepared to understand the resolve/reject style.
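
As an aside beyond the original answer, another common way to consume an async function at the top level is to chain .then()/.catch() onto the call instead of wrapping it in another async function. A minimal sketch, assuming the get_links(url) function from the updated scraper_start.js above:

    // Alternative sketch: async functions return promises, so the caller
    // can attach .then()/.catch() instead of using await.
    get_links(url)
        .then(function(returnedData) {
            console.log(returnedData); // runs only once all scraping is done
        })
        .catch(function(err) {
            console.error(err); // surfaces any rejection from the scraping chain
        });

Either way, any code that depends on the scraped data has to live inside a callback or an async function; only code that does not need the result can stay at the top level of the script.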

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1: Jordi Riera