Get specific information from Wikipedia in Google Sheets (not the entire table)

I have a table of "Lead rolling actors" from Wikipedia, and I want to add columns to it with the date of birth, years active, etc. for every actor.


This is the first time I have used the IMPORTXML formula. For Robert Downey Jr. I am trying the following:

-Born: =IMPORTXML(G1!,"//span[@class='bday']")

<span class="bday">1965-04-04</span>

-Years Active: =IMPORTXML(G1!,"//td[@class='infobox-data']")

<td class="infobox-data">1970–present</td>

In both cases I get errors. What am I doing wrong? I looked at https://www.benlcollins.com/spreadsheets/google-sheet-web-scraper/ for guidance, but I can't find my error.



Solution 1:[1]

Unfortunately, your question and image do not show the URL you are using for Robert Downey Jr. Assuming it is https://en.wikipedia.org/wiki/Robert_Downey_Jr, I think your XPath //span[@class='bday'] returns 1965-04-04. Your XPath //td[@class='infobox-data'], however, matches every cell with that class, so it returns multiple values.

In this answer, the values 1965-04-04 and 1970–present are both retrieved from https://en.wikipedia.org/wiki/Robert_Downey_Jr.

Sample 1:

In this sample, 1965-04-04 is retrieved from https://en.wikipedia.org/wiki/Robert_Downey_Jr.

=IMPORTXML("https://en.wikipedia.org/wiki/Robert_Downey_Jr","//span[@class='bday']")


Sample 2:

In this sample, 1970–present is retrieved from https://en.wikipedia.org/wiki/Robert_Downey_Jr.

=IMPORTXML("https://en.wikipedia.org/wiki/Robert_Downey_Jr","//td[@class='infobox-data' and ../th[contains(text(),'active')]]")
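To illustrate why the unfiltered XPath returns several values while the filtered one returns a single cell, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The SAMPLE markup is a hypothetical, heavily simplified stand-in for the infobox, not the real Wikipedia HTML. ElementTree supports only a subset of XPath, so the contains(text(),'active') condition is emulated with a plain Python check. This is an illustration of the selection logic, not of how IMPORTXML works internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the Wikipedia infobox markup.
SAMPLE = """
<table class="infobox">
  <tr>
    <th class="infobox-label">Born</th>
    <td class="infobox-data"><span class="bday">1965-04-04</span></td>
  </tr>
  <tr>
    <th class="infobox-label">Occupation</th>
    <td class="infobox-data">Actor</td>
  </tr>
  <tr>
    <th class="infobox-label">Years active</th>
    <td class="infobox-data">1970–present</td>
  </tr>
</table>
""".strip()

root = ET.fromstring(SAMPLE)

# Counterpart of //span[@class='bday']: only one such span exists.
bday = [el.text for el in root.findall(".//span[@class='bday']")]

# Counterpart of //td[@class='infobox-data']: one match per infobox row.
all_cells = root.findall(".//td[@class='infobox-data']")

# Counterpart of the filtered XPath: keep only the data cell whose
# row header (th) contains the word 'active'.
years_active = [
    row.find("td[@class='infobox-data']").text
    for row in root.iter("tr")
    if "active" in (row.findtext("th") or "")
]

print(bday)
print(len(all_cells))
print(years_active)
```

On this sample, the unfiltered query matches three cells, whereas the filtered one keeps only the "Years active" cell. That is the same narrowing the ../th[contains(text(),'active')] predicate performs in the formula above.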


Note:

  • I'm not sure which URL you are currently using for Robert Downey Jr, so please check it again. When I use https://en.wikipedia.org/wiki/Robert_Downey_Jr, both of your expected values can be retrieved.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Tanaike