How can I access HTML nodes of the 'hidden' class in R? I am unable to extract the text associated with that node.

Input:

read_fasta_html <- AMPs %>% html_nodes("pre") %>% html_text()
read_fasta_html

Output:

character(0)



Solution 1:[1]

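One way to get the sequence is to call the API that the webpage itself uses to load its text. Because the page fetches the sequence separately, the text is probably not in the HTML you downloaded, which would explain why html_nodes("pre") returns character(0). You can find the API request in the Fetch/XHR tab of the Network panel after clicking Inspect in your browser.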


library(rvest)

'https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=1626603948&db=protein&report=fasta&extrafeat=null&conwithfeat=on&hide-cdd=on&retmode=html&withmarkup=on&tool=portal&log$=seqview&maxdownloadsize=1000000' %>%
  read_html() %>%
  html_text2()
[1] ">TII12583.1 GhoT/OrtT family toxin [Enterococcus faecium] MYLVRNAISFFITYFLSHDTMALVL"
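
If you need more than one record, the same request can be wrapped in a small function. This is only a sketch: build_viewer_url() and fetch_fasta_text() are hypothetical helper names, and it assumes that swapping the id value in the URL above is enough to get a different record. The other query parameters are copied unchanged, since it is not clear which of them are required.

```r
# Sketch only: build_viewer_url() and fetch_fasta_text() are hypothetical
# helper names, not part of rvest or any NCBI package.

# Build the sviewer URL from the answer, swapping in a different record ID.
# All other query parameters are copied unchanged from that URL; whether
# every one of them is required is not known.
build_viewer_url <- function(id, db = "protein") {
  paste0(
    "https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi",
    "?id=", id,
    "&db=", db,
    "&report=fasta&extrafeat=null&conwithfeat=on&hide-cdd=on",
    "&retmode=html&withmarkup=on&tool=portal&log$=seqview",
    "&maxdownloadsize=1000000"
  )
}

# Download the page behind the URL and return its visible text.
# Requires the rvest package and network access.
fetch_fasta_text <- function(id, db = "protein") {
  page <- rvest::read_html(build_viewer_url(id, db))
  rvest::html_text2(page)
}
```

Calling fetch_fasta_text("1626603948") should send the same request as the pipeline above. Passing the ID as a character string avoids any numeric formatting surprises.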

You can also look into the rentrez and biomartr packages, which retrieve data from NCBI programmatically instead of scraping the webpage.
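
As a rough illustration of the rentrez route, the sketch below fetches the same record with rentrez::entrez_fetch() and splits the returned FASTA text into header and sequence. parse_fasta() and fetch_protein_fasta() are hypothetical helper names, and it is an assumption that the id from the URL above (1626603948) is accepted as an Entrez UID for the protein database.

```r
# Sketch only: parse_fasta() and fetch_protein_fasta() are hypothetical
# helpers, not part of rentrez.

# Split a single-record FASTA string into its header and sequence.
parse_fasta <- function(txt) {
  lines <- trimws(strsplit(txt, "\n", fixed = TRUE)[[1]])
  lines <- lines[nzchar(lines)]                 # drop blank lines
  list(
    header   = sub("^>", "", lines[1]),         # first line, without the ">"
    sequence = paste(lines[-1], collapse = "")  # join wrapped sequence lines
  )
}

# Fetch a protein record as FASTA text through NCBI's Entrez API.
# Requires the rentrez package and network access.
# Assumption: the id taken from the sviewer URL is a valid Entrez UID.
fetch_protein_fasta <- function(id) {
  txt <- rentrez::entrez_fetch(db = "protein", id = id, rettype = "fasta")
  parse_fasta(txt)
}
```

With this, fetch_protein_fasta("1626603948")$sequence would give the amino acid sequence as a single string, provided the assumption about the ID holds.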

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1