Get span content value with regex (bash)
Hello, I'm trying to get the content of my span with class top. All I need is the link (without any tags) and the value of the span with class title, in bash. I tried something like this as a test, but it does not return anything. What am I doing wrong?
curl -s https://www.website.com/q?search=violet | grep -e "^<span class=\"top\">(.*?)</span>"
<div class="video-item-list">
<span class="age0" title="0"></span>
<span class="hsa" title="tex"></span>
<span class="Encour" title="test"></span>
<a href="https://www.website.com/a/1973">
<img class="image lazy" width="100" height="40"
data-original="https://img.com/i?jpg=123">
</a>
<span class="top">
<a href="https://www.website.com/a/1973">
<span class="title">Violet test</span>
</a>
<span class="episode"> 250
</span>
<a class="team"></a>
</span>
<span class="info"> 2017</span>
</div>
<div id="n" class="video-item-list-days">
<h5>Letter n</h5>
</div>
Solution 1:[1]
As mentioned in the comments, regular expressions are the wrong tool for working with HTML. One approach uses an XSLT stylesheet and xsltproc:
example.xslt:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:template match="/">
<xsl:for-each select="//span[@class='top']">
<xsl:value-of select="a[@href]/@href" />
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="a[@href]/span[@class='title']" />
<xsl:text>&#10;</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Usage:
$ curl -s 'https://www.website.com/q?search=violet' | xsltproc --html example.xslt -
https://www.website.com/a/1973 Violet test
Solution 2:[2]
Here is a suggested regex that prints the class value of every span whose first attribute is class:
grep -oP '(?<=<span class=")[^"]+'
Tested for your sample:
age0
hsa
Encour
top
title
episode
info
Not sure if that was your intention.
If you only need the classes of spans that are closed on the same line:
grep -oP '(?<=<span class=")[^"]+(?=".*</span>)' input.1.txt
Tested for your sample:
age0
hsa
Encour
title
info
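If the goal is the text inside the title span rather than the class names, the same grep -oP approach can probably be adapted. This is a minimal sketch, assuming GNU grep built with PCRE support and assuming the title span opens and closes on one line, as in the sample. The helper name span_titles is hypothetical.

```shell
# Hypothetical helper: print the text of every <span class="title"> read from stdin.
# Assumes GNU grep with PCRE support (-P); \K drops the opening tag from the match.
span_titles() {
  grep -oP '<span class="title">\K[^<]+'
}
```

It could be used as curl -s 'https://www.website.com/q?search=violet' | span_titles. Like any regex over HTML, it depends on the exact attribute order and quoting shown in the sample.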
Solution 3:[3]
Thanks, everyone. I did this and it works. Maybe it's a bad idea, but I will see later.
(?<=<span class="top">).*?(?=<\/span>)
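With the sample laid out as in the question, the opening span with class top and the first closing span tag sit on different lines, so a line-by-line grep would not match this pattern there. One possible alternative is a small line-oriented awk state machine that prints the link and the title together. This is only a sketch: the helper name extract_top is hypothetical, and it assumes the one-tag-per-line layout of the sample, so it would need changes for minified HTML.

```shell
# Hypothetical helper: print "link<TAB>title" for each <span class="top"> block on stdin.
# Assumes the one-tag-per-line layout of the sample HTML.
extract_top() {
  awk '
    # Entering a top span: reset state and move to the next line.
    /<span class="top">/ { in_top = 1; link = ""; next }
    # First href seen inside the top span becomes the link.
    in_top && link == "" && match($0, /href="[^"]*"/) {
      link = substr($0, RSTART + 6, RLENGTH - 7)
    }
    # The title span text ends the record.
    in_top && match($0, /<span class="title">[^<]*/) {
      print link "\t" substr($0, RSTART + 20, RLENGTH - 20)
      in_top = 0
    }
  '
}
```

It could be used as curl -s 'https://www.website.com/q?search=violet' | extract_top. For anything beyond a quick test, a real HTML parser such as the xsltproc approach in Solution 1 is likely to be more robust.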
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Shawn |
| Solution 2 | |
| Solution 3 | meteor314 |
