Using RegEx to extract data from an anchor tag

I have the following anchor tag in an HTML document, and I want to extract the link, the link text, and the verses that follow the tag:

<a href="https://www.catholicgallery.org/bible-drb/acts-9/">Acts 9:</a> 1-20

I have tried using two different methods.

The first calls TestRegEx with:

    IEnumerable<Tuple<string, string, string>> tuple = TestRegEx(reading.readinghRef);

where TestRegEx is:

    protected IEnumerable<Tuple<string, string, string>> TestRegEx (string html)
    {
        Regex r = new Regex(@"<a.*?href=(""|')(?<href>.*?)(""|').*?>(?<value>.*?)</a>\s(?<verses>.*?)");

        foreach (Match match in r.Matches(html))
            yield return new Tuple<string, string, string>(
                match.Groups["href"].Value, match.Groups["value"].Value, match.Groups["verses"].Value);
    }

I have also tried:

            Regex regex = new Regex(@"<a\shref=""(?<url>.*?)"">(?<text>.*?):</a>\s(?<verses>.*?)");
            Match match = regex.Match(reading.readinghRef);

            string text = match.Groups["text"].Value;
            string[] textParts = text.Split(' ');
            string verses = match.Groups["verses"].Value;

            string book = "";
            for (int i = 0; i < textParts.Length - 1; i++)
            {
                if (book.Length > 0)
                    book += " ";
                book += textParts[i];
            }

            string chapter = textParts[textParts.Length - 1];

Both succeed in getting the book and the URL, but neither gets the verses. Item2 of the tuple is not yet parsed into book and chapter, but that is not the problem. The problem is that the verses at the end of the HTML string are not captured.

Solution 1:^[1]

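The only problem with your first regex is the lazy (non-greedy) quantifier in the last group:

    (?<verses>.*?)

Nothing follows this group in the pattern, so the lazy .*? is satisfied by matching zero characters and the group always comes back empty. Replace it with the greedy version, and you'll get the verses:

    (?<verses>.*)

The second regex ends with the same lazy group, so the same change applies there.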

https://regex101.com/r/w2lgaX/1
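
To see the fix end to end, here is a minimal sketch of the first pattern with the greedy trailing group. The class name AnchorParser and the method name Parse are made up for illustration and are not from the question. It assumes the input is a single line holding one anchor tag, as in the example above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original question.
public static class AnchorParser
{
    // Same pattern as the first attempt, except the last group is greedy (.*).
    private static readonly Regex AnchorRegex = new Regex(
        @"<a.*?href=(""|')(?<href>.*?)(""|').*?>(?<value>.*?)</a>\s(?<verses>.*)");

    // Returns (href, link text, verses), or null when the input does not match.
    public static Tuple<string, string, string> Parse(string html)
    {
        Match match = AnchorRegex.Match(html);
        if (!match.Success)
            return null;

        return Tuple.Create(
            match.Groups["href"].Value,
            match.Groups["value"].Value,
            match.Groups["verses"].Value);
    }
}
```

For the anchor tag in the question, Parse should return the URL as Item1, "Acts 9:" as Item2, and "1-20" as Item3. Without RegexOptions.Singleline the dot does not match a newline, so the greedy group stops at the end of the line. If several anchors share one line, it would swallow everything after the first one, and the pattern would then need a more specific ending.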

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	ejkeep
