Recognize URLs and anchor tags in Scala

I'm trying to port a regex from JavaScript to Scala, but it's not behaving as I expected. My understanding is that it should port over pretty much one for one. The overall goal is to recognize URLs, with or without a protocol, and to tell whether they are inside an anchor tag. Currently I've got this for the regex:

  private val entireCapture = "entireCapture"
  private val aHrefOpen = "aHrefOpen"
  private val urlBody = "urlBody"
  private val aHrefClose = "aHrefClose"

  private val A_HREF_OPEN_CAPTURING_GROUP = "(<a [^>]*)?" //This should get <a>
  private val A_HREF_CLOSE_CAPTURING_GROUP = "([^>]*<\\/a>)?"// This should get </a>

  private val URL_PROTOCOL = "(?:(?:https?):\\/\\/|www\\d{0,3}[.])"
  private val URL_DOMAIN = "(?:\\([A-Z0-9+&@#\\/%=~_|$?!;:,.\\-]*\\)|[^\\s()<>])*"
  private val URL_PATH = "(?:\\([A-Z0-9+&@#\\/%=~_|$?!;:,.\\-]*\\)|[^\\s()<>])"

  private val URL_BODY = s"($URL_PROTOCOL$URL_DOMAIN$URL_PATH)" //This should get the actual url
  private val URL_MATCHER =
    (s"$A_HREF_OPEN_CAPTURING_GROUP$URL_BODY$A_HREF_CLOSE_CAPTURING_GROUP").r(
      entireCapture,
      aHrefOpen,
      urlBody,
      aHrefClose
    )
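The ArrayIndexOutOfBounds on 4 is consistent with the group names being off by one. The (?:...) groups are non-capturing, so the assembled pattern has exactly three capturing groups, and the names passed to .r(...) are bound to groups 1, 2, 3, ... in order. The whole match is group 0 and never receives a name, so entireCapture ends up labelling the anchor-open group, aHrefOpen the URL body, urlBody the closing group, and aHrefClose a group 4 that does not exist. The sketch below rebuilds the question's pattern to check this; the object name GroupCountDemo and the lookup helper are made up for illustration, and newer Scala releases may print a deprecation warning for the .r(names...) overload.

```scala
import scala.util.matching.Regex

// Hypothetical demo object: the pattern pieces are copied from the question.
object GroupCountDemo {
  private val open     = "(<a [^>]*)?"
  private val close    = "([^>]*<\\/a>)?"
  private val protocol = "(?:(?:https?):\\/\\/|www\\d{0,3}[.])"
  private val domain   = "(?:\\([A-Z0-9+&@#\\/%=~_|$?!;:,.\\-]*\\)|[^\\s()<>])*"
  private val path     = "(?:\\([A-Z0-9+&@#\\/%=~_|$?!;:,.\\-]*\\)|[^\\s()<>])"
  private val body     = s"(${protocol}${domain}${path})"

  // Same four names as the question; .r binds them to groups 1, 2, 3 and 4 in order.
  val urlMatcher: Regex =
    s"${open}${body}${close}".r("entireCapture", "aHrefOpen", "urlBody", "aHrefClose")

  // Only three capturing groups exist; group 0 (the whole match) is not counted.
  val capturingGroups: Int = urlMatcher.pattern.matcher("").groupCount()

  // Looks up a named group in the first match, reporting an out-of-range group as Left.
  def lookup(text: String, name: String): Either[String, Option[String]] =
    urlMatcher.findFirstMatchIn(text) match {
      case None => Left("no match")
      case Some(m) =>
        try Right(Option(m.group(name)))
        catch { case e: IndexOutOfBoundsException => Left(e.getClass.getSimpleName) }
    }
}
```

For the input www.google.com, looking up aHrefOpen returns the URL body (it is really group 2), looking up entireCapture returns nothing (it is really the optional group 1), and looking up aHrefClose fails because there is no group 4.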

Then when I run

class LinkUtil {

  def matchURL(text: String): String = {
    URL_MATCHER
      .replaceAllIn(
        text,
        x => rewriteMatch(x.group(entireCapture), x.group(aHrefOpen), x.group(urlBody), x.group(aHrefClose))
      )
  }

  def rewriteMatch(entireCapture: String, aHrefOpen: String, urlBody: String, aHrefClose: String): String = {
    if (aHrefOpen.nonEmpty || aHrefClose.nonEmpty) {
      entireCapture
    }
    else {
      val linkUrl = if (urlBody.matches("""^https?:""")) { urlBody }
      else { ("http://" + urlBody) }
      linkUrl
    }
  }

}
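Even with the names lined up, rewriteMatch would run into two other behaviors. An optional group that takes no part in the match is reported as null rather than as an empty string, so calling nonEmpty on it throws a NullPointerException. And String.matches tests the whole string, so urlBody.matches("""^https?:""") is false for a full URL such as https://google.com, which would then get a second http:// prefix. The small demo below uses a toy pattern and hypothetical names to show both; wrapping the group in Option and using startsWith is one way to handle them.

```scala
import scala.util.matching.Regex

// Hypothetical demo object with a toy pattern: optional group 1, required group 2.
object PitfallsDemo {
  val toy: Regex = "(<a )?(www\\.\\S+)".r

  // Raw group value: null when the optional group did not participate.
  def rawOpen(text: String): String =
    toy.findFirstMatchIn(text).map(_.group(1)).orNull

  // Wrapping in Option turns that null into None.
  def openGroup(text: String): Option[String] =
    toy.findFirstMatchIn(text).flatMap(m => Option(m.group(1)))

  // String.matches must match the WHOLE string, so this is only true for "http:" or "https:".
  def wholeStringCheck(url: String): Boolean = url.matches("""^https?:""")

  // A prefix check is what the protocol test actually needs.
  def hasProtocol(url: String): Boolean =
    url.startsWith("http://") || url.startsWith("https://")
}
```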

With various URLs (I'm using https://google.com, www.google.com and <a href="https://google.com">my link</a>), it keeps crashing because of the aHrefClose group: it always fails with an ArrayIndexOutOfBoundsException on index 4. I don't understand why it's not working as I expected; at worst, I thought that if a group didn't match it would be null instead of not existing. What am I doing wrong here? Did I set up the regex incorrectly, or is it something else?
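A minimal sketch of a corrected version, assuming the same pattern pieces as the question and keeping rewriteMatch's intended behavior (leave anchor matches alone, add http:// when the protocol is missing). It passes no names to .r, so group 0 is the whole match (m.matched) and groups 1 to 3 are the captures; the optional groups go through Option; the protocol test uses startsWith; and Regex.quoteReplacement keeps a $ or \ inside a URL from being read as a replacement reference. Turning LinkUtil into an object that holds the pattern is an assumption made so the example is self-contained.

```scala
import scala.util.matching.Regex

// Hypothetical corrected LinkUtil; the pattern pieces are copied from the question.
object LinkUtil {
  private val A_HREF_OPEN_CAPTURING_GROUP  = "(<a [^>]*)?"     // group 1
  private val A_HREF_CLOSE_CAPTURING_GROUP = "([^>]*<\\/a>)?"  // group 3
  private val URL_PROTOCOL = "(?:(?:https?):\\/\\/|www\\d{0,3}[.])"
  private val URL_DOMAIN   = "(?:\\([A-Z0-9+&@#\\/%=~_|$?!;:,.\\-]*\\)|[^\\s()<>])*"
  private val URL_PATH     = "(?:\\([A-Z0-9+&@#\\/%=~_|$?!;:,.\\-]*\\)|[^\\s()<>])"
  private val URL_BODY     = s"(${URL_PROTOCOL}${URL_DOMAIN}${URL_PATH})" // group 2

  // No names passed to .r: group 0 is the whole match, groups 1-3 are the captures.
  private val URL_MATCHER: Regex =
    s"${A_HREF_OPEN_CAPTURING_GROUP}${URL_BODY}${A_HREF_CLOSE_CAPTURING_GROUP}".r

  def matchURL(text: String): String =
    URL_MATCHER.replaceAllIn(text, m =>
      // quoteReplacement makes the returned text literal ($ and \ are not special).
      Regex.quoteReplacement(
        rewriteMatch(m.matched, Option(m.group(1)), m.group(2), Option(m.group(3)))
      )
    )

  // Optional groups that did not take part in the match arrive as None.
  def rewriteMatch(entireCapture: String, aHrefOpen: Option[String],
                   urlBody: String, aHrefClose: Option[String]): String =
    if (aHrefOpen.exists(_.nonEmpty) || aHrefClose.exists(_.nonEmpty)) entireCapture
    else if (urlBody.startsWith("http://") || urlBody.startsWith("https://")) urlBody
    else "http://" + urlBody
}
```

With this sketch, www.google.com becomes http://www.google.com, https://google.com is returned unchanged, and the anchor-tag example is left as it was.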



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
