How can I extract text from HTML in a MySQL column?

We have some form data which flows from our CMS into our CRM. The data captured in the CRM contains HTML elements like this.

<dl><dt>First Name:</dt><dd>Joe</dd>
<dt>Last Name:</dt><dd>Blogs</dd>  
<dt>Personal Email:</dt>
<dd>[email protected]</dd>  
<dt>Most recent job title:</dt>
<dd>Manager</dd>  
<dt> Level of membership you are applying for:</dt>
<dd>Full Member</dd>

I've played around with SUBSTRING but can only get as far as the first name, and even that is unreliable. With a fixed length, shorter names pull in part of the following markup and longer names get cut off.

substring(fieldname, 29, 15) as FirstName,

Results

  1. Joe (Too long)
  2. Anne-Mar (too short)

Is there another way, such as capturing the text between two specific strings?



Solution 1:[1]

As stated in the comments, you are better off doing this at pretty much any level other than the database, but I've been playing around in SQLFiddle and came up with this as a starting point. The only change I made to your original example was to add the closing </dl> tag at the end so that it is reasonably well-formed XML. I was pleasantly surprised at how well MySQL handled it.

set @xml = '<dl><dt>First Name:</dt><dd>Joe</dd>
<dt>Last Name:</dt><dd>Blogs</dd>  
<dt>Personal Email:</dt>
<dd>[email protected]</dd>  
<dt>Most recent job title:</dt>
<dd>Manager</dd>  
<dt> Level of membership you are applying for:</dt>
<dd>Full Member</dd></dl>';

select ExtractValue(@xml, '//dt[1]') as question, ExtractValue(@xml, '//dd[1]') as answer
  union
select ExtractValue(@xml, '//dt[2]') as question, ExtractValue(@xml, '//dd[2]') as answer
  union
select ExtractValue(@xml, '//dt[3]') as question, ExtractValue(@xml, '//dd[3]') as answer
  union
select ExtractValue(@xml, '//dt[4]') as question, ExtractValue(@xml, '//dd[4]') as answer;

Results

question                  answer
First Name:               Joe
Last Name:                Blogs
Personal Email:           [email protected]
Most recent job title:    Manager
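
To apply the same idea to rows in a table rather than a variable, something like the following might work. It is only a sketch. The table name form_submissions and the sample rows are made up for illustration, and fieldname is the column name used in the question. It also assumes the stored value is missing the closing </dl> tag, as in the question, so one is appended before parsing. If your data already ends with </dl>, drop the CONCAT.

```sql
-- Hypothetical table and sample data, for illustration only.
CREATE TEMPORARY TABLE form_submissions (
  id INT PRIMARY KEY,
  fieldname TEXT
);

INSERT INTO form_submissions (id, fieldname) VALUES
  (1, '<dl><dt>First Name:</dt><dd>Joe</dd>
<dt>Last Name:</dt><dd>Blogs</dd>'),
  (2, '<dl><dt>First Name:</dt><dd>Anne-Marie</dd>
<dt>Last Name:</dt><dd>Blogs</dd>');

-- One column per answer, selected by position in the <dl>.
-- Assumption: every row lists the fields in the same order.
SELECT
  id,
  TRIM(ExtractValue(CONCAT(fieldname, '</dl>'), '//dd[1]')) AS FirstName,
  TRIM(ExtractValue(CONCAT(fieldname, '</dl>'), '//dd[2]')) AS LastName
FROM form_submissions;
```

Because the value is selected by element position rather than by character offset, the length of the name no longer matters. The trade-off is that ExtractValue expects well-formed XML, so HTML with unclosed tags or bare ampersands may come back as NULL with a warning.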

(This is definitely a starting point. I'm not a MySQL expert and I'm very new to posting, so please feel free to improve this.)
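
If the stored HTML turns out not to be well-formed enough for ExtractValue, a possible fallback is plain string handling with nested SUBSTRING_INDEX calls, which is closer to the "between specific text" idea in the question. This is a sketch under the assumption that each label (for example 'First Name:</dt>') appears exactly once in the value. The variable @html and its contents are sample data only.

```sql
-- Sample value only; in practice this would be the fieldname column.
SET @html = '<dl><dt>First Name:</dt><dd>Anne-Marie</dd>
<dt>Last Name:</dt><dd>Blogs</dd>';

SELECT
  -- 1. keep everything after the label
  -- 2. keep everything before the next </dd>
  -- 3. keep everything after the <dd> that opens the value
  SUBSTRING_INDEX(
    SUBSTRING_INDEX(
      SUBSTRING_INDEX(@html, 'First Name:</dt>', -1),
      '</dd>', 1),
    '<dd>', -1) AS FirstName,   -- 'Anne-Marie'
  SUBSTRING_INDEX(
    SUBSTRING_INDEX(
      SUBSTRING_INDEX(@html, 'Last Name:</dt>', -1),
      '</dd>', 1),
    '<dd>', -1) AS LastName;    -- 'Blogs'
```

One caveat: SUBSTRING_INDEX returns the whole string when the delimiter is not found, so a row that lacks the label yields the first <dd> value rather than NULL. The match is also case-sensitive.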

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1