C# Split string by tag and create an iterable data structure
I have this string:
<item>
<node1>Name</node1>
<childNode1>Nickname</childNode1>
<node2>Surname</node2>
</item>
<item>
<node1>AnotherName</node1>
<node2>AnotherSurname</node2>
</item>
I want to split this string by "item" and create a data structure from the text extracted from all child nodes, for example:
{"Name", "Nickname", "Surname"}
{"AnotherName", "AnotherSurname"}
Solution 1:[1]
If you install HtmlAgilityPack you can solve your task easily:
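using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public class Item
{
    public List<string> Properties { get; set; }

    public static List<Item> LoadItems(string text)
    {
        var items = new List<Item>();
        var doc = new HtmlDocument();
        doc.LoadHtml(text);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var docItems = doc.DocumentNode.SelectNodes("//item");
        if (docItems == null)
        {
            return items;
        }

        foreach (var docItem in docItems)
        {
            // Skip the whitespace text nodes between tags; keep each child's text.
            var list = docItem.ChildNodes
                .Where(n => n.NodeType != HtmlNodeType.Text)
                .Select(n => n.InnerText)
                .ToList();
            if (list.Count > 0)
            {
                items.Add(new Item { Properties = list });
            }
        }
        return items;
    }
}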
This class has a list of properties ("Name", "Nickname", "Surname" for your first item) and a LoadItems method that parses your text. It selects all "item" nodes, then iterates over the returned list and collects the InnerText (the content) of each child node.
You can test your sample:
const string text = @"<item>
<node1>Name</node1>
<childNode1>Nickname</childNode1>
<node2>Surname</node2>
</item>
<item>
<node1>AnotherName</node1>
<node2>AnotherSurname</node2>
</item>";
var allItems = Item.LoadItems(text);
Solution 2:[2]
Adding a root element, so that the string becomes well-formed XML, might be the better approach.
Another way might be to use this regexp: https://regex101.com/r/eKMjeu/1
The code from the generator (on that site) gives:
string pattern = @"<(\/?node[0-9]|\/?childNode[0-9])>*>|\n";
string substitution = ",";
string input = @"<item>
<node1>Name</node1>
<childNode1>Nickname</childNode1>
<node2>Surname</node2>
</item>
<item>
<node1>AnotherName</node1>
<node2>AnotherSurname</node2>
</item>";
RegexOptions options = RegexOptions.Multiline;
Regex regex = new Regex(pattern, options);
string result = regex.Replace(input, substitution);
The value of result is then:
<item>
, ,Name,
, ,Nickname,
, ,Surname,
,</item>
,<item>
, ,AnotherName,
, ,AnotherSurname,
,</item>
That might make life a little easier.
You could then add:
result = result.Replace('\r',' ');
result = result.Replace(@"</item>",Environment.NewLine.ToString());
result = "," + result.Replace(@"<item>","");
Which leaves you with:
, , ,Name, , ,Nickname, , ,Surname, ,
, , ,AnotherName, , ,AnotherSurname, ,
All in all, pretty messy...
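If you do stay on the regex route, the leftover lines can still be turned into lists by splitting on commas and dropping the blank pieces. The following is only a minimal sketch: the class name CleanupSketch and the helper ToLists are made up for illustration, and the input is the two lines shown above pasted in as a literal rather than the live regex output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CleanupSketch
{
    // Hypothetical helper: one list of non-empty, trimmed values per input line.
    public static List<List<string>> ToLists(string result)
    {
        return result
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line
                .Split(',')
                .Select(part => part.Trim())
                .Where(part => part.Length > 0)
                .ToList())
            .Where(list => list.Count > 0)   // ignore lines that held only commas
            .ToList();
    }

    public static void Main()
    {
        // Assumption: this literal stands in for the cleaned-up regex result.
        string result = ", , ,Name, , ,Nickname, , ,Surname, ,\n"
                      + ", , ,AnotherName, , ,AnotherSurname, ,";

        foreach (List<string> item in ToLists(result))
        {
            Console.WriteLine(string.Join(", ", item));
        }
    }
}
```

Note that this breaks as soon as a value itself contains a comma, which is one more reason to prefer a real parser.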
The other solution, using XML, seems much nicer:
using System.Text.RegularExpressions;
using System.Xml.Linq;
using System.Xml.XPath;
string str = @"<item>
<node1>Name</node1>
<childNode1>Nickname</childNode1>
<node2>Surname</node2>
</item>
<item>
<node1>AnotherName</node1>
<node2>AnotherSurname</node2>
</item>";
str = "<root>" + str + "</root>";
XDocument xml = XDocument.Parse(str);
foreach(XElement e in xml.Descendants("node1")) {
XElement node2 = e.XPathSelectElement("../node2");
System.Console.WriteLine("{\"" + e.Value + "\",\"" + node2.Value + "\"}");
}
output:
{"Name","Surname"}
{"AnotherName","AnotherSurname"}
Of course, this second solution (currently) lacks error checking and should have started with xml.Descendants("item"), but ... ?
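A variant that does start from xml.Descendants("item") could look roughly like the sketch below. It collects the text of every child element of each item, so the optional childNode1 is included as well. The names ItemParserSketch and ParseItems are invented for this example, and error handling is still left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ItemParserSketch
{
    // Hypothetical helper: returns one list of child-element texts per <item>.
    public static List<List<string>> ParseItems(string fragment)
    {
        // Wrap the fragment so it has the single root element XML requires.
        XDocument xml = XDocument.Parse("<root>" + fragment + "</root>");

        return xml.Descendants("item")
            .Select(item => item.Elements().Select(child => child.Value).ToList())
            .ToList();
    }

    public static void Main()
    {
        string str = @"<item>
<node1>Name</node1>
<childNode1>Nickname</childNode1>
<node2>Surname</node2>
</item>
<item>
<node1>AnotherName</node1>
<node2>AnotherSurname</node2>
</item>";

        foreach (List<string> item in ParseItems(str))
        {
            // Prints each item in the {"a","b"} shape used in the question.
            Console.WriteLine("{\"" + string.Join("\",\"", item) + "\"}");
        }
    }
}
```

XDocument.Parse throws an XmlException if the text is not well-formed, so real input would probably want a try/catch around the call.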
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Victor |
| Solution 2 |
