'Azure Kusto - how to fetch urls from a string using parse

let T = datatable(Id:int, Text:string)
[
   1, "SomeTextSome TextSomeText: https://someurl.com/fileId/edit/12649844",
   2, "SomeText SomeText&nbsp;<https://someurl.com/fileId/newedit/71244>SomeTextSomeTextSomeTextSomeText",
];
T | parse Text with * "someurl.com" myurl ">" * | project Id, myurl 

Output
=========
Id  myurl
1   
2   /fileId/newedit/12702480

Need a way to parse Text field and extract url from it. Content of the text field is html body. Using parse works, if there are more characters after the url Id like ">" or blank space, however if the Text field ends with the url id, it doesn't work. Url Id is not fixed length. If not parse, is there any other way to extract everything from someurl.com until the id, irrespective of whether the url is in the middle of the string or at the end?



Solution 1:[1]

If there's any knowledge you have regarding the URLs format, you can try including that in a regular expression, and use the extract() function.

For example:

datatable(Id:int, Text:string)
[
   1, "SomeTextSome TextSomeText:&nbsp;https://someurl.com/fileId/edit/12649844",
   2, "SomeText SomeText&nbsp;<https://someurl.com/fileId/newedit/71244>SomeTextSomeTextSomeTextSomeText",
]
| extend Url = extract(@"someurl\.com(/\w+/\w+/\d+)", 1, Text)
Id Text Url
1 SomeTextSome TextSomeText: https://someurl.com/fileId/edit/12649844|/fileId/edit/12649844|
2 SomeText SomeText https://someurl.com/fileId/newedit/71244SomeTextSomeTextSomeTextSomeText /fileId/newedit/71244

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Yoni L.