'Azure Kusto - how to fetch urls from a string using parse
let T = datatable(Id:int, Text:string)
[
1, "SomeTextSome TextSomeText: https://someurl.com/fileId/edit/12649844",
2, "SomeText SomeText <https://someurl.com/fileId/newedit/71244>SomeTextSomeTextSomeTextSomeText",
];
T | parse Text with * "someurl.com" myurl ">" * | project Id, myurl
Output
=========
Id myurl
1
2 /fileId/newedit/12702480
Need a way to parse Text field and extract url from it. Content of the text field is html body. Using parse works, if there are more characters after the url Id like ">" or blank space, however if the Text field ends with the url id, it doesn't work. Url Id is not fixed length. If not parse, is there any other way to extract everything from someurl.com until the id, irrespective of whether the url is in the middle of the string or at the end?
Solution 1:[1]
If there's any knowledge you have regarding the URLs format, you can try including that in a regular expression, and use the extract() function
.
For example:
datatable(Id:int, Text:string)
[
1, "SomeTextSome TextSomeText: https://someurl.com/fileId/edit/12649844",
2, "SomeText SomeText <https://someurl.com/fileId/newedit/71244>SomeTextSomeTextSomeTextSomeText",
]
| extend Url = extract(@"someurl\.com(/\w+/\w+/\d+)", 1, Text)
Id | Text | Url |
---|---|---|
1 | SomeTextSome TextSomeText: https://someurl.com/fileId/edit/12649844|/fileId/edit/12649844| | |
2 | SomeText SomeText https://someurl.com/fileId/newedit/71244SomeTextSomeTextSomeTextSomeText | /fileId/newedit/71244 |
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Yoni L. |