'How to extract specified text from columns in Excel

I have tried numerous ways but have failed to get the right data to the new columns.

The column description contains a lot of information in which I want to extract to separate columns. Examples

Aged Care HCP - 29/6/2021 - Social Support Weekday 7am – 7pm 9:00AM 12:00PM

I want to extract the above to the below Columns:

Sector Date Support Schedule Day Day time Shift Time
Aged Care HCP 29/06/2021 Social Support Weekday 7am - 7pm 09:00am - 12:00pm

Can anyone please guide me or provide me the solution on how I can clean the data to the above said columns? I have attached the data for your perusal.

I have tried using excel power query

Below is the link for the document https://www.dropbox.com/scl/fi/4o09adq936cc7cls28sfz/Worksheet.xlsx?dl=0&rlkey=zalomc9pagexd435z014q6qjc



Solution 1:[1]

It's difficult with unstructured data, but here's an example in Power Query of how you can parse some of it out. Depending on the actual data and variety, you will have to tweak it.

let
    Source = Excel.Workbook(File.Contents("C:\Sample.xlsx"), null, true),
    Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(Sheet1_Sheet,{{"Column1", type any}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type any}, {"Column6", type any}, {"Column7", type any}, {"Column8", type any}, {"Column9", type any}, {"Column10", type text}, {"Column11", type text}}),
    #"Removed Top Rows" = Table.Skip(#"Changed Type",4),
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars=true]),
    #"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"Date", type any}, {"Source", type text}, {"Description", type text}, {"Reference", type text}, {"Debit", type number}, {"Credit", type number}, {"Running Balance", type number}, {"Gross", type number}, {"GST", type number}, {"Account Code", Int64.Type}, {"Department", type text}}),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type1", each ([Description] <> null)),
    #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each Text.StartsWith([Description], "Aged Care HCP")),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Filtered Rows1", "Description", Splitter.SplitTextByDelimiter("-", QuoteStyle.Csv), {"Sector", "Service Date", "Description"}),
    #"Changed Type with Locale" = Table.TransformColumnTypes(#"Split Column by Delimiter", {{"Service Date", type date}}, "en-GB"),
    #"Replaced Value" = Table.ReplaceValue(#"Changed Type with Locale","Weekday","|Weekday|",Replacer.ReplaceText,{"Description"}),
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Replaced Value", "Description", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), {"Support Schedule", "Day", "Description"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Split Column by Delimiter1"," – ","–",Replacer.ReplaceText,{"Description"}),
    #"Trimmed Text" = Table.TransformColumns(#"Replaced Value1",{{"Description", Text.Trim, type text}}),
    #"Split Column by Delimiter2" = Table.SplitColumn(#"Trimmed Text", "Description", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Day time", "Shift Time"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter2",{{"Sector", type text}, {"Support Schedule", type text}, {"Day", type text}, {"Day time", type text}, {"Shift Time", type text}})
in
    #"Changed Type2"

Result

Solution 2:[2]

use pandas https://pypi.org/project/pandas/

import pandas as pd 

df = pd.read_csv('filename.csv')

# will print the column
print(df.col_name)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 TheRizza
Solution 2 Benjamin Kolber