Category "data-cleaning"

Is there a way or package to clean French postal code data in Stata?

I have a task at work in Stata where I need to clean zip code data. For example, 71000 is Paris, while 71001 is only a part of Paris. In my task there are firms

How can I auto-populate several Excel sheets from other Excel files?

I am currently working on a Power BI dashboard that uses an Excel file as a data source. I want to auto-populate the Excel file with new values from existing ex

Join large set of CSV files where the header is the timestamp for the file

I have a large set of CSV files, approximately 15,000, and would like to figure out how to join them together into one file for data processing. Each file is in a

How to delete empty spaces from pandas DataFrame rows until first populated field?

Let's say I imported some really messy data from a PDF and I'm cleaning it. I have something like this: Name Type Date other1 other2 other3 Name1 '' '' Type1

How to remove quotation marks from an object-dtype column in Python and convert it to float

Customer id ----- object ValueError: could not convert string to float: "'5769842393258'" df["Customer id"] = df["Customer id"] .replace('"', '',
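
The error message in the excerpt above shows single quotes inside the string (`'5769842393258'`), while the snippet replaces double quotes, which would explain why the conversion still fails. A minimal sketch of one way to handle both kinds of quote, assuming the values otherwise contain only a number; the helper name `to_float` and the sample values other than the one from the error message are hypothetical:

```python
def to_float(value):
    """Strip surrounding whitespace and quote characters, then convert to float."""
    # str() guards against values that are already numeric.
    # strip("'\"") removes any leading/trailing single or double quotes.
    return float(str(value).strip().strip("'\""))


# Hypothetical sample values; the first one comes from the error message.
raw_ids = ["'5769842393258'", '"42"', " 7.5 "]
cleaned = [to_float(v) for v in raw_ids]
print(cleaned)
```

With pandas, the same helper could presumably be applied column-wise with `df["Customer id"].map(to_float)`.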

Remove underscore and number at the end of string

I am working with a dataset that has a column with some underscores. There is a pattern to it, but the patterns differ, as shown below ID Col1 1029
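
Going only by the title, a trailing "underscore plus number" can be removed with an anchored regular expression. This is a sketch under that assumption; the excerpt is cut off before the real data, so the function name `strip_numeric_suffix` and the sample strings are hypothetical:

```python
import re

# Matches an underscore followed by one or more digits at the end of the string.
TRAILING_SUFFIX = re.compile(r"_\d+$")


def strip_numeric_suffix(text):
    """Remove a trailing '_<digits>' suffix; leave other strings unchanged."""
    return TRAILING_SUFFIX.sub("", text)


# Hypothetical sample values illustrating different patterns.
samples = ["abc_12", "a_b_3", "abc_x", "abc"]
print([strip_numeric_suffix(s) for s in samples])
```

Only the last `_<digits>` group is removed, so underscores elsewhere in the string are preserved.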

How to sum a count result?

I have a database that counts the daily total number of customers that do or don't have a transaction. The Customer column is a varchar data type. Here is how

How to check which records are non-numeric in a string column in a Delta table

I am working on a Delta table using Databricks on Azure. The Delta table contains about 100 million records with many columns. One column data type of which is S

How do I remove nonsensical or incomplete words from a corpus?

I am using some text for some NLP analyses. I have cleaned the text taking steps to remove non-alphanumeric characters, blanks, duplicate words and stopwords, a