This project focused on cleaning and standardizing records from the World Layoffs dataset using MySQL to enhance data accuracy, consistency, and readiness for analysis. The dataset comprised multiple fields, including `company`, `location`, `industry`, `total_laid_off`, `percentage_laid_off`, `stage`, `country`, and `funds_raised_millions`. The data cleaning process involved identifying and resolving duplicates, handling null values and blank cells, and correcting inconsistencies across all records.
The dataset is available as an Excel file for download.
MySQL
Removing Duplicates: Using `ROW_NUMBER()`, I identified and removed duplicate records from the dataset.
Standardizing Data: Ensured data consistency by trimming extra spaces in the `company` column. Corrected spelling variations in the `industry` column, where entries like `crypto currency`, `crypto`, and `crypto.` were standardized to `Crypto`. Similarly, inconsistencies in the `country` column, such as `United States` and `United states.`, were unified as `United States`. Additionally, the `date` column was converted from text to a proper `DATE` format.
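The standardization steps could look roughly like the following MySQL statements. This is a minimal sketch, not the exact queries used in the project: the table name `layoffs_staging` is assumed, the date text is assumed to be in month/day/year form, and the matching relies on MySQL's default case-insensitive collation.

```sql
-- Assumed table name: layoffs_staging (a working copy of the imported data).

-- Trim stray whitespace around company names.
UPDATE layoffs_staging
SET company = TRIM(company);

-- Collapse the crypto spelling variants into a single label.
-- LIKE is case-insensitive under MySQL's default collations.
UPDATE layoffs_staging
SET industry = 'Crypto'
WHERE industry LIKE 'crypto%';

-- Unify country values that differ only by case or a trailing period.
UPDATE layoffs_staging
SET country = 'United States'
WHERE TRIM(TRAILING '.' FROM country) = 'United States';

-- Convert the text dates, then change the column type.
-- The '%m/%d/%Y' format is an assumption about how the text is stored.
UPDATE layoffs_staging
SET `date` = STR_TO_DATE(`date`, '%m/%d/%Y');

ALTER TABLE layoffs_staging
MODIFY COLUMN `date` DATE;
```

Running a `SELECT DISTINCT` on each column before and after the update is a simple way to confirm that only the intended variants were merged.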
Handling Null Values and Blank Spaces: Removed records where both `total_laid_off` and `percentage_laid_off` were null or blank, as they could impact exploratory analysis. For companies with missing industry data, values were filled based on matching company names to improve data accuracy for analysis.
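One way to express these two steps in MySQL is sketched below. The table name `layoffs_staging` is assumed, and the blank checks assume the columns may still hold empty strings from the import. The self-join copies an industry value from any other row of the same company that has one.

```sql
-- Assumed table name: layoffs_staging.

-- Treat blank industry strings as NULL so one test covers both cases.
UPDATE layoffs_staging
SET industry = NULL
WHERE TRIM(industry) = '';

-- Fill a missing industry from another row with the same company name.
UPDATE layoffs_staging AS t1
JOIN layoffs_staging AS t2
  ON t1.company = t2.company
SET t1.industry = t2.industry
WHERE t1.industry IS NULL
  AND t2.industry IS NOT NULL;

-- Remove rows that carry no layoff figures at all.
DELETE FROM layoffs_staging
WHERE (total_laid_off IS NULL OR TRIM(total_laid_off) = '')
  AND (percentage_laid_off IS NULL OR TRIM(percentage_laid_off) = '');
```

Companies that have no row with a known industry stay `NULL` after the self-join, since there is nothing to copy from.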
Removing the Helper Column: Dropped the temporary `row_num` column after completing the cleaning process.
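The role of `row_num` is easiest to see alongside the duplicate-removal step that creates it. The sketch below assumes MySQL 8.0 or later (for window functions), a `date` column, and two hypothetical table names: `layoffs_raw` for the imported data and `layoffs_staging` for the working copy. Because MySQL does not allow deleting through a CTE, the row numbers are stored in a real column and dropped at the end.

```sql
-- Assumed names: layoffs_raw (imported data), layoffs_staging (working copy).

-- Number the rows within each group of fully identical records.
CREATE TABLE layoffs_staging AS
SELECT *,
       ROW_NUMBER() OVER (
         PARTITION BY company, location, industry, total_laid_off,
                      percentage_laid_off, `date`, stage, country,
                      funds_raised_millions
       ) AS row_num
FROM layoffs_raw;

-- Any row numbered 2 or higher repeats an earlier row in its group.
DELETE FROM layoffs_staging
WHERE row_num > 1;

-- After all cleaning steps are done, drop the helper column.
ALTER TABLE layoffs_staging
DROP COLUMN row_num;
```

Partitioning by every column means only exact duplicates are removed; rows that differ in any field are kept.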