Must-have key skills:
- ETL tools (IICS, Talend, Hadoop, etc.)
- Azure, with a basic understanding of GCP and AWS
- JIRA and Confluence
- MS Visio (SQL / PL/SQL / Oracle)
- PySpark
- Shell scripting and Power BI
Data Cleaning Tools & Libraries: Proficiency with tools and libraries used to clean and preprocess
data, for example:
- Python
- SQL
- Excel, with an emphasis on data cleaning functions, filters, and pivot tables.
Good-to-have skills:
- Knowledge of R
- Data Management & Analysis Skills
  - Data Validation & Consistency: Ability to identify data quality issues such as duplicates,
    missing values, outliers, and inconsistencies.
  - Data Transformation: Experience in transforming raw data into usable formats, including
    reshaping, aggregating, or normalizing data.
  - Handling Missing Data: Familiarity with imputation techniques or other ways to deal with
    incomplete datasets.
  - Data Normalization & Standardization: Ensuring uniformity in data formats, units of
    measurement, and naming conventions.
  - Data Aggregation: Summarizing or grouping data for analysis and ensuring that it is consistent
    across all sources.
- Knowledge of Data Quality
  - Data Integrity: Understanding the importance of maintaining accurate and consistent data over time.
  - Data Profiling: Identifying patterns, anomalies, and key characteristics of the dataset.
  - Error Detection: Ability to find and correct errors within datasets by checking for outliers,
    misclassifications, or missing values.
- Soft Skills
  - Attention to Detail: The ability to identify small inconsistencies and issues within large datasets.
  - Problem-Solving: Being resourceful in resolving data issues and proposing solutions.
  - Critical Thinking: Analyzing data in depth and understanding its implications.
  - Communication: Ability to explain data issues and cleaning steps to non-technical stakeholders.
- Experience with Data Formats
  - Structured Data: Familiarity with both structured data (tables, databases) and unstructured data.
  - Data Sources: Ability to clean data from various sources such as spreadsheets, databases, APIs,
    and logs.