
Data Engineer Processing

Location: Richmond, VA
Salary: $70,000
Posted: September 18, 2025


Resume:

GANESH

Data Engineer

Richmond, VA 937-***-**** ****************@*****.*** LinkedIn

Experienced Data Engineer with around 5 years of expertise in designing and developing scalable data solutions across cloud platforms including Azure and AWS. Proficient in building robust ETL/ELT pipelines using Databricks and Azure Data Factory, supporting both batch and real-time processing with PySpark, Spark SQL, and Kafka. Adept at implementing Medallion Architecture and managing structured and semi-structured data with Delta Lake and Delta Live Tables (DLT). Familiar with data lake storage systems, orchestration tools, and NoSQL databases such as MongoDB, DynamoDB, and HBase. Proven ability in data governance, metadata management, and securing enterprise data using Unity Catalog and cloud-native controls. Strong programming background in Python, SQL, and Java, with experience in API integrations and building insightful dashboards using Power BI. A collaborative team player with excellent communication and problem-solving skills, delivering data-driven solutions aligned with business goals.

EXPERIENCE

Data Engineer Humana, USA Aug 2024 – Present

Designed and developed ETL pipelines in Azure Databricks, implementing Medallion Architecture for structured data processing.

Built data ingestion workflows to extract, transform, and load data from source systems into Databricks for analytical processing.

Automated data workflows using Databricks Workflows and Delta Live Tables (DLT) to improve efficiency and reliability.
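
(Illustrative only: a minimal Delta Live Tables sketch of the Medallion-style pipeline pattern described in the bullets above. The landing path, table names, and expectation rule are hypothetical, not taken from the actual project.)

```python
# Hypothetical DLT pipeline: bronze -> silver -> gold (Medallion pattern).
# Runs inside a Databricks DLT pipeline, where `spark` is provided.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Bronze: raw claims landed as-is via Auto Loader")
def claims_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/claims/")  # hypothetical landing path
    )

@dlt.table(comment="Silver: typed and deduplicated claims")
@dlt.expect_or_drop("valid_claim_id", "claim_id IS NOT NULL")
def claims_silver():
    return (
        dlt.read_stream("claims_bronze")
        .withColumn("claim_date", F.to_date("claim_date"))
        .dropDuplicates(["claim_id"])
    )

@dlt.table(comment="Gold: daily claim counts for reporting")
def claims_gold():
    return (
        dlt.read("claims_silver")
        .groupBy("claim_date")
        .agg(F.count("claim_id").alias("claim_count"))
    )
```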

Developed SQL-based ETL queries on source systems to validate output and ensure data integrity before transformation.

Engineered forecasting models in Databricks to predict future sales, optimizing data-driven decision-making.

Conducted data validation and testing across multiple layers to ensure accuracy and consistency in analytical outputs.

Used Python libraries such as Pandas and NumPy for data validation, as sketched below.
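
(A small sketch of the style of Pandas/NumPy validation checks referred to above; the column names and rules are hypothetical.)

```python
# Hypothetical validation pass over an extract before transformation.
import numpy as np
import pandas as pd

def validate(df: pd.DataFrame) -> list:
    """Collect human-readable validation failures instead of failing fast."""
    failures = []
    if df["member_id"].isna().any():
        failures.append("member_id contains nulls")
    if df["member_id"].duplicated().any():
        failures.append("member_id contains duplicates")
    if not np.isfinite(df["claim_amount"]).all():
        failures.append("claim_amount has NaN or infinite values")
    if (df["claim_amount"] < 0).any():
        failures.append("claim_amount has negative values")
    return failures

df = pd.read_csv("claims_extract.csv")  # hypothetical source extract
problems = validate(df)
if problems:
    raise ValueError("Validation failed: " + "; ".join(problems))
```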

Developed KPI-driven reports and dashboards, transforming raw data into actionable insights for business users.

Developed visualizations and dashboards using Power BI.

Provided technical support and debugging expertise, resolving pipeline failures and improving system reliability.

Performed data validation, testing, and performance tuning of large-scale data processing workloads.

Developed Apache Spark applications to process data from various streaming sources.

Worked with multiple data formats, including JSON, CSV, and Excel, ensuring seamless ingestion and transformation in ETL pipelines.

Created tables and stored procedures, and extracted data using SQL for business users whenever required.

Worked in a cross-functional environment, collaborating with business analysts, managers, and leadership teams to align data solutions with business goals.

Environment: Azure (Databricks, Data Lake Storage, Synapse Analytics, Data Factory), Databricks Workflows, Delta Live Tables (DLT), Medallion Architecture, Apache Spark (PySpark, Spark SQL), Delta Lake, JSON, CSV, Excel, Python (Pandas, NumPy), SQL, Unity Catalog, GitHub, CI/CD Pipelines, Power BI.

Data Engineer Cisco Systems, USA April 2023 – July 2024

Designed and developed data pipelines and ETL processes for data ingestion, transformation, and loading.

Worked with big data ecosystems, including Hadoop, MapReduce, Hive, Apache Spark, and HDFS, for large-scale data processing and analysis; enhanced existing ETL processes, reducing data loading times by 40% and increasing data accuracy.

Leveraged cloud technologies (AWS, GCP, Azure) for data storage, compute, and analytics.

Created interactive data dashboards using Tableau, providing actionable insights to business stakeholders and decision-makers.

Developed automated regression scripts in Python to validate ETL processes across multiple databases, including AWS Redshift, Oracle, MongoDB, and SQL Server (T-SQL); one such check is sketched below.
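
(One way such a cross-database regression check can be written, using SQLAlchemy and pandas; the connection URLs, table names, and tolerance are placeholders, not the actual project code.)

```python
# Hypothetical regression check: compare row count and an aggregate
# between a source database and the loaded target after an ETL run.
import pandas as pd
from sqlalchemy import create_engine

SOURCE_URL = "oracle+cx_oracle://user:pass@source-host/orcl"  # placeholder
TARGET_URL = "mssql+pyodbc://user:pass@target_dsn"            # placeholder

def table_profile(engine_url, table):
    engine = create_engine(engine_url)
    query = f"SELECT COUNT(*) AS row_count, SUM(amount) AS amount_sum FROM {table}"
    df = pd.read_sql(query, engine)
    df.columns = df.columns.str.lower()  # Oracle reports aliases in uppercase
    return df

src = table_profile(SOURCE_URL, "orders")
tgt = table_profile(TARGET_URL, "dbo.orders")

assert src["row_count"].iloc[0] == tgt["row_count"].iloc[0], "row counts diverge"
assert abs(src["amount_sum"].iloc[0] - tgt["amount_sum"].iloc[0]) < 0.01, "sums diverge"
print("regression check passed")
```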

Automated data ingestion from various sources, reducing manual efforts by 60% and increasing data processing efficiency.

Developed real-time data streaming solutions using Kafka, enabling instant access to critical data for real-time analytics.
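
(A compact sketch of a Kafka-backed streaming job of this kind, here using Spark Structured Streaming; the broker, topic, schema, and sink paths are invented for illustration. Requires the spark-sql-kafka connector package.)

```python
# Hypothetical Kafka -> Spark Structured Streaming -> Parquet pipeline.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

schema = StructType([
    StructField("device_id", StringType()),
    StructField("metric", DoubleType()),
    StructField("event_time", StringType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "telemetry")                  # placeholder topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream.format("parquet")
    .option("path", "/data/telemetry/")                # placeholder sink
    .option("checkpointLocation", "/chk/telemetry/")
    .start()
)
query.awaitTermination()
```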

Collaborated with cross-functional teams, using project management tools like Jira, Confluence, Jenkins, and Git.

Maintained documentation, trained team members, and provided support for data-related tasks.

Environment: AWS (Redshift, S3), GCP, Apache Spark (PySpark, Spark SQL), Hadoop Ecosystem (HDFS, Hive, MapReduce), Delta Lake, Kafka, MongoDB, Oracle, SQL Server, T-SQL, Python (Pandas, NumPy), Tableau, JSON, CSV, Excel, Git, Jenkins, Jira, Confluence, CI/CD Pipelines.

Data Engineer Cybage Software, India Jan 2019 – Jul 2021

Utilized Python, SQL, and Scala to extract, transform, and load data from diverse sources into data lakes and data warehouses.

Established robust data monitoring solutions, reducing system downtime by 25% and ensuring data availability.

Implemented data encryption and access control measures, ensuring compliance with data security standards and reducing security incidents by 20%.

Built and maintained data engineering frameworks such as Kafka, Airflow, and Snowflake to support data operations.
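
(A minimal Apache Airflow DAG sketch of the orchestration pattern mentioned above; the DAG id, schedule, and task body are hypothetical.)

```python
# Hypothetical daily load orchestrated by Airflow 2.x.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def load_to_snowflake():
    # Placeholder for the real extract/load logic.
    print("loading daily batch into Snowflake")

with DAG(
    dag_id="daily_snowflake_load",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="load_to_snowflake",
        python_callable=load_to_snowflake,
    )
```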

Conducted data analysis and machine learning using libraries like NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, Seaborn, and TensorFlow.

Implemented ETL solutions using SSIS and SSAS, automating data workflows and enhancing data quality.

Developed data visualizations and reports using Power BI and SSRS to aid data-driven decision-making.

Managed and optimized databases, including MS SQL Server, PostgreSQL, and MySQL.

Assisted in data-driven decision-making by providing actionable insights and recommendations.

Implemented data quality checks and automated data cleansing processes, resulting in a 25% reduction in data errors and inconsistencies.
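
(An illustrative automated cleansing pass in pandas; the specific rules are hypothetical examples of such checks.)

```python
# Hypothetical cleansing step: dedupe, normalize, fill, and filter.
import pandas as pd

def cleanse(df: pd.DataFrame) -> pd.DataFrame:
    return (
        df.drop_duplicates()
        .assign(
            email=lambda d: d["email"].str.strip().str.lower(),
            country=lambda d: d["country"].fillna("UNKNOWN"),
        )
        .query("order_total >= 0")  # drop impossible negative totals
    )

clean = cleanse(pd.read_csv("orders.csv"))  # hypothetical input
```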

Environment: Python (Pandas, NumPy, SciPy, Matplotlib, Seaborn, Scikit-learn, TensorFlow), SQL, Scala, Apache Kafka, Apache Airflow, Snowflake, SSIS, SSAS, SSRS, Power BI, MS SQL Server, PostgreSQL, MySQL, Data Lakes, Data Warehouses, Git, JSON, CSV, Excel, Data Encryption & Access Control Tools.

SKILLS

Methodologies: SDLC, Agile, Waterfall

Programming Languages: Python, SQL, R, Scala

IDEs: PyCharm, Jupyter Notebook, Visual Studio Code

Big Data Ecosystem: Hadoop, MapReduce, Hive, Apache Spark, Pig, HDFS

ETL Tools: SSIS, SSAS

Cloud Technologies: AWS, GCP, Azure

Frameworks: Kafka, Snowflake, Docker

Packages: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, Seaborn, TensorFlow

Reporting Tools: Tableau, Power BI, SSRS

Databases: MS SQL Server, PostgreSQL, MySQL

Other Tools: CMS, Jira, Confluence, Jenkins, Git, MS Office, ERP, CRM

Soft Skills: Data Cleaning, Data Wrangling, Critical Thinking, Communication Skills, Presentation Skills, Problem-Solving

Operating System: Windows, Linux

EDUCATION

Master of Science in Computer Science, Wright State University, Fairborn, Ohio Aug 2021 – Apr 2023

Bachelor of Computer Science, Gitam University, Visakhapatnam, Andhra Pradesh Aug 2016 – Apr 2020

CERTIFICATIONS

Microsoft Certified: Azure Data Engineer Associate

AWS Certified Data Analytics – Specialty


