Data Engineer

Location:

Overland Park, KS

Posted:

March 04, 2025

Resume:

Lasya M H

Data Engineer

*******@*****.*** | +1-913-***-**** | LinkedIn: www.linkedin.com/in/lasyamacharam

SUMMARY

● Designed and developed scalable ETL/ELT pipelines using Spark, Airflow, and SSIS to process and transform large datasets on Azure and AWS platforms.

● Engineered and maintained RDBMS and NoSQL databases, including MySQL, PostgreSQL, SQL Server, and MongoDB, ensuring 99% system uptime and seamless data accessibility.

● Implemented advanced data transformation, validation, and cleansing workflows using Spark and SQL, enabling accurate and reliable data analysis.

● Automated deployment workflows by creating CI/CD pipelines using Docker, Kubernetes, and Azure DevOps, improving system reliability and development efficiency.

SKILLS

Methodologies: SDLC, Agile.

Programming Languages: Python, SQL, C, C++, Unix, T-SQL.

Big Data Technologies: Hadoop, Apache Spark, Hive, Pig, Sqoop, Flume, Spark Core, Spark SQL, MapReduce, Spark Streaming, Apache Kafka, Apache Flink, YARN, and ZooKeeper.

Frameworks/Libraries: Django, NumPy, Pandas, PySpark, SQLAlchemy, Confluent Kafka.

ETL: Azure Data Factory, AWS Glue, Apache Airflow, SSIS, Matillion.

Azure Stack: Azure Databricks, Blob Storage, Azure Functions, HDInsight, Stream Analytics, Event Hubs, Logic Apps, Virtual Machines, Azure Service Bus, Power BI.

AWS Stack: Athena, S3, Lambda, SNS, EC2, SageMaker, Kinesis, RDS, EMR, IAM, CodePipeline, Route 53, QuickSight, CloudWatch, ECS, Step Functions, Elasticsearch.

Databases/Data Warehouses: SQL Server, MySQL, PostgreSQL, MongoDB, DB2, Oracle, DynamoDB, Cosmos DB, Azure SQL, Cassandra, Azure Synapse Analytics, Azure Data Lake Gen2, Snowflake, Amazon Redshift.

CI/CD Tools: Docker, Azure DevOps, AWS CodePipeline, Azure Kubernetes, Jenkins, Terraform.

Version Control Tools: Git, GitHub, GitLab, Jira.

Operating Systems: Windows, Linux.

WORK EXPERIENCE

• Data Engineer - I Jan 2024 - Dec 2024

Kroger Cincinnati, OH

● Built AWS Glue workflows to handle data ingestion and successfully migrated all tables from an on-premises SQL Server database to Amazon S3 with minimal downtime.

● Used Apache Spark on Amazon EMR to process datasets of up to 8 TB, improving processing speed by 30% and making data available sooner for reporting.

● Connected Amazon QuickSight to Amazon RDS (MySQL/PostgreSQL) to create dashboards that effectively visualized and analyzed key business metrics.

● Set up AWS Lambda and Amazon CloudWatch with AWS Step Functions to monitor data pipelines in real time, cutting issue resolution time by 35%.

● Scheduled workflows using Apache Airflow on Amazon MWAA, ensuring smooth data transformation and resolving 100% of issues through log analysis and error fixes.

● Designed snowflake-schema data models in Amazon Redshift, improving data quality by 12%, handling slowly changing dimensions, and enhancing data validation processes.

Technology Stack: AWS Glue, Amazon S3, SQL, Amazon EMR, Amazon RDS (MySQL/PostgreSQL), Amazon QuickSight, Snowflake Schema Modeling, Amazon Redshift, AWS Lambda, Amazon CloudWatch, IAM.

• Data Engineer - I July 2022 - Aug 2023

Express Scripts Hyderabad, India

● Wrote SQL Server stored procedures, functions, views, and triggers using SQL and T-SQL to improve database performance and functionality.

● Built ETL pipelines for data integration in both cloud and on-premises environments using Azure Data Factory and SSIS packages.

● Created Spark applications using PySpark and Spark-SQL, improving data extraction and aggregation by 30% across multiple file formats like JSON, Parquet, Avro, CSV, and XML.

● Improved Azure Data Factory pipeline deployments by implementing CI/CD processes with Azure DevOps, Kubernetes, and Docker, leading to a 50% increase in production release efficiency.

● Used Agile methodology and Azure Boards for sprint planning, ensuring tasks were completed on time while coordinating with a four-member team and with other teams.

● Migrated data from SQL Server, DB2, Oracle, and MySQL to Azure Synapse Analytics and Azure Data Lake, cutting data latency by 30%.

Technology Stack: SQL Server, T-SQL, Azure Data Factory, SSIS, PySpark, Spark-SQL, JSON, Parquet, Avro, CSV, XML, Azure DevOps, Kubernetes, Docker, Azure Synapse Analytics, Azure Data Lake, DB2, Oracle, MySQL, Agile Methodology, Azure Boards.

• Data Engineer - Intern Aug 2021 - June 2022

Happiest Minds Bangalore, India

● Assisted in building ETL pipelines using tools like Talend and SSIS to extract, transform, and load data from various source systems into a central data warehouse.

● Supported the migration of on-premises data to cloud-based storage solutions, ensuring data accuracy and minimal downtime during the process.

● Developed SQL queries and scripts to perform data transformations, validations, and aggregations for reporting and analytics purposes.

● Conducted data profiling and quality checks to identify inconsistencies, duplicates, and missing data, improving overall data reliability.

Technology Stack: Talend, SSIS, SQL, Data Warehousing, Data Transformation, Data Validation, Data Profiling, ETL, Cross-functional Collaboration.

EDUCATION

• University of Central Missouri Aug 2023 - May 2025

Master of Science, Computer Engineering [GPA: 3.7/4] Warrensburg, MO

TEACHING EXPERIENCE

CSE 983 – Web semantic Jan 2024 - May 2024

Assisted the instructor in teaching the Computer Systems course by guiding students during lab sessions, grading assignments, holding office hours, and facilitating student discussions.
