Lasya M H
Data Engineer
*******@*****.*** +1-913-***-**** LinkedIn: www.linkedin.com/in/lasyamacharam
SUMMARY
Designed and developed scalable ETL/ELT pipelines using Spark, Airflow, and SSIS to process and transform large datasets on Azure and AWS platforms.
Engineered and maintained RDBMS and NoSQL databases, including MySQL, PostgreSQL, SQL Server, and MongoDB, ensuring 99% system uptime and seamless data accessibility.
Implemented advanced data transformation, validation, and cleansing workflows using Spark and SQL, enabling accurate and reliable data analysis.
Automated deployment workflows by creating CI/CD pipelines using Docker, Kubernetes, and Azure DevOps, improving system reliability and development efficiency.
SKILLS
Methodologies : SDLC, Agile.
Programming Languages : Python, SQL, C, C++, Unix, T-SQL.
Big Data Technologies : Hadoop, Apache Spark, Hive, Pig, Sqoop, Flume, Spark Core, Spark SQL, MapReduce, Spark Streaming, Apache Kafka, Apache Flink, YARN, ZooKeeper.
Frameworks/Libraries : Django, NumPy, Pandas, PySpark, SQLAlchemy, Confluent Kafka.
ETL : Azure Data Factory, AWS Glue, Apache Airflow, SSIS, Matillion.
Azure Stack : Azure Databricks, Blob Storage, Azure Functions, HDInsight, Stream Analytics, Event Hubs, Logic Apps, Virtual Machines, Azure Service Bus, Power BI.
AWS Stack : Athena, S3, Lambda, SNS, EC2, SageMaker, Kinesis, RDS, EMR, IAM, CodePipeline, Route 53, QuickSight, CloudWatch, ECS, Step Functions, Elasticsearch.
Databases/Data Warehouses : SQL Server, MySQL, PostgreSQL, MongoDB, DB2, Oracle, DynamoDB, Cosmos DB, Azure SQL, Cassandra, Azure Synapse Analytics, Azure Data Lake Gen2, Snowflake, Amazon Redshift.
CI-CD/Tools : Docker, Azure DevOps, AWS CodePipeline, Azure Kubernetes, Jenkins, Terraform.
Version Control Tools : Git, GitHub, GitLab, Jira.
Operating Systems : Windows, Linux
WORK EXPERIENCE
• Data Engineer - I Jan 2024 - Dec 2024
Kroger Cincinnati, OH
Built AWS Glue workflows to handle data ingestion and successfully migrated all tables from an on-premises SQL Server database to Amazon S3 with minimal downtime.
Used Apache Spark on Amazon EMR to process large datasets up to 8TB, improving processing speed by 30% and making data available faster for reporting.
Connected Amazon QuickSight to Amazon RDS (MySQL/PostgreSQL) to create dashboards that effectively visualized and analyzed key business metrics.
Set up AWS Lambda and Amazon CloudWatch with AWS Step Functions to monitor data pipelines in real-time, cutting down issue resolution time by 35%.
Scheduled workflows using Apache Airflow on Amazon MWAA, ensuring smooth data transformation and resolving 100% of issues by checking logs and fixing errors.
Designed Snowflake-style modeling in Amazon Redshift, improving data quality by 12%, handling slowly changing dimensions, and enhancing data validation processes.
Technology Stack: AWS Glue, Amazon S3, SQL, Amazon EMR, Amazon RDS (MySQL/PostgreSQL), Amazon QuickSight, Snowflake-style Modeling, Amazon Redshift, AWS Lambda, Amazon CloudWatch, IAM.
• Data Engineer - I July 2022 - Aug 2023
Express Scripts Hyderabad, India
Wrote SQL Server stored procedures, functions, views, and triggers using SQL and T-SQL to improve database performance and functionality.
Built ETL pipelines for data integration in both cloud and on-premises environments using Azure Data Factory and SSIS packages.
Created Spark applications using PySpark and Spark SQL, improving data extraction and aggregation by 30% across multiple file formats, including JSON, Parquet, Avro, CSV, and XML.
Improved Azure Data Factory pipeline deployments by implementing CI/CD processes with Azure DevOps, Kubernetes, and Docker, leading to a 50% increase in production release efficiency.
Used Agile methodology and Azure Boards for sprint planning, ensuring tasks were completed on time and coordinating with a team of 4 members and other teams.
Migrated data from SQL Server, DB2, Oracle, and MySQL to Azure Synapse Analytics and Azure Data Lake, cutting data latency by 30%.
Technology Stack: SQL Server, T-SQL, Azure Data Factory, SSIS, PySpark, Spark-SQL, JSON, Parquet, Avro, CSV, XML, Azure DevOps, Kubernetes, Docker, Azure Synapse Analytics, Azure Data Lake, DB2, Oracle, MySQL, Agile Methodology, Azure Boards.
• Data Engineer - Intern Aug 2021 - June 2022
Happiest Minds Bangalore, India
Assisted in building ETL pipelines using tools like Talend and SSIS to extract, transform, and load data from various source systems into a central data warehouse.
Supported the migration of on-premises data to cloud-based storage solutions, ensuring data accuracy and minimal downtime during the process.
Developed SQL queries and scripts to perform data transformations, validations, and aggregations for reporting and analytics purposes.
Conducted data profiling and quality checks to identify inconsistencies, duplicates, and missing data, improving overall data reliability.
Technology Stack: Talend, SSIS, SQL, Data Warehousing, Data Transformation, Data Validation, Data Profiling, ETL, Cross-functional Collaboration.
EDUCATION
• University of Central Missouri Aug 2023 - May 2025
Master of Science, Computer Engineering [GPA: 3.7/4] Warrensburg, MO
TEACHING EXPERIENCE
CSE 983 – Web semantic Jan 2024 - May 2024
Assisted the instructor in teaching the Computer Systems course by guiding students during lab sessions, grading assignments, holding office hours, and facilitating student discussions.