Lasya M H
Data Engineer
*******@*****.*** +1-913-***-**** LinkedIn www.linkedin.com/in/lasyamacharam
SUMMARY
● Designed and developed scalable ETL/ELT pipelines using Spark, Airflow, and SSIS to process and transform large datasets on Azure and AWS platforms.
● Engineered and maintained RDBMS and NoSQL databases, including MySQL, PostgreSQL, SQL Server, and MongoDB, ensuring 99% system uptime and seamless data accessibility.
● Implemented advanced data transformation, validation, and cleansing workflows using Spark and SQL, enabling accurate and reliable data analysis.
● Automated deployment workflows by creating CI/CD pipelines using Docker, Kubernetes, and Azure DevOps, improving system reliability and development efficiency.
SKILLS
Methodologies : SDLC, Agile.
Programming Languages : Python, SQL, C, C++, Unix, T-SQL.
Big Data Technologies : Hadoop, Apache Spark, Hive, Pig, Sqoop, Flume, Spark Core, Spark SQL, MapReduce, Spark Streaming, Apache Kafka, Apache Flink, YARN, and ZooKeeper.
Frameworks/Libraries : Django, NumPy, Pandas, PySpark, SQLAlchemy, Confluent Kafka.
ETL : Azure Data Factory, AWS Glue, Apache Airflow, SSIS, Matillion.
Azure Stack : Azure Databricks, Blob Storage, Azure Functions, HDInsight, Stream Analytics, Event Hubs, Logic Apps, Virtual Machines, Azure Service Bus, Power BI.
AWS Stack : Athena, S3, Lambda, SNS, EC2, SageMaker, Kinesis, RDS, EMR, IAM, CodePipeline, Route 53, QuickSight, CloudWatch, ECS, Step Functions, Elasticsearch.
Databases/Data Warehouse : SQL Server, MySQL, PostgreSQL, MongoDB, DB2, Oracle, DynamoDB, Cosmos DB, Azure SQL, Cassandra, Azure Synapse Analytics, Azure Data Lake Gen2, Snowflake, Amazon Redshift.
CI-CD/Tools : Docker, Azure DevOps, AWS CodePipeline, Azure Kubernetes, Jenkins, Terraform.
Version Control Tools : Git, GitHub, GitLab, Jira.
Operating Systems : Windows, Linux
WORK EXPERIENCE
• Data Engineer - I Jan 2024 - Dec 2024
Kroger Cincinnati, OH
● Built AWS Glue workflows to handle data ingestion and successfully migrated all tables from an on-premises SQL Server database to Amazon S3 with minimal downtime.
● Used Apache Spark on Amazon EMR to process datasets of up to 8 TB, improving processing speed by 30% and making data available faster for reporting.
● Connected Amazon QuickSight to Amazon RDS (MySQL/PostgreSQL) to create dashboards that effectively visualized and analyzed key business metrics.
● Set up AWS Lambda and Amazon CloudWatch with AWS Step Functions to monitor data pipelines in real time, cutting issue resolution time by 35%.
● Scheduled workflows using Apache Airflow on Amazon MWAA, ensuring smooth data transformation and resolving 100% of issues by checking logs and fixing errors.
● Designed Snowflake-style modeling in Amazon Redshift, improving data quality by 12%, handling slowly changing dimensions, and enhancing data validation processes.
Technology Stack: AWS Glue, Amazon S3, SQL, Amazon EMR, Amazon RDS (MySQL/PostgreSQL), Amazon QuickSight, Snowflake-style Modeling, Amazon Redshift, AWS Lambda, Amazon CloudWatch, IAM.
• Data Engineer - I July 2022 - Aug 2023
Express Scripts Hyderabad, India
● Wrote SQL Server stored procedures, functions, views, and triggers using SQL and T-SQL to improve database performance and functionality.
● Built ETL pipelines for data integration in both cloud and on-premises environments using Azure Data Factory and SSIS packages.
● Created Spark applications using PySpark and Spark SQL, improving data extraction and aggregation by 30% across multiple file formats, including JSON, Parquet, Avro, CSV, and XML.
● Improved Azure Data Factory pipeline deployments by implementing CI/CD processes with Azure DevOps, Kubernetes, and Docker, leading to a 50% increase in production release efficiency.
● Used Agile methodology and Azure Boards for sprint planning, ensuring tasks were completed on time and coordinating with a team of 4 members and other teams.
● Migrated data from SQL Server, DB2, Oracle, and MySQL to Azure Synapse Analytics and Azure Data Lake, cutting data latency by 30%.
Technology Stack: SQL Server, T-SQL, Azure Data Factory, SSIS, PySpark, Spark SQL, JSON, Parquet, Avro, CSV, XML, Azure DevOps, Kubernetes, Docker, Azure Synapse Analytics, Azure Data Lake, DB2, Oracle, MySQL, Agile Methodology, Azure Boards.
• Data Engineer - Intern Aug 2021 - June 2022
Happiest Minds Bangalore, India
● Assisted in building ETL pipelines using tools like Talend and SSIS to extract, transform, and load data from various source systems into a central data warehouse.
● Supported the migration of on-premises data to cloud-based storage solutions, ensuring data accuracy and minimal downtime during the process.
● Developed SQL queries and scripts to perform data transformations, validations, and aggregations for reporting and analytics purposes.
● Conducted data profiling and quality checks to identify inconsistencies, duplicates, and missing data, improving overall data reliability.
Technology Stack: Talend, SSIS, SQL, Data Warehousing, Data Transformation, Data Validation, Data Profiling, ETL, Cross-functional Collaboration.
EDUCATION
• Master of Science, Computer Engineering [GPA: 3.7/4] Aug 2023 - May 2025
University of Central Missouri Warrensburg, MO
TEACHING EXPERIENCE
CSE 983 – Web semantic Jan 2024 - May 2024
Assisted the instructor in teaching the Computer Systems course by guiding students during lab sessions, grading assignments, holding office hours, and facilitating student discussions.