Laya Sree Akula
Data Engineer
Email: **************@*****.*** Phone: 445-***-****
linkedin.com/in/laya-akula-1730a8371
PROFESSIONAL SUMMARY:
Results-driven Data Engineer with 4+ years of hands-on experience in designing, building, and optimizing large-scale data pipelines, data warehouses, and cloud-based data solutions. Proficient in leveraging modern technologies such as Python, SQL, PySpark, and ETL frameworks to manage structured and unstructured data across diverse domains. Adept at implementing scalable solutions that support analytics, business intelligence, and advanced machine learning use cases.
Experienced in working with cloud platforms including AWS, Azure, and Snowflake, with strong expertise in data lake architecture, distributed computing, and storage-compute optimization. Skilled in orchestrating workflows using tools such as Airflow and DBT, ensuring efficient data processing, transformation, and quality validation. Demonstrated ability to optimize query performance, reduce costs, and enforce data governance through role-based access controls, encryption, and compliance frameworks.
Strong background in collaborating with cross-functional teams, including data scientists, analysts, and business stakeholders, to translate business requirements into robust technical solutions. Proven ability to troubleshoot complex data issues, implement monitoring dashboards, and ensure system reliability.
Passionate about delivering clean, reliable, and actionable data while continuously improving processes through automation and best practices. Seeking to leverage technical expertise and problem-solving skills to drive data-driven decision-making and business growth.
Work Experience:
Role: Data Engineer Jan 2024 – Present
Client: Global Atlantic Financial Group, NY
Designed and implemented scalable ETL/ELT pipelines using Informatica, Talend, Apache NiFi, and DBT, ensuring efficient ingestion, transformation, and delivery of structured and unstructured data across multi-cloud platforms (AWS, Azure, GCP).
Leveraged Apache Spark, PySpark, and SQL for large-scale data transformations, optimizing jobs on AWS EMR, Azure Databricks, and GCP Dataproc, reducing processing times by 40%.
Built and managed real-time data streaming pipelines using Kafka, Kinesis, and Pub/Sub, enabling event-driven architectures for fraud detection, risk modeling, and financial reporting.
Implemented Snowflake, Redshift, BigQuery, and Azure Synapse for enterprise data warehousing, designing Star and Snowflake schemas and optimizing query performance with partitioning, clustering, and materialized views.
Architected and deployed Data Lakehouse solutions using Delta Lake, Apache Iceberg, Hudi, and cloud object storage (AWS S3, ADLS, GCS) for unified batch and streaming data processing.
Designed and automated CI/CD pipelines for data workflows using Jenkins, GitHub Actions, and GitLab CI, integrating with Terraform/CloudFormation for Infrastructure as Code.
Developed and deployed containerized microservices using Docker and Kubernetes, supporting modularized data processing and orchestration across hybrid cloud environments.
Integrated multiple RDBMS (PostgreSQL, MySQL, Oracle) and NoSQL (MongoDB, Cassandra, DynamoDB) sources into enterprise data platforms, enabling cross-domain analytics and reporting.
Ensured data quality, governance, and compliance by implementing data validation, cleansing, profiling, lineage, and cataloging with tools like Collibra and Alation, enforcing RBAC, encryption, GDPR, and HIPAA standards.
Orchestrated Apache Airflow DAGs for batch and streaming pipelines, integrating with Event Hub, Kinesis, and Kafka for reliable data scheduling, monitoring, and recovery strategies.
Built Python and Scala-based APIs for data ingestion and migration, integrating with internal and third-party systems to improve data accessibility for analytics teams.
Designed and optimized dimensional models (fact/dimension tables) for financial transaction data, supporting BI dashboards and advanced analytics in Power BI and Tableau.
Implemented real-time monitoring and alerting for critical data pipelines using Prometheus, Grafana, CloudWatch, and Azure Monitor, reducing downtime and failure rates by 30%.
Led cross-functional discussions in Agile/Scrum teams, contributing to sprint planning, backlog grooming, and retrospectives; documented best practices and created knowledge-sharing playbooks for new team members.
Conducted performance tuning and database optimization across Amazon Aurora, Redshift, and PostgreSQL, reducing query latency by 40%+.
Built and scheduled SQL Server Agent jobs (SSMS) for nightly ETL/stored procedures; replaced ad-hoc scripts and eliminated ~90% of manual runs.
Engineered secure ETL pipelines to process financial datasets including mutual fund transactions, PII, and investment performance data, ensuring compliance with data governance and privacy standards while maintaining accuracy and integrity for downstream analytics.
Designed and implemented ETL pipelines in Databricks with PySpark, integrating APIs and external services to enrich supply chain and financial datasets.
Collaborated on cloud migration and consolidation initiatives, optimizing pipelines for Azure Databricks and modernizing data platforms.
Role: Azure Data Engineer August 2021 – August 2023
Client: Fujitsu, Chennai
Designed and developed data pipelines using Azure Data Factory (ADF) and Apache NiFi to ingest, transform, and orchestrate data from multiple structured and unstructured sources.
Built and optimized big data solutions leveraging Apache Spark, Hadoop, and Kafka for large-scale batch and real-time data processing.
Implemented data lake architecture on Azure Data Lake Storage Gen2 (ADLS) and Blob Storage, enabling scalable storage and retrieval of raw, curated, and transformed data.
Developed and maintained data models and data warehouses using Azure Synapse Analytics (formerly Azure SQL Data Warehouse) to support advanced analytics and reporting.
Created real-time data streaming solutions with Azure Event Hubs, Azure Stream Analytics, and Kafka, supporting low-latency event-driven data pipelines.
Collaborated with data scientists to integrate machine learning models via Azure Machine Learning and managed ML lifecycle with Databricks MLflow.
Designed and deployed ETL/ELT workflows using Informatica, Talend, and ADF pipelines to ensure clean, validated, and optimized data delivery.
Implemented data governance and security using Azure Active Directory (AAD), RBAC, Key Vault, and Azure Purview for compliance, lineage tracking, and access control.
Automated CI/CD pipelines for data workflows with Azure DevOps and GitHub Actions, utilizing ARM templates and Terraform for infrastructure provisioning.
Performed monitoring, performance tuning, and cost optimization using Azure Monitor, Log Analytics, and advanced SQL query optimization to enhance system efficiency.
Role: Data Analyst May 2019 – June 2020
Client: ECIL (Dept. of Atomic Energy, Govt. of India) – Hyderabad, India
Collected, cleaned, and analyzed large datasets from multiple sources to support decision-making for critical projects.
Designed and developed interactive dashboards and reports using Power BI / Tableau / Excel for management and stakeholders.
Performed data validation, cleansing, and quality checks to ensure accuracy and consistency of atomic energy research data.
Conducted trend analysis, forecasting, and statistical modeling to identify patterns and support predictive insights.
Collaborated with scientists, engineers, and business teams to translate complex data requirements into analytical solutions.
Developed and optimized SQL queries, stored procedures, and scripts for efficient data extraction and reporting.
Worked with structured and unstructured datasets from RDBMS (Oracle, SQL Server, PostgreSQL) and flat files.
Supported ETL processes by preparing source-to-target mappings and validating transformed data in the warehouse.
Delivered ad-hoc analysis to answer urgent business questions and provide real-time decision support.
Documented data workflows, reporting logic, and methodologies to maintain transparency and knowledge sharing.
Implemented data governance practices, ensuring compliance with confidentiality and security protocols in Govt. of India projects.
Automated recurring reports and analysis tasks using Python, R, and Excel macros to save manual effort.
Provided KPI and performance metrics reports to track efficiency, resource utilization, and operational outcomes.
Partnered with IT teams to ensure data integration and migration during system upgrades and enhancements.
Presented analytical insights and recommendations to senior management for policy-making and operational improvements.
Technical Skills:
Programming Languages: Python, Java, Scala, SQL, R
Big Data Frameworks: Apache Spark, Hadoop, Kafka, Flink
ETL / ELT Tools: Apache NiFi, Informatica, Talend, Azure Data Factory (ADF)
Data Warehousing: Snowflake, Redshift, BigQuery, Azure Synapse Analytics
Databases: MySQL, PostgreSQL, Oracle, MongoDB, Cassandra, DynamoDB
Data Lake & Storage: AWS S3, Azure Data Lake Storage, HDFS
Cloud Platforms: AWS, Azure, Google Cloud Platform (GCP)
Data Modeling: Star Schema, Snowflake Schema, Normalization, Denormalization
Data Pipeline Orchestration: Apache Airflow, Prefect, Luigi
Containerization & DevOps: Docker, Kubernetes, Jenkins, Git
Streaming & Real-time Processing: Apache Kafka, Spark Streaming, AWS Kinesis
Data Quality & Governance: Data validation, profiling, metadata management
Performance Tuning & Optimization: Query optimization, indexing, partitioning
Education:
Master's in Information Technology - Virginia University of Science and Technology - 2025
Bachelor's in Computer Science - Samskruthi College of Engineering and Technology - 2021
Certifications:
Microsoft Certified: Azure Data Engineer Associate (DP-203)
Databricks Certified Data Engineer Professional