Balusu Niharika
Data Engineer
*****************@*****.*** +1-913-***-**** Overland Park, KS LinkedIn
PROFESSIONAL SUMMARY
Results-oriented Data Engineer with over 3 years of experience designing, developing, and optimizing scalable data pipelines and ETL workflows across cloud platforms including AWS, Azure, and GCP. Proven expertise in managing large-scale, real-time datasets using tools such as AWS Glue, Lambda, Redshift, S3, Athena, and Palantir Foundry. Adept in data modeling, orchestration (Airflow, Terraform), and building analytics-ready data assets to support machine learning, predictive maintenance, and personalization systems. Strong proficiency in Python, SQL, and PySpark, with a solid track record in data governance, CI/CD automation, and cross-functional collaboration to deliver impactful data solutions.
PROFESSIONAL EXPERIENCE
Data Engineer Fluence Energy Jan 2025 – Present
• Designed and deployed scalable ETL pipelines using AWS Glue, Lambda, and Step Functions to orchestrate the processing of over 2TB of energy data daily, improving real-time grid monitoring and reducing pipeline latency by 35%.
• Designed and implemented predictive maintenance pipelines leveraging historical IoT sensor data using AWS S3, Athena, and SageMaker, resulting in a 28% reduction in unplanned equipment downtime.
• Consolidated heterogeneous data sources including IoT telemetry, SCADA logs, and CSV exports into an integrated Amazon Redshift warehouse, enhancing cross-team analytics and reporting accuracy.
• Implemented CI/CD automation for data workflows using Terraform, AWS CodePipeline, and CodeBuild, cutting deployment times by 40% and ensuring consistent infrastructure-as-code practices.
• Partnered with analytics and operations teams to develop data validation rules, Glue Data Catalog integration, and audit logging via CloudWatch and AWS Lake Formation, ensuring data reliability and governance compliance.
Data Engineer Intern AARP May 2024 – Dec 2024
• Designed and implemented scalable, end-to-end data pipelines in Palantir Foundry, automating the transformation and delivery of article model updates, which led to a 40% reduction in manual intervention and increased model refresh efficiency.
• Ingested, validated, and transformed millions of records from third-party APIs containing over 300 complex attributes, significantly improving personalization logic and enhancing API accuracy by 27% across multiple digital products.
• Spearheaded a category normalization and enrichment framework to resolve data duplication, inconsistent casing, and label conflicts, resulting in a cleaner dataset and improved classification for content tagging engines.
• Developed modular ETL workflows with robust logging, exception handling, and monitoring using Foundry’s code repository, supporting fault-tolerant data processing and reliable user recommendation pipelines.
• Engineered secure, role-based access logic for gated member content while enhancing the event prioritization mechanism, ensuring premium user experiences and streamlined delivery of personalized services.
Junior Data Engineer Brane Enterprises Jan 2022 – May 2023
• Developed and automated robust ETL pipelines using Python, Pandas, and SQL to process structured datasets for machine learning model training, reducing turnaround time for data preparation workflows by 40%.
• Identified and resolved critical performance bottlenecks in existing data flows by implementing query optimization techniques and integrating Redis-based caching layers, leading to a 30% decrease in data latency.
• Authored comprehensive API documentation using Swagger, while also enhancing microservice security protocols by introducing token-based authentication and centralized logging for audit compliance.
• Managed and configured cron-based job scheduling and alert mechanisms to maintain continuous data ingestion and logging consistency across CI/CD pipelines, improving system stability and uptime.
• Designed and implemented NLP preprocessing routines using NLTK, including POS tagging, lemmatization, stemming, and frequency analysis, supporting improved input quality for downstream classification models.
TECHNICAL SKILLS
• Programming & Scripting: Python, SQL, PySpark, Scala, Shell Scripting
• Data Engineering & ETL Tools: Apache Spark, Apache Kafka, AWS Glue, dbt, Informatica, Azure Data Factory
• Cloud Platforms: AWS (S3, Redshift, EMR, Lambda), Azure (Synapse, Blob Storage, ADF), GCP (Databricks, BigQuery, Pub/Sub, Dataflow)
• Data Warehousing & Databases: Snowflake, Amazon Redshift, Microsoft SQL Server, PostgreSQL, MySQL
• Big Data & Lakehouse Technologies: Delta Lake, Hadoop, HDFS
• Data Modeling & Orchestration: Star/Snowflake Schema, dbt, Airflow, GitHub Actions, Terraform
• Data Visualization & BI Tools: Power BI, Tableau
• DevOps & CI/CD: Git, GitHub Actions, Terraform, Jenkins, Docker
• Compliance & Data Governance: GDPR, CCPA, Data Lineage, Audit Logging
• Machine Learning & Analytics: scikit-learn, Pandas, NumPy, Matplotlib
EDUCATION
Master’s in Computer Science, University of Central Missouri Aug 2023 – May 2025
Bachelor’s in Computer Science, Neil Gogte Institute of Technology Aug 2019 – Jun 2023
PROJECTS
• Text Translation and Speech Synthesis Platform
• Real-Time Streaming Pipeline
• Algorithm Visualizer Web Application
• Open-Source Contributions
CERTIFICATIONS
• Python Data Structures — Coursera (University of Michigan)
• Google Cloud Professional Data Engineer
• Machine Learning, Data Science, and Deep Learning with Python — Udemy
• AWS Fundamentals: Core Concepts — Coursera
• Advanced SQL for Data Engineers — DataCamp
• Business English Certificate (BEC) Vantage, CEFR Level B2