Ahalya Reddy Choda
+1-913-***-**** *************@*****.*** https://www.linkedin.com/in/ahalya-choda/
Professional Summary
Certified Data Engineer with 4+ years of experience designing and managing scalable data pipelines. Expertise in Python, ETL processes, and Azure services including Data Factory and Databricks. Proven track record of optimizing workflows and implementing data governance frameworks that improve data quality and efficiency. Poised to leverage these skills to drive innovative data engineering solutions.
Technical Skills
• Programming & Scripting: Python, SQL, T-SQL, PL/SQL, Java, Scala
• ETL & Data Integration: Apache NiFi, SSIS, Talend, Informatica, Azure Data Factory
• Data Warehousing: Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse Analytics
• Big Data Technologies: Apache Hadoop, Spark, Hive, HBase, Pig
• Real-Time Data Streaming: Apache Kafka, Spark Streaming, Apache Flink
• Cloud Platforms: AWS (S3, Glue, Redshift, Lambda, EMR), Azure (Data Factory, Synapse, Data Lake, Databricks, Virtual Machines), Google Cloud Platform (BigQuery, Dataflow, Pub/Sub)
• Orchestration & Workflow Management: Apache Airflow, Luigi, Prefect, Oozie
• Databases: PostgreSQL, MySQL, SQL Server, Oracle, MongoDB, Cassandra, DynamoDB
• Containerization & CI/CD: Docker, Kubernetes, Jenkins, GitLab CI/CD, Terraform, AWS CloudFormation
• Monitoring & Logging: Prometheus, Grafana, ELK Stack, Datadog, AWS CloudWatch
• Data Governance & Security: Azure Purview, Apache Ranger, Apache Atlas, RBAC, data encryption, masking, GDPR compliance, Data Quality
• Methodologies & Collaboration: Agile/Scrum, cross-functional team collaboration, technical documentation, stakeholder communication
• Industry Expertise: Financial Services, Banking
Professional Experience
Invista Aug 2024 - Present
Data Engineer Kansas, USA
• Revamped data pipeline error handling within the IRIS environment, eliminating data loss incidents and sustaining a 100% data integrity score.
• Collaborated with the L2 operations team to provide technical expertise, troubleshoot complex incidents, and accelerate resolution times.
• Enhanced data processing performance through advanced SQL query optimization, indexing, partitioning, and in-memory processing, resulting in faster data retrieval and increased system efficiency.
• Accelerated data insights by building and refining ETL pipelines with Spark and Kafka (see the streaming sketch after this list), cutting query times by 40% across petabytes of data on a Hadoop cluster, and engineered scalable cloud data infrastructure on AWS, Azure, and Google Cloud Platform.
• Automated recurring data workflows using Python and Bash and orchestrated them with Apache Airflow and Luigi (see the DAG sketch after this list), reducing manual effort and increasing operational efficiency.
• Coordinated with data scientists, analysts, and business stakeholders to gather requirements and deliver robust data solutions, designing data architectures across on-premise, cloud, and hybrid platforms.
• Managed version-controlled updates to data pipelines, scripts, and infrastructure using Git, ensuring traceability and effective collaboration.
• Designed and implemented scalable ETL pipelines using Azure Synapse and Azure Data Factory, boosting data processing efficiency by 30% while emphasizing Python-driven ETL processes.
• Authored and maintained thorough documentation for ETL pipelines, data models, workflows, and system configurations.
• Built robust data pipelines using Python and SQL, supplemented with DAX, Java, Scala, and Go where the workload called for them.
• Forecasted storage and processing requirements to align infrastructure with future scalability needs, reducing operational costs.
• Developed real-time data streaming solutions using Azure Stream Analytics, enabling seamless integration into Azure Synapse for immediate insights.
• Explored and wrangled heterogeneous data to model datasets, enhancing accuracy in customer experience analysis and supporting strategic decision-making.
• Fine-tuned and debugged production environments within the IRIS platform, ensuring optimal system performance and stability.
• Developed and migrated data to cloud platforms including AWS, Azure, and Google Cloud, improving data accessibility and scalability.
• Created and maintained RESTful APIs to streamline data integration and connectivity between systems, thereby enhancing overall data processing efficiency.
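As a concrete illustration of the Spark-plus-Kafka work above, a minimal streaming sketch; the broker address, topic, schema fields, and sink paths are placeholder assumptions, not details of the actual IRIS pipelines.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka_etl_sketch").getOrCreate()

# Hypothetical event schema; a real pipeline derives this from the source contract
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_time", TimestampType()),
    StructField("payload", StringType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "events")                     # placeholder topic
    .load()
    # Kafka delivers raw bytes; decode and parse the JSON value
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Land parsed events as Parquet with checkpointing for fault tolerance
query = (
    events.writeStream
    .format("parquet")
    .option("path", "/data/curated/events")            # placeholder sink
    .option("checkpointLocation", "/data/chk/events")  # placeholder checkpoint
    .start()
)
query.awaitTermination()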
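Likewise, a minimal DAG sketch of the Airflow orchestration pattern from the automation bullet; the DAG id, schedule, and task bodies are hypothetical.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def extract_and_load():
    """Placeholder for the Python extraction/load logic."""


default_args = {"retries": 2, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="daily_ingest",              # hypothetical DAG name
    start_date=datetime(2024, 8, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    ingest = PythonOperator(task_id="extract_and_load", python_callable=extract_and_load)
    cleanup = BashOperator(task_id="cleanup_staging", bash_command="rm -rf /tmp/staging/*")
    ingest >> cleanup  # run cleanup only after the load succeeds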
Thyrocare Technologies Ltd Jan 2022 - May 2023
Data Engineer Mumbai, India
• Analyzed, managed, and reported on fixed income securities such as government and corporate bonds, municipal bonds, and mortgage-backed securities, improving accuracy in financial reporting.
• Developed systems for trading, risk management, and reporting of derivatives, including options, futures, forwards, and swaps, thereby enhancing trading efficiency and risk assessment.
• Monitored and optimized costs associated with data storage and processing, reducing overall expenses while ensuring system performance.
• Collaborated with healthcare providers and IT teams to ensure regulatory compliance and interoperability in data exchange processes, strengthening data security.
• Utilized Git for version control and collaboration through branching strategies, pull requests, and code reviews to maintain smooth development workflows.
• Built and managed data lakehouse architectures using Databricks and Delta Lake (see the upsert sketch after this list), ensuring efficient data integration and consistency.
• Integrated CI/CD pipelines with Docker and Kubernetes, enabling continuous deployment of microservices and cloud-native applications.
• Integrated data from diverse sources, including databases, APIs, and third-party services, into a unified data warehouse using Apache NiFi to improve data accessibility and analysis.
• Implemented data validation and cleaning processes orchestrated with Apache Airflow (see the validation sketch after this list), ensuring data accuracy and reliability and reducing errors in analysis.
• Ensured data infrastructure scalability using Google BigQuery to support growing data volumes and enable data-driven decision-making.
• Troubleshot MLOps process interruptions and communicated resolutions to business partners, working closely with data scientists to support advanced analytics and machine learning models.
• Designed and implemented SDA and DTL models for seamless data conversion between HL7, CCDA, and FHIR formats.
• Implemented robust security measures including encryption and access controls to protect sensitive data and ensure regulatory compliance.
• Monitored data pipelines and infrastructure for performance issues, ensuring timely troubleshooting and minimal downtime.
• Maintained comprehensive documentation of data workflows, pipeline configurations, and infrastructure architecture for efficient knowledge transfer.
• Leveraged version control systems to manage changes in data pipelines and infrastructure configurations, reducing deployment errors and enhancing team collaboration.
• Assessed and recommended data engineering tools and technologies, resulting in a 20% reduction in processing time.
• Developed and managed ETL jobs using Apache NiFi and Apache Airflow to automate data processing tasks, ensuring timely project delivery.
• Utilized big data technologies such as Hadoop, Spark, and Kafka to efficiently handle large-scale data processing and generate actionable insights.
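As a concrete illustration of the lakehouse work above, a minimal Delta Lake upsert sketch; the table paths and the sample_id merge key are placeholder assumptions.

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lakehouse_upsert_sketch").getOrCreate()

# New batch of records to fold into the lakehouse table (placeholder path)
updates = spark.read.parquet("/landing/lab_results")

# Existing Delta table (placeholder path); a merge keeps it consistent
target = DeltaTable.forPath(spark, "/lakehouse/lab_results")
(
    target.alias("t")
    .merge(updates.alias("s"), "t.sample_id = s.sample_id")  # hypothetical key
    .whenMatchedUpdateAll()      # update rows that already exist
    .whenNotMatchedInsertAll()   # insert genuinely new rows
    .execute()
)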
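And a minimal validation sketch of the style of rule run from Airflow before loads; the column names and checks are hypothetical.

import pandas as pd


def validate_results(df: pd.DataFrame) -> pd.DataFrame:
    """Fail fast on records that would corrupt downstream analysis."""
    # Required-field check on a hypothetical key column
    if df["patient_id"].isna().any():
        raise ValueError("null patient_id found in batch")
    # Duplicate check on a hypothetical natural key
    dupes = int(df.duplicated(subset=["patient_id", "test_code"]).sum())
    if dupes:
        raise ValueError(f"{dupes} duplicate result rows in batch")
    # Light cleaning: normalize the test code before loading
    df["test_code"] = df["test_code"].str.strip().str.upper()
    return df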
Ingredion Inc. Jun 2020 - Dec 2021
Data Engineer Mumbai, India
• Designed ETL performance tracking sheets across different project phases, enhancing the production team’s ability to monitor and optimize performance.
• Documented Docker and Kubernetes configurations, deployment patterns, and best practices, and provided team training for effective usage and maintenance.
• Collaborated with business stakeholders to understand data requirements and designed data models in Azure Synapse Analytics for efficient storage and retrieval.
• Configured Azure Data Factory Data Flows for data transformation, cleaning, and enrichment, improving data readiness for analysis while adhering to data governance principles.
• Automated daily processes via shell scripting, reducing manual effort and increasing operational efficiency.
• Refactored complex workflows into Azure Databricks notebooks with PySpark and Pandas, developed data quality rules (see the data-quality sketch after this list), implemented end-to-end transformation logic, and created Azure Data Factory pipelines for comprehensive orchestration.
• Designed and developed automated applications for generating reports, dashboards, and visualizations, thereby enhancing decision-making speed and accuracy.
• Developed data engineering pipelines using Azure Data Factory (ADF) to ingest data from sources such as AWS S3, SQL Server, Oracle, and REST APIs, emphasizing robust ETL processes.
• Created visualization reports and dashboards using Power BI, enabling stakeholders to gain actionable insights.
• Engineered Producer API and Consumer API to facilitate real-time event stream processing and communication.
• Loaded and transformed structured, semi-structured, and unstructured data and executed Hive queries to improve business data accessibility.
• Prepared detailed project specifications to guide program development and align efforts with business requirements.
• Developed customized read/write utility functions in Python for Snowflake (see the Snowflake utility sketch after this list), facilitating seamless data transfer from Azure Data Lake Storage to Azure Synapse Analytics.
• Implemented ETL processes in Azure Data Factory to migrate campaign data from external sources like Azure Data Lake Storage, Parquet files, and text files into Azure Synapse Analytics, ensuring improved data availability.
• Optimized SQL performance by writing complex queries for databases such as Azure SQL Database and Azure Synapse Analytics, enhancing database efficiency and execution times.
• Created diverse data visualizations using Python and Tableau, which improved data insights and support for decision-making processes.
• Utilized Azure Data Lake Store and Azure Blob Storage to securely store raw and processed data, ensuring durability and accessibility.
• Integrated Docker and Kubernetes into CI/CD pipelines for automating the build, testing, and deployment of data applications.
• Developed automated reporting and visualization applications using Azure Synapse Analytics, Power BI, and Tableau, leading to streamlined reporting and enhanced data accessibility.
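As a concrete illustration of the data-quality rules above, a minimal PySpark sketch; the paths, columns, and thresholds are placeholder assumptions.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq_rules_sketch").getOrCreate()

orders = spark.read.parquet("/mnt/raw/orders")  # placeholder source

# Rule: amounts must be positive and order dates present
clean = orders.filter((F.col("amount") > 0) & F.col("order_date").isNotNull())
rejected = orders.subtract(clean)

# Quarantine failures for review; publish only clean rows
rejected.write.mode("append").parquet("/mnt/quarantine/orders")
clean.write.mode("overwrite").parquet("/mnt/curated/orders")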
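And a minimal sketch of the Snowflake read/write utilities; the account, credentials, and object names are placeholders (real values were sourced from secured configuration).

import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas


def sf_connect():
    # Placeholder credentials; production pulls these from a secrets store
    return snowflake.connector.connect(
        account="myaccount",
        user="etl_user",
        password="***",
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="STAGING",
    )


def read_table(query: str) -> pd.DataFrame:
    """Run a query and return the result as a DataFrame."""
    with sf_connect() as conn:
        return conn.cursor().execute(query).fetch_pandas_all()


def write_table(df: pd.DataFrame, table: str) -> int:
    """Bulk-load a DataFrame into a Snowflake table; returns rows written."""
    with sf_connect() as conn:
        success, _, nrows, _ = write_pandas(conn, df, table)
        if not success:
            raise RuntimeError(f"load into {table} failed")
        return nrows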
Education
University of Central Missouri
Master's, Computer Science
Certifications
• Azure Certified Data Engineer
• AWS Certified Solutions Architect