
Data Engineer Processing

Location:
Hyderabad, Telangana, India
Posted:
October 29, 2025


Resume:

Data Engineer

Name: Vineela Boinpalli

Phone: 872-***-****

Email: ************@*****.***

LinkedIn: https://www.linkedin.com/in/vineela-rao-92995026a/

Professional Summary:

•Over 10 years of experience as a Data Engineer with expertise in designing, analyzing, and developing software applications.

•Expert in Python, Java, and Scala for building efficient data pipelines and automating workflows.

•Proficient in Python, SQL, and PySpark, with hands-on experience in big data processing frameworks like Apache Spark, Kafka, and Airflow to ensure high performance and reliability of data systems.

•Strong background in cloud data platforms such as AWS (Glue, Redshift, S3, EMR), Azure (ADF, Synapse, Databricks), and GCP (BigQuery, Dataflow) for data ingestion, transformation, and analytics.

•Skilled in implementing data lakehouse solutions using Databricks, Snowflake, and Delta Lake, enabling unified batch and streaming data processing for analytics and ML workloads.

•Expertise in data modeling, data warehousing, and schema design, ensuring scalable and optimized structures for both OLTP and OLAP systems.

•Experienced in CI/CD and DataOps pipelines using Git, Jenkins, Docker, Kubernetes, and Terraform, improving deployment efficiency and maintaining version-controlled data workflows.

•Implemented data quality, validation, and lineage frameworks using dbt, Great Expectations, and Apache Atlas, ensuring governance and compliance with GDPR and HIPAA standards.

•Adept in real-time data streaming and event-driven architectures using Kafka, Kinesis, and Flink, supporting low-latency data delivery and analytics.

•Expertise in ETL/ELT pipeline development, leveraging technologies such as Apache NiFi, Apache Airflow, dbt, Talend, and Informatica for real-time and batch processing.

•Collaborated with data scientists and BI teams to integrate AI/ML models and advanced analytics into production data pipelines using MLflow, SageMaker, and Databricks ML.

•Strong advocate for Data Mesh, Data Fabric, and modern Lakehouse principles, focusing on scalability, automation, and cross-domain data ownership.

•Proficient in project tracking and documentation using JIRA and JSDoc, and in monitoring and observability using Prometheus, Grafana, and the ELK Stack.

Technical Skills

•ETL/ELT Pipeline Development: Apache NiFi, Apache Airflow, dbt, Talend, Informatica

•Cloud-Native Data Architecture: AWS S3, Azure Data Lake Storage, GCP BigQuery, Redshift

•Real-Time Streaming & Event-Driven Architectures: AWS Kinesis, Apache Kafka, Azure Event Hubs, Google Pub/Sub

•Data Governance & Metadata Management: AWS Lake Formation, Azure Purview, GCP Data Catalog

•CI/CD Pipeline Automation: AWS CodePipeline, Azure DevOps, GCP Cloud Build, GitHub Actions

•Containers: Docker, Amazon ECS, Kubernetes (EKS/GKE)

•Programming & Data Processing: Python, PySpark, Scala, SQL

•Data Security & Compliance: HIPAA, GDPR, SOC 2, Encryption (KMS), IAM Policies, Data Masking

•Cloud Data Warehousing & Analytics: Redshift, Azure Synapse, Snowflake, Google BigQuery

•Machine Learning Model Deployment: AWS SageMaker, Azure ML, GCP Vertex AI

•API Design & Deployment: AWS API Gateway, Azure RESTful APIs, Google Cloud Functions

•Data Visualization & Dashboards: Amazon QuickSight, Power BI, Looker, Google Data Studio

Professional Experience

Client: CVS Healthcare, NYC Jan 2025 – Present

Role: Data Engineer

Responsibilities:

•Designed and operated near-real-time pipelines for pharmacy ops, claims, Rx adherence, and patient engagement using Spark (Scala/PySpark) on AWS Glue/EMR with Kafka/Kinesis for ingestion, cutting event-to-insight latency from ~30 minutes to under 5 minutes (a minimal PySpark sketch follows this role's Environment line).

•Built a Delta Lake–based lakehouse on Amazon S3 with Glue Catalog (governed schemas, partitioning, Z-Ordering, compaction). Reduced read costs ~28% via partition pruning and file size optimization.

•Applied advanced Python, SQL, Java, and Scala programming to develop robust data solutions.

•Optimized big data processing by extensively working with HDFS, MapReduce, and Spark, ensuring high scalability and fault tolerance in distributed computing environments.

•Proficient in ETL processes, utilizing tools like Apache Airflow, dbt, Talend, and Informatica for ingesting and processing data from disparate sources.

•Enabled AI/ML-driven insights for fan engagement, content performance, and predictive analytics using platforms like SageMaker and Databricks.

•Ensured data quality, lineage, and integrity across distributed systems using Airflow, Delta Lake, and Palantir Foundry.

•Maintained strict adherence to data governance, privacy regulations (GDPR, CCPA), and platform performance SLAs.

•Architected and maintained real-time data streaming frameworks (Kafka, Kinesis, Flink) to support live sports event tracking, audience interactions, and in-game analytics.

•Designed schema evolution strategies for semi-structured data using Glue Catalog, Parquet, and Avro to ensure flexibility in fast-changing sports metadata pipelines.

•Built and deployed containerized data applications using Docker, Amazon ECS Fargate, and EKS, supporting scalable microservice-based ingestion and transformation pipelines.

•Created secure APIs for data services using AWS API Gateway, Lambda, integrating with Cognito for authentication and Secrets Manager for credential management.

•Developed automated CI/CD pipelines for data jobs using Jenkins, Terraform, and AWS Code Pipeline, ensuring rapid and reliable deployment of production-ready workflows.

•Created monitoring and alerting frameworks for streaming jobs using CloudWatch, Prometheus, and Grafana to ensure operational visibility and minimize downtime during live events.

Environment : AWS Glue, Amazon Redshift, Amazon S3, AWS Lambda, Amazon Kinesis, Apache Spark (Scala/PySpark), Apache Kafka, Flink, Delta Lake, Azure Databricks, Palantir Foundry, Hive, HBase, PostgreSQL, MongoDB, Airflow, SageMaker, Power BI, Tableau, JSON, XML, HL7, Git, JIRA, Agile/Scrum.
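
Illustrative only, and not CVS production code: the sketch below shows a minimal PySpark Structured Streaming job of the kind described in the near-real-time pipeline bullet above, reading JSON claim events from Kafka and appending them to a partitioned Delta table on S3. The broker address, topic name, schema fields, and S3 paths are hypothetical placeholders.

# Minimal sketch, assuming a Kafka topic of JSON claim events and a Delta table on S3.
# All names below (broker, topic, bucket, schema fields) are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("claims-stream-sketch").getOrCreate()

# Hypothetical claim-event schema
claim_schema = StructType([
    StructField("claim_id", StringType()),
    StructField("member_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

# Read raw JSON events from Kafka
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "claims-events")
       .load())

# Parse the payload, derive a partition column, and append to Delta on S3
claims = (raw.select(F.from_json(F.col("value").cast("string"), claim_schema).alias("c"))
          .select("c.*")
          .withColumn("event_date", F.to_date("event_ts")))

query = (claims.writeStream
         .format("delta")
         .option("checkpointLocation", "s3://example-bucket/checkpoints/claims/")
         .partitionBy("event_date")
         .outputMode("append")
         .start("s3://example-bucket/delta/claims/"))

On Glue or EMR this assumes the Kafka and Delta Lake connector packages are available on the cluster; end-to-end latency is then driven mainly by the micro-batch trigger interval and downstream query freshness.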

Client: Bank of America, Charlotte, NC Sept 2022 – Dec 2024

Role: Data Engineer

Responsibilities:

•Designed and built Scala-based ETL pipelines to process financial data, including transactions, loans, mortgages, and credit cards using Azure Data Factory, Databricks, and Palantir Foundry.

•Developed real-time fraud detection and risk analytics pipelines using Spark Structured Streaming, Delta Lake, and Azure Synapse.

•Optimized large-scale data transformations with partitioned Hive tables, HDInsight, and Scala-based SQL/PLSQL scripts for improved performance.

•Ingested structured and semi-structured data (JSON, XML, FIX) into enterprise data warehouses, enabling advanced analytics for regulatory compliance and financial audits.

•Implemented CI/CD pipelines for ETL deployment, ensuring data integrity, version control, and compliance with SOX, AML, and Basel III regulations.

•Leveraged Palantir Foundry for financial data orchestration and AI/ML-based fraud prevention using Azure ML, MLflow, and Spark MLlib.

•Built and maintained real-time transaction ingestion frameworks using Apache Flume, Kafka, and Azure Data Lake to support instant fraud alerts and audit tracking.

•Developed streaming analytics pipelines for anomaly detection in mortgage and loan transactions using custom Spark MLlib models and Azure Stream Analytics.

•Implemented robust data validation, lineage tracking, and audit logging using Delta Lake and Palantir Foundry features to meet internal and regulatory audit standards.

•Built reusable PySpark libraries for data quality checks, transformation utilities, and error handling, reducing development time and improving consistency across pipelines (a minimal sketch follows this role's Environment line).

•Orchestrated end-to-end financial data workflows using Apache Airflow, Azure DevOps, enabling automated recovery, SLA tracking, and alerting for critical ETL pipelines.

Environment : Azure Data Factory, Azure Databricks, Apache Spark (Scala), PySpark, Azure Synapse Analytics, Azure Data Lake Gen2, Azure Blob Storage, Palantir Foundry, Apache Hive, Apache HBase, Apache Flume, Apache Kafka, HDInsight, Delta Lake, PostgreSQL, SQL Server, PL/SQL, Azure ML, MLflow, Spark MLlib, Git, JIRA, Agile/Scrum
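
A minimal sketch, assuming hypothetical column names and thresholds, of the kind of reusable PySpark data-quality helpers mentioned above (not Bank of America code):

# Reusable data-quality helpers; column names and thresholds are hypothetical examples.
from pyspark.sql import DataFrame
from pyspark.sql import functions as F


def null_rate_check(df: DataFrame, column: str, max_null_rate: float = 0.01) -> dict:
    """Return a pass/fail result for the share of nulls in a column."""
    total = df.count()
    nulls = df.filter(F.col(column).isNull()).count()
    rate = (nulls / total) if total else 0.0
    return {"column": column, "null_rate": rate, "passed": rate <= max_null_rate}


def amount_range_check(df: DataFrame, column: str, low: float, high: float) -> dict:
    """Count rows whose values fall outside an expected business range."""
    out_of_range = df.filter((F.col(column) < low) | (F.col(column) > high)).count()
    return {"column": column, "out_of_range_rows": out_of_range, "passed": out_of_range == 0}


# Example usage against a hypothetical transactions DataFrame:
# results = [null_rate_check(txns, "account_id"), amount_range_check(txns, "amount", 0.0, 1_000_000.0)]
# failed = [r for r in results if not r["passed"]]

Packaging helpers like these as a shared library lets every pipeline apply the same checks and error handling, which is what keeps validation consistent across teams.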

Client: HCA Healthcare, Nashville, TN Sep 2018 – Aug 2022

Role: Data Engineer

Responsibilities:

•Designed and implemented scalable data pipelines using Apache Beam on Dataflow for real-time and batch processing on Google Cloud Platform (GCP) (an illustrative sketch follows this role's Environment line).

•Migrated legacy healthcare data systems to BigQuery, optimizing data warehousing solutions for better performance and lower cost.

•Developed ETL workflows using Cloud Composer (Airflow) to orchestrate complex healthcare data ingestion and transformation tasks.

•Built robust data models and dashboards in Looker and Google Data Studio to support clinical and compliance reporting.

•Created data marts for quality improvement initiatives using dbt (data build tool) and BigQuery SQL transformations.

•Utilized Pub/Sub for real-time event ingestion from EMR systems and integrated with downstream analytics tools.

•Automated data quality checks and anomaly detection using Python, Great Expectations, and Cloud Functions.

•Developed CI/CD pipelines for data engineering workflows using Cloud Build, Git, and Terraform for infrastructure as code.

•Used GKE (Google Kubernetes Engine) to containerize and deploy scalable machine learning pipelines in coordination with MLOps teams.

•Led the migration of historical data from on-premises SQL Server and Oracle databases to Cloud SQL and BigQuery.

•Implemented data versioning and lineage tracking using Data Catalog, Apache Atlas, and OpenLineage.

•Developed event-driven data architecture using Cloud Functions, Pub/Sub, and Cloud Storage for processing HL7 messages.

•Mentored junior data engineers on best practices in data modeling, code review, and GCP architecture.

•Conducted performance tuning and cost optimization using BigQuery BI Engine, query analyzers, and Stackdriver (now Cloud Operations Suite).

Environment : GCP (BigQuery, Cloud Composer, Dataflow, Pub/Sub, Cloud Functions, GKE, Cloud SQL, Cloud Storage, Cloud Build, Data Catalog, Cloud DLP, IAM, Stackdriver), Apache Beam, Airflow, Looker, Google Data Studio, Python, SQL, dbt, Terraform, Docker, Kubernetes, Git, JIRA.
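
Illustrative sketch with hypothetical project, topic, table, and field names: a small Apache Beam streaming pipeline in the style described above, reading JSON events from Pub/Sub and appending them to a BigQuery table when launched on Dataflow.

# Minimal Beam sketch; topic, table, schema, and field names are hypothetical placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_event(message: bytes) -> dict:
    """Decode a JSON event published by a hypothetical upstream EMR feed."""
    record = json.loads(message.decode("utf-8"))
    return {
        "patient_id": record.get("patient_id"),
        "event_type": record.get("event_type"),
        "event_ts": record.get("event_ts"),
    }


def run():
    # Runner, project, and region flags are supplied at launch (e.g. --runner=DataflowRunner)
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic="projects/example-project/topics/emr-events")
            | "ParseJson" >> beam.Map(parse_event)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "example-project:clinical.events",
                schema="patient_id:STRING,event_type:STRING,event_ts:TIMESTAMP",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()

The same pipeline shape handles batch loads by swapping the Pub/Sub source for a Cloud Storage read and dropping the streaming flag.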

Client: Wells Fargo, India Oct 2014 – Dec 2017

Role: Big Data Engineer

Responsibilities:

•Expertise in Big Data frameworks such as Apache Spark, Hadoop, Hive, HBase, Kafka, and Flink, with deep knowledge of distributed computing, batch, and streaming architectures.

•Proficient in building ETL/ELT pipelines and data ingestion workflows using tools like Apache NiFi, Airflow, and Sqoop, ensuring scalability and high throughput (an orchestration sketch follows this role's Environment line).

•Hands-on experience with cloud data platforms — AWS (EMR, Glue, Redshift, S3), Azure (HDInsight, Synapse, Data Lake, Databricks), and GCP (Dataproc, Dataflow, BigQuery).

•Strong in data modeling, warehousing, and lakehouse architecture, leveraging Snowflake, Databricks, and Delta Lake to unify analytics and data science workloads.

•Proficient in Python, PySpark, Scala, Java, and advanced SQL, enabling efficient transformation, processing, and optimization of high-volume datasets.

•Experienced in implementing real-time data streaming pipelines using Kafka, Spark Streaming, and Flink, supporting low-latency analytics and event-driven systems.

•Implemented DataOps and CI/CD automation using Git, Jenkins, Docker, Kubernetes, and Terraform, improving pipeline deployment reliability.

•Strong understanding of data governance, lineage, and security using Apache Atlas, Ranger, Great Expectations, ensuring compliance with GDPR and enterprise standards.

•Collaborated with data scientists and analytics teams to operationalize ML pipelines and integrate predictive analytics into data systems.

•Familiar with emerging data technologies like Iceberg, Hudi, and Delta Lake for modern data lakehouse and incremental data processing.

Environment : Apache Spark, Hadoop, Hive, HBase, Kafka, Flink, Apache NiFi, Airflow, Sqoop, Python, PySpark, Scala, Java, SQL, Git, Jenkins, Docker, Kubernetes, Terraform, Apache Atlas, Ranger.
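
A minimal Airflow orchestration sketch in the spirit of the ingestion workflows above; the DAG id, schedule, file paths, and connection id are hypothetical, and it assumes the Apache Spark provider package is installed.

# Minimal DAG sketch; all identifiers and paths below are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_transactions_ingest",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:

    # Land the day's raw extracts into HDFS (placeholder command)
    stage_raw = BashOperator(
        task_id="stage_raw_files",
        bash_command="hdfs dfs -put -f /data/incoming/transactions_{{ ds }}.csv /raw/transactions/",
    )

    # Run the Spark transformation job against the staged files (hypothetical script path)
    transform = SparkSubmitOperator(
        task_id="transform_transactions",
        application="/opt/jobs/transform_transactions.py",
        conn_id="spark_default",
        conf={"spark.dynamicAllocation.enabled": "true"},
    )

    stage_raw >> transform

Retries and alerting are configured through default_args so failed loads recover automatically without manual intervention.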

EDUCATION:

Master of Computer Applications – JNTUH, 2016


