Vennela B Data Engineer
Mail: ***********@*****.*** Mobile: +1-314-***-**** LinkedIn
PROFESSIONAL SUMMARY
Data Engineer with 6 years of experience designing and optimizing data pipelines, ETL workflows, and large-scale data architectures across IT, Healthcare, and Banking domains. Skilled in integrating complex data sources into cloud platforms, enabling advanced analytics and actionable business insights. Adept at leveraging modern tools such as Apache Spark, Python, SQL, and cloud-native services (AWS, Azure, GCP) to deliver scalable, secure, and high-performing solutions. Recognized for collaborating with cross-functional teams, streamlining processes, and ensuring data quality while driving measurable outcomes that support both technical and business goals.
TECHNICAL SKILLS
Programming & Scripting: Python, SQL, Java, Scala, Shell Scripting
Big Data & Processing: Apache Spark, PySpark, Hadoop, Hive, Kafka, Airflow
Databases & Warehousing: Snowflake, Redshift, BigQuery, SQL Server, Oracle, MySQL, PostgreSQL
Cloud Platforms: AWS (S3, EMR, Glue, Redshift, RDS, QuickSight), Azure (Data Lake, Data Factory, Databricks, Synapse), GCP (BigQuery, Dataflow, Pub/Sub)
Data Modeling & ETL Tools: Informatica, Talend, dbt, Delta Lake
Visualization & Reporting: Power BI, Tableau, QuickSight
DevOps & Version Control: Git, Azure DevOps, Jenkins, Docker, Kubernetes
Other Tools: Great Expectations, REST APIs, Agile/Scrum
PROFESSIONAL EXPERIENCE
Purevisitx Austin, TX
Data Engineer June 2024 – Present
Built scalable ETL pipelines using PySpark and Apache Airflow, reducing data processing time by 30% and ensuring reliable ingestion from 15+ diverse sources.
Designed and implemented a data lake architecture on Azure Data Lake Storage (ADLS), enabling secure storage, structured zones, and faster data retrieval for analytics teams.
Automated end-to-end data ingestion pipelines with Azure Data Factory (ADF), reducing manual intervention and saving 40+ hours per month.
Developed and deployed REST APIs to provide standardized access to critical datasets, improving integration across enterprise applications.
Tuned complex SQL queries and optimized Spark jobs, resulting in a 25% performance improvement for analytical workloads.
Partnered with data scientists to operationalize machine learning models on Azure Databricks, streamlining deployment and monitoring.
Established CI/CD pipelines using Azure DevOps, reducing release times by 35% and ensuring consistency across environments.
Created Power BI dashboards for pipeline monitoring and business reporting, providing real-time visibility into data flow and health.
Implemented schema evolution and versioning in Hive and Delta Lake, ensuring compatibility and smooth migration during platform upgrades.
Conducted root cause analysis of pipeline issues and built proactive alerts, achieving 99.5% pipeline reliability in production systems.
GE HealthCare Bangalore, Karnataka
Data Engineer Jan 2020 – July 2022
Developed robust ETL workflows using AWS Glue and Python, enabling seamless integration of patient and clinical data from multiple EHR systems.
Optimized large-scale batch processing on AWS EMR with Apache Spark, reducing compute costs by 20% while accelerating throughput for analytics teams.
Designed dimensional and star-schema data models in Amazon Redshift, delivering 40% faster query performance for clinical and operational reporting.
Built real-time data streaming pipelines with Apache Kafka, capturing HL7/FHIR messages from hospital systems for immediate downstream consumption.
Automated data validation and quality checks with Great Expectations, improving regulatory and analytical reporting accuracy by 30%.
Integrated data visualization using Tableau and Amazon QuickSight, empowering healthcare executives with self-service dashboards for performance tracking.
Migrated on-premises SQL Server workloads to AWS RDS, improving availability and lowering operational costs by 15%.
Partnered with compliance and data governance teams to establish security standards and metadata management within AWS Glue Data Catalog.
Designed and deployed REST APIs to securely share standardized healthcare datasets with third-party systems, ensuring interoperability and faster partner onboarding.
Delivered a patient analytics dashboard leveraging Amazon Redshift and QuickSight, reducing hospital readmission rates by 12% through improved insights.
IndusInd Bank Mumbai, India
Data Engineer Oct 2017 – Dec 2019
Developed ETL workflows with Informatica and Python, processing high-volume financial transactions and ensuring accurate daily data integration.
Built real-time fraud detection pipelines using Apache Kafka and Spark Streaming, reducing fraud incidents by 15% through proactive monitoring.
Designed and maintained secure data marts in Snowflake, enabling faster risk assessment and high-performance analytics for compliance teams.
Automated daily reporting using SQL and Power BI, reducing manual reporting effort by 50% and improving decision-making speed.
Migrated critical financial workloads to Google Cloud BigQuery, improving query performance and scalability while reducing storage costs.
Applied data encryption, masking, and partitioning policies in BigQuery, ensuring compliance with PCI DSS standards.
Partnered with auditors to streamline data lineage documentation, cutting audit preparation time by 35%.
Integrated REST APIs with payment gateways, improving reconciliation accuracy and reducing settlement discrepancies.
Designed workflow orchestration with Apache Airflow, improving pipeline transparency and reducing failures by 20%.
Collaborated with business stakeholders to deliver customer segmentation models on BigQuery ML, enabling targeted marketing campaigns and improving engagement.
EDUCATION
Master's in Management Information Systems – Oklahoma State University