Vennela Billa Data Engineer
+1-848-***-**** *************@*****.*** LinkedIn
SUMMARY
Analytical and detail-oriented Data Engineer with 4 years of experience turning complex, fragmented data into actionable insights that drive business growth. Known for building robust, scalable pipelines and data platforms that withstand real-world production demands while staying cost-efficient and compliant. Adept at bridging technical precision and business priorities, ensuring data not only flows reliably but also tells the right story to decision-makers. Recognized for delivering measurable improvements in data accuracy, processing speed, and system resilience in high-stakes, deadline-driven environments.
SKILLS
Programming & Query Languages: Python, SQL, Scala, Java
Big Data & Orchestration: Apache Spark, Spark Streaming, Apache Kafka, Apache Airflow, Databricks, AWS Glue, Hadoop
Cloud & Data Warehousing: AWS (S3, Redshift, IAM, EC2, Athena, EMR, Lambda), Snowflake, Google BigQuery
Data Modeling & Governance: Dimensional Modeling, Star Schema Design, Data Warehousing Best Practices, GDPR, CCPA Compliance
DevOps & CI/CD: Jenkins, Terraform, DataOps, CI/CD Pipelines
Data Quality & Monitoring: Great Expectations, SQL Assertions, AWS CloudWatch, Prometheus
Analytics & Visualization: Tableau, Power BI
AI & Streaming Trends: Real-Time Data Processing, AI/ML Integration, AI Literacy
Soft Skills: Problem-Solving, Critical Thinking, Communication, Collaboration, Adaptability
EXPERIENCE
Data Engineer CGI Jun 2023 – Present Remote, USA
Partnered with product managers, analysts, and data scientists during sprint planning to clarify reporting needs, translating business requirements into technical specifications for end-to-end ETL workflows leveraging Apache Airflow, Python, and SQL.
Designed and developed scalable ingestion pipelines in Apache Spark and Databricks to pull structured and semi-structured data from APIs, relational databases, and streaming sources such as Apache Kafka, enforcing schema consistency and applying efficient partitioning strategies.
Engineered transformation logic using PySpark and AWS Glue, applying dimensional modeling techniques to create star-schema tables in Snowflake, reducing report generation times by 35%.
Built automated data quality checks with Great Expectations and SQL assertions and integrated them into Jenkins CI/CD pipelines to block deployments that would load invalid or incomplete data, improving data accuracy for downstream analytics by 25%.
Deployed pipelines and data models to AWS S3, Redshift, and Athena environments using infrastructure-as-code practices with Terraform, implementing role-based access controls to maintain compliance with GDPR and CCPA.
Established real-time monitoring and alerting via CloudWatch and Prometheus, creating dashboards to visualize latency and throughput metrics, which enabled proactive resolution of 90% of potential pipeline failures before they affected SLAs.
Collaborated in retrospectives and cross-functional syncs to review pipeline performance, identify optimization opportunities, and adopt emerging tooling, leading to a 15% reduction in monthly cloud compute costs without impacting delivery timelines.
Associate Data Engineer Accenture Jul 2020 – Apr 2022 India
Collaborated with senior data engineers and business analysts to gather requirements for migration projects, translating them into ETL design specifications using SQL, Python, and AWS Data Wrangler.
Assisted in developing ingestion workflows in Apache Spark and AWS Glue to process flat files, relational database extracts, and JSON data from REST APIs, ensuring adherence to defined schemas and data mapping rules.
Supported transformation logic implementation for staging and dimensional tables in Redshift and PostgreSQL, applying basic normalization and denormalization techniques to optimize for both transactional and analytical workloads.
Performed initial data validation and anomaly detection using SQL queries, PySpark filters, and checksum comparisons, logging issues in Jira for resolution and improving overall load accuracy by 18%.
Participated in test runs of ETL jobs in lower environments, documented results, and assisted in debugging failed loads by reviewing Spark logs, query plans, and AWS CloudWatch alerts under senior engineer guidance.
Helped deploy production-ready jobs via Git-based version control and Jenkins pipelines, verifying successful execution and updating operational runbooks with new workflows and dependencies.
Monitored daily job schedules, resolved minor pipeline issues, and provided ad-hoc dataset extractions for analysts, contributing to on-time delivery for 95%+ of scheduled jobs over the project duration.
EDUCATION
Master of Science in Management Information Systems Auburn University at Montgomery May 2022 – Dec 2023 Montgomery, AL
CERTIFICATIONS
Introduction to Data Science – Cisco
Data Analytics Essentials – Cisco