
Data Engineer Big

Location:
Denton, TX, 76201
Salary:
75000
Posted:
September 15, 2025


Resume:

Vijayalakshmi Talla

959-***-**** *************.*@**************.*** linkedin.com/in/vijayalakshmi-talla

SUMMARY

Proactive and detail-oriented Data Engineer with 3 years of experience building scalable, cloud-based data pipelines and analytics solutions on AWS and Azure. Skilled in designing ETL workflows, optimizing big data processing, and developing distributed systems that deliver accurate, high-quality data for reporting, business intelligence, and machine learning. Strong track record of collaborating with cross-functional teams to implement cloud-native, reliable, performance-driven data infrastructure that enables data-driven decision-making. Experienced with Databricks for big data engineering and with delivering actionable insights through interactive dashboards and data visualization.

EXPERIENCE

Capital One Financial Sep 2024 – Present

Data Engineer Texas, USA

Spearheaded the migration of legacy ETL pipelines to a cloud-native architecture on AWS using Glue, Apache NiFi, and Data Build Tool (DBT), reducing pipeline execution time by 30% and improving data refresh SLA compliance to 99.5%.

Designed and deployed scalable batch and streaming data pipelines using Apache Spark, PySpark, and Databricks, processing over 15 TB of transactional banking data daily, enabling near real-time analytics and fraud detection.
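
A minimal sketch of a PySpark Structured Streaming job of the kind described above; the Kafka topic, schema, and storage paths are illustrative assumptions, not actual Capital One resources:

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("txn-stream-sketch").getOrCreate()

# Assumed schema for incoming transaction events
txn_schema = StructType([
    StructField("txn_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read raw events from a hypothetical Kafka topic
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "transactions")
       .load())

# Parse JSON payloads and append them to a Delta table for downstream analytics
parsed = raw.select(from_json(col("value").cast("string"), txn_schema).alias("t")).select("t.*")

(parsed.writeStream
 .format("delta")
 .option("checkpointLocation", "/mnt/checkpoints/transactions")
 .outputMode("append")
 .start("/mnt/delta/transactions"))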

Integrated Generative AI models (GPT-4 via OpenAI API and Hugging Face Transformers) into data engineering workflows to automate metadata documentation, SQL query generation, and data quality checks, accelerating analytics delivery for financial business units.
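
A minimal sketch of automating table documentation with the OpenAI API, in the spirit of the workflow described above; the model choice, prompt, and column names are illustrative assumptions:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical column metadata pulled from the catalog
columns = ["txn_id", "account_id", "amount", "event_time"]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You write concise data-catalog table descriptions."},
        {"role": "user", "content": f"Describe a banking transactions table with columns: {', '.join(columns)}"},
    ],
)

# The generated text would then be written back to the metadata store
print(response.choices[0].message.content)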

Optimized data storage and query performance by implementing Athena, Redshift, and Snowflake solutions, which improved query latency by 40% and lowered compute costs by 20% through partitioning and data lifecycle management.

Automated workflows and improved orchestration reliability by configuring Apache Airflow DAGs for more than 50 critical pipelines, achieving zero missed SLAs for monthly financial reporting processes.
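
A minimal sketch of an Airflow DAG with SLA settings of the kind described above; the DAG id, schedule, and task callables are illustrative assumptions:

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    pass  # placeholder for the real extraction logic

def load():
    pass  # placeholder for the real load logic

default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
    "sla": timedelta(hours=2),  # missed SLAs surface in Airflow's SLA miss report
}

with DAG(
    dag_id="monthly_financial_reporting",
    start_date=datetime(2024, 9, 1),
    schedule_interval="@monthly",
    catchup=False,
    default_args=default_args,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task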

Built secure, governed data assets and maintained regulatory compliance in the financial domain by integrating Collibra for metadata management and data lineage tracking, resulting in a 100% audit pass rate.

Collaborated with risk, compliance, and BI teams to deliver Tableau dashboards and financial KPIs, enabling executives to gain actionable insights on credit risk, customer churn, and portfolio health.

Engineered and tuned Elasticsearch clusters for log analytics and operational monitoring, increasing visibility into pipeline performance and reducing mean time to resolution (MTTR) by 35% for data incidents.

Zensar Technologies Jun 2021 – Jun 2023

Data Engineer India

Delivered a high-throughput stream processing solution using Apache Flink, enabling real-time analytics of transactional feeds with sub-second latency, improving operational decision-making for retail banking clients.

Created and implemented data pipelines on Azure Data Lake and Blob Storage, efficiently managing over 10 TB of daily ingested data, while ensuring scalability and compliance with client data retention policies.

Orchestrated complex ETL workflows using Talend and Azure Data Factory, achieving a 45% reduction in processing time by optimizing transformations and leveraging parallelization strategies.

Enhanced system reliability by configuring and maintaining Apache Zookeeper for distributed job coordination, ensuring high availability and fault tolerance across the data ecosystem.

Enabled business units to uncover actionable insights by building intuitive Power BI dashboards, visualizing KPIs such as customer engagement metrics and churn trends, leading to a 15% improvement in retention initiatives.

Consolidated disparate data sources into a centralized Azure SQL Database and Synapse warehouse, improving query performance for analytical workloads by 35% and streamlining data access across departments.

Applied rigorous data governance practices to enforce data quality and consistency standards, reducing data discrepancies by over 90% and ensuring the trustworthiness of analytics outputs.

Automated repetitive data validation and cleansing tasks by developing custom Python scripts, reducing manual effort by 40% and ensuring consistent data quality across streaming and batch pipelines.
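
A minimal sketch of a data-validation script of the kind described above; the column names and rules are illustrative assumptions:

import pandas as pd

REQUIRED_COLUMNS = ["customer_id", "event_date", "amount"]

def validate(df: pd.DataFrame) -> list:
    """Return a list of human-readable data-quality issues found in df."""
    issues = []
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]
    if df["customer_id"].isna().any():
        issues.append("null customer_id values found")
    if df.duplicated(subset=["customer_id", "event_date"]).any():
        issues.append("duplicate customer_id/event_date rows found")
    if (df["amount"] < 0).any():
        issues.append("negative amounts found")
    return issues

if __name__ == "__main__":
    frame = pd.read_csv("daily_extract.csv")  # hypothetical input file
    problems = validate(frame)
    print("\n".join(problems) if problems else "all checks passed")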

SKILLS

Programming & Query Languages: Python, SQL, NoSQL

Big Data Technologies: Apache Spark, Apache Flink, Hadoop, Hive, Impala, HDFS, Zookeeper, Databricks, PySpark, Git, Terraform

Artificial Intelligence & Generative AI: OpenAI GPT (ChatGPT, GPT-4), Hugging Face Transformers, LangChain, LLMs

Cloud Platforms & Services: AWS (EMR, EC2, S3, Athena, Elasticsearch, Lambda, Redshift, Kinesis), Azure (Data Lake, Data Storage, Azure SQL Database, Azure Blob Storage, Azure Synapse, Azure Stream Analytics), GCP (BigQuery)

Data Warehousing & ETL: Snowflake, SSIS, Glue, Talend, Apache NiFi, DBT (Data Build Tool), Airflow, Azure Data Factory, Collibra

Databases: PostgreSQL, MS SQL Server, MongoDB, MySQL, DynamoDB, Apache Cassandra

Data Visualization & BI Tools: Power BI, Tableau, QuickSight, SSRS

Machine Learning & Analytics: Pandas, NumPy, Scikit-learn, TensorFlow, Matplotlib, Seaborn

Certifications: SQL Beginner to Advanced For Data Professional, Power BI Data Analytics for All Levels 3.0, Databricks – Master Azure Databricks For Data Engineers, Data Management Masterclass – The Complete Course, Generative AI Leader Certification

EDUCATION

Master of Science in Computer Science, University of North Texas, Denton, Texas, USA May 2025


