Deepakteja Endluri
Data Engineer
Email: ***********@*****.*** Mob: +1-972-***-****
PROFESSIONAL SUMMARY
Detail-oriented and solution-driven Data Engineer with 6.5+ years of experience designing and deploying large-scale ETL pipelines, cloud-based data platforms, and real-time analytics solutions across the Information Technology, Consumer Electronics, Banking and Financial Services, Automotive, Healthcare, and Insurance domains. Expertise in Python, SQL, PySpark, AWS, Azure, GCP, Snowflake, and modern data architecture principles, with a focus on performance, scalability, and data integrity.
TECHNICAL SKILLS
Languages: Python, SQL, Scala, Shell Scripting
Big Data Frameworks: PySpark, Hadoop, Hive
Cloud Platforms:
o AWS: Glue, Redshift, S3, Lambda, Kinesis
o Azure: Data Factory, Synapse, Data Lake, Blob Storage
o GCP: BigQuery, Dataflow, Cloud Storage, Pub/Sub
Data Warehousing: Snowflake, Redshift, Azure Synapse
ETL & Workflow Tools: Apache Airflow, Azure Data Factory, Talend
Databases: SQL Server, PostgreSQL, MongoDB, MySQL
BI & Visualization: Power BI, Tableau
Others: Git, Docker, JIRA, CI/CD
PROFESSIONAL EXPERIENCE
Client: HP Inc - Vancouver, WA
Role: Senior Data Engineer Jun 2023 – Present
Built enterprise-scale data ingestion pipelines using AWS Glue and PySpark to centralize warranty and return data.
Integrated streaming data from IoT-enabled printers and hardware devices into Snowflake for predictive analytics.
Developed modular and reusable PySpark libraries to standardize transformation logic across multiple pipelines.
Established S3-based data lake zones (raw, curated, consumption) with automated archiving and purging policies.
Implemented schema evolution and automated data format conversions (CSV to Parquet) using Glue and Spark.
Deployed Airflow DAGs for orchestrating multi-step workflows with dependencies, retries, and Slack alerts (see the sketch following this role's environment line).
Partnered with product analytics teams to create custom datasets for failure pattern analysis and usage tracking.
Tuned Redshift cluster configurations, vacuuming strategies, and sort key usage for optimal performance.
Built CI/CD integration using GitHub Actions to validate, deploy, and roll back data jobs in lower and production environments.
Created detailed architecture diagrams, data dictionaries, and lineage reports to improve onboarding and audit readiness.
Environment: AWS Glue, Redshift, S3, Python, PySpark, Airflow, Snowflake
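A minimal sketch of the orchestration pattern referenced in the Airflow bullet above; the DAG name, schedule, task callables, and Slack webhook URL are illustrative assumptions, not the actual HP pipeline.

from datetime import datetime, timedelta
import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

def notify_slack(context):
    # Post a short failure notice to a Slack incoming webhook (placeholder URL).
    requests.post(
        "https://hooks.slack.com/services/PLACEHOLDER",
        json={"text": f"Airflow task {context['task_instance'].task_id} failed"},
    )

def extract_warranty_data():
    ...  # placeholder: land raw warranty/return files in the raw zone

def transform_warranty_data():
    ...  # placeholder: trigger the PySpark transformation job

with DAG(
    dag_id="warranty_ingestion_sketch",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={
        "retries": 3,                          # automatic retries per task
        "retry_delay": timedelta(minutes=5),
        "on_failure_callback": notify_slack,   # fires only after retries are exhausted
    },
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_warranty_data)
    transform = PythonOperator(task_id="transform", python_callable=transform_warranty_data)
    extract >> transform                       # downstream task waits on its dependency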
Client: BMO - Phoenix, AZ
Role: Data Engineer Apr 2021 – May 2023
Developed scalable data pipelines in Azure Data Factory to ingest financial transactions and third-party feeds into Synapse.
Designed and implemented role-based access policies on Azure Data Lake, ensuring compliance with SOX and HIPAA standards.
Built dynamic pipeline templates using ADF parameterization to support reusable workflows across 15+ business units.
Developed Python scripts to automate reconciliation of multi-source datasets and detect data integrity issues.
Implemented robust data quality frameworks with rules for null checks, outlier detection, and threshold alerts (see the sketch following this role's environment line).
Integrated third-party regulatory feeds (FATCA, FINTRAC, AML systems) for reporting and risk analysis.
Created historical datasets to support credit scoring and fraud detection models for risk and compliance teams.
Designed self-service Power BI reports enabling leadership to track customer engagement and loan performance.
Reduced nightly batch processing time by 60% through optimized stored procedures and partitioning logic in Synapse.
Led incident resolution calls and proactively fixed SLA breaches by implementing fallback and retry logic in pipelines.
Environment: Azure Data Factory, Synapse, Data Lake, Power BI, Python, SQL Server
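A hedged illustration of the data quality rules mentioned above (null checks, outlier detection, threshold alerts); pandas and the threshold values are assumptions chosen for readability, not the production ADF/Synapse implementation.

import pandas as pd

def run_quality_checks(df: pd.DataFrame, null_threshold: float = 0.05, z_cutoff: float = 3.0):
    """Return a list of human-readable alerts for null-rate and outlier violations."""
    alerts = []
    # Rule 1: flag any column whose null rate exceeds the configured threshold.
    for col, rate in df.isna().mean().items():
        if rate > null_threshold:
            alerts.append(f"{col}: null rate {rate:.1%} exceeds {null_threshold:.0%}")
    # Rule 2: simple z-score outlier detection on numeric columns.
    for col in df.select_dtypes("number").columns:
        mean, std = df[col].mean(), df[col].std()
        if std and ((df[col] - mean).abs() > z_cutoff * std).any():
            alerts.append(f"{col}: values beyond {z_cutoff} standard deviations from the mean")
    return alerts

# Example usage: alerts would feed the pipeline's notification/alerting step.
# alerts = run_quality_checks(transactions_df)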
Client: AutoZone - Memphis, TN
Role: Data Engineer Jan 2018 – Mar 2021
Developed scalable PySpark pipelines to process and enrich customer purchase data and vehicle service logs.
Automated data integration from Google Analytics and vendor APIs into Hive tables using Talend workflows.
Engineered data models to support inventory tracking, in-store product movement, and reorder predictions.
Designed Hive schemas to support fast analytics by leveraging bucketing, partitioning, and ORC compression (see the sketch following this role's environment line).
Integrated real-time vehicle part lookup and order status APIs using Spark Streaming and Kafka.
Collaborated with marketing teams to generate user segments and behavioral datasets for targeted campaigns.
Built dashboards in Tableau to support executive reporting on regional performance and customer churn.
Introduced dynamic data masking strategies to comply with PCI DSS and internal data governance standards.
Optimized Spark job performance through join strategy tuning, memory overhead control, and checkpointing.
Mentored a team of 3 junior engineers and conducted weekly code review sessions to drive code quality.
Environment: Hadoop, Hive, PySpark, Talend, MySQL, Tableau
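A minimal PySpark sketch of the Hive table layout described above (partitioning, bucketing, ORC); the database, table, and column names are placeholders, not AutoZone's actual schemas.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive_layout_sketch")
    .enableHiveSupport()        # allow writing managed Hive tables from Spark
    .getOrCreate()
)

# Placeholder source: enriched purchase records produced by the upstream pipeline.
purchases = spark.table("staging.customer_purchases")

(
    purchases.write
    .format("orc")                    # ORC columnar storage with built-in compression
    .partitionBy("purchase_date")     # prune scans for date-bounded analytics
    .bucketBy(32, "store_id")         # co-locate rows for store-level joins and aggregations
    .sortBy("store_id")
    .mode("overwrite")
    .saveAsTable("analytics.customer_purchases_orc")
)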
EDUCATION
Master of Science in Computer and Information Science
Southern Arkansas University, Magnolia, AR 71753 Aug 2016 – Dec 2017