Teja Sri Sai Garikipati
Jersey City, New Jersey
Email: *******************@*****.*** Phone: 660-***-****
Data Engineer LinkedIn: www.linkedin.com/in/garikipatiteja
Professional Summary:
Results-oriented Data Engineer with 4+ years of experience designing, developing, and maintaining scalable data pipelines and real-time analytics solutions in cloud and big data environments. Proficient in Python, SQL, and Spark, with hands-on experience using AWS (Glue, Redshift, S3, Lambda) and exposure to GCP (BigQuery, Dataflow) for data processing and analytics. Skilled in building automated ETL/ELT workflows using Hadoop, Apache Airflow, and Azure Data Factory (ADF) to orchestrate complex data transformations across structured and semi-structured data sources. Experienced in managing data lakes and warehouses, including Snowflake and Redshift, enabling BI dashboards with Tableau/Power BI, and preparing curated datasets to support AI/ML pipelines in SageMaker and Databricks. Adept at working in Agile teams, using Git, Jira, and CI/CD tools (Jenkins, Docker) to deliver high-quality, production-ready data solutions, with the ability to quickly adapt to new enterprise data models such as Workday.
Technical Skills:
Programming & Scripting: Python, SQL, Bash
Data Engineering: Apache Spark, PySpark, AWS Glue, AWS Lambda, Apache Airflow, Databricks
Cloud Platforms: AWS (S3, Redshift, EC2, Glue, EMR, Kinesis, CloudWatch, Step Functions), GCP (BigQuery, Dataflow – basic), Azure Data Factory
Big Data & Distributed Systems: Apache Hadoop, HDFS, Hive, Spark, Databricks
AI/ML Tools: AWS SageMaker, Databricks ML (data preparation for ML pipelines)
Databases: MySQL, MongoDB, Oracle, Snowflake
Data Warehousing: Amazon Redshift, Delta Lake, Snowflake (basic)
Visualization Tools: Tableau, Power BI
DevOps & CI/CD: Docker, Jenkins, Git, GitHub
Education:
Master of Science in Computer Science GPA: 3.6 Aug 2023 – Dec 2024
Bachelor's in Mechanical Engineering GPA: 3.8 July 2017 – April 2021
Work Experience:
Data Engineer Jan 2024 – Present
JP Morgan Chase, New Jersey, USA
Designed and maintained ETL/ELT pipelines using PySpark, AWS Glue, and SQL to process large-scale financial datasets into Amazon S3, Redshift, and Delta Lake.
Built batch and streaming pipelines using Apache Airflow and Kinesis Data Streams to improve data freshness for analytics and regulatory reporting.
Optimized Spark jobs on AWS EMR using partitioning, caching, and the DataFrame API, reducing job latency by 30%.
Migrated datasets into Snowflake and optimized queries for analytics dashboards.
Automated deployments and workflow monitoring using GitHub, Jenkins, and CloudWatch, improving incident response and reducing downtime.
Explored GCP BigQuery for proof-of-concept queries on large-scale datasets to evaluate cross-cloud analytics capabilities.
Partnered with data science teams to curate and provision datasets for ML pipelines in SageMaker and Databricks, enabling predictive analytics.
Delivered analytics-ready datasets to Tableau and Power BI dashboards for business insights and compliance tracking.
Developed AWS Lambda and Step Functions to orchestrate event-driven ETL workflows, automating lightweight data processing tasks and pipeline triggers.
Key Achievements:
Migrated 10+ high-priority ETL workflows from on-prem to AWS, improving scalability and reducing runtime by 35%.
Enabled near real-time risk analysis by integrating streaming pipelines with Kinesis and downstream analytics.
Environment: Python, PySpark, SQL, AWS (S3, Glue, Redshift, Lambda, EMR, Kinesis, CloudWatch, Step Functions), Apache Airflow, Databricks, Tableau, Power BI, GitHub, Jenkins, Jira, Agile.
Data Engineer (Associate System Engineer) Aug 2021 – Aug 2023
Tata Consultancy Services, Hyderabad, India
Client: Nike – Apparel & Footwear PLM (USA)
Managed incident tickets and performed root cause analysis to enhance stability of Nike's PLM systems.
Built scalable ETL pipelines using PySpark, AWS Glue, and SQL to transform product lifecycle and logistics data into AWS S3, Redshift, and Hive tables.
Designed real-time streaming workflows using Databricks and Spark Streaming to process PLM system logs and operational metrics.
Developed and optimized SQL transformations to support supply chain dashboards and delivery performance KPIs in Tableau.
Implemented automated monitoring and alerting via AWS CloudWatch, improving pipeline stability and reducing support tickets.
Worked closely with cross-functional Agile teams to deliver features, conduct post-release testing, and maintain technical documentation.
Gained familiarity with ERP-style PLM data models, experience that carries over to Workday data structures.
Developed orchestration pipelines in Azure Data Factory to integrate multi-source datasets into Snowflake/Redshift for analytics.
Key Achievements:
Automated hub-to-hub logistics data pipelines, improving visibility and cutting manual reporting by 60%.
Reduced recurring PLM data issues by 30% through root cause analysis and permanent fixes in ingestion workflows.
Environment: Python, PySpark, Java 8, Spring Boot, AWS S3, Glue, Lambda, Redshift, CloudWatch, SQL, Hive, Databricks, Tableau, Git, Jira.
Data Analytics Jan 2021 – June 2021
ZS Associates, India
Assisted in developing data ingestion pipelines using Python and PySpark to process and clean multi-source datasets into AWS S3 and Hadoop HDFS.
Collaborated with the data engineering team to build, test, and automate ETL workflows in Apache Airflow for batch and streaming data ingestion, improving pipeline reliability and accuracy.
Processed batch data loads into Hadoop HDFS and created Hive tables to support downstream analytics.
Wrote and optimized SQL queries on Redshift and Oracle databases for data validation and exploratory analysis supporting reporting needs.
Academic Projects:
Automated Financial Data Pipeline
Built a scalable pipeline using PySpark and AWS Glue to process daily transaction files from multiple sources.
Orchestrated workflows with Apache Airflow for data validation and error handling.
Customer Behavior Analytics Dashboard
Processed customer data using PySpark and loaded it into Amazon Redshift.
Created Tableau dashboards to visualize churn rates and buying patterns.
Certifications:
Google Data Analytics Professional Certificate – Coursera
Microsoft Certified: Data Analyst Associate (Power BI) – Microsoft
IBM Data Analyst Professional Certificate – Coursera
Certified Analytics Professional (CAP)