Dinesh Gorripati
Data Engineer | ****************@*****.*** | +1-806-***-**** | LinkedIn
Professional Summary:
8+ years of experience in Data Engineering, with deep expertise in big data technologies, cloud platforms, and scalable data solutions.
Proven ability to build and manage ETL workflows and automated data pipelines using tools like Apache Airflow, AWS Glue, and Apache NiFi for large-scale data integration and transformation.
Strong hands-on experience with cloud-based data platforms, including AWS and Microsoft Azure, delivering flexible, performant deployments across providers.
Expertise in designing and implementing data lakes and data warehouses using AWS S3 and Azure Data Lake to support analytical workloads.
Skilled in SQL and managing large-scale relational databases such as PostgreSQL, SQL Server, MySQL, and Amazon Redshift, with a focus on query optimization.
Proficient in big data processing frameworks like Apache Spark, Hadoop, and Flink for both batch and real-time data processing.
Developed real-time streaming pipelines using tools such as Apache Kafka and AWS Kinesis to support real-time analytics and event-driven systems.
Experienced in building machine-learning-ready data pipelines, enabling model training and predictive analytics on structured, high-quality data.
Adept in containerized environments with Docker and Kubernetes, and implementing CI/CD pipelines using Git, Jenkins, and GitLab for automated deployments.
Experience in data modeling, defining schemas and data architectures to support scalable and efficient data systems.
Strong background in data security and governance, including encryption, IAM, RBAC, and compliance-focused data handling.
Experience with NoSQL databases like MongoDB, Cassandra, and DynamoDB, and integrating data from diverse sources such as REST APIs, CSV, JSON, and XML.
Collaborated with cross-functional teams to integrate AI/ML models into production workflows, leveraging Apache Spark and AWS Kinesis to process real-time data and drive actionable insights in operational environments.
Excellent collaboration and communication skills; work closely with data scientists, analysts, and stakeholders to translate business needs into scalable data solutions and deliver insights using Power BI, Tableau, and AWS QuickSight.
Technical Skills:
Programming Languages: Python, Java, SQL, Shell scripting
Big Data Technologies: Apache Hadoop, Apache Spark, Apache Kafka, Apache Flink, Apache Hive
Data Processing: Batch Processing, Stream Processing, Real-Time Data Streaming
Cloud Platforms: AWS (S3, EC2, Lambda, Redshift, Kinesis, Glue), Azure (Blob Storage, Data Lake, Data Factory)
Databases: SQL (MySQL, PostgreSQL, SQL Server, Redshift), NoSQL (MongoDB, Cassandra, DynamoDB)
Data Warehouse & Data Lakes: AWS Redshift, Azure Data Lake, AWS S3, Data Lake Architecture
ETL Tools: Apache NiFi, Apache Airflow, AWS Glue, Talend, Informatica
Containerization & Orchestration: Docker, Kubernetes, AWS ECS, AWS EKS
Data Security: AWS IAM, Data Encryption, RBAC (Role-Based Access Control), Data Privacy
Machine Learning: Scikit-learn, TensorFlow (Data Preparation for ML models)
CI/CD & DevOps: Jenkins, GitLab, GitHub, Bitbucket, Terraform, Ansible
Data Integration: REST APIs, SOAP APIs, Data Extraction, Data Transformation
Real-Time Data Processing: AWS Kinesis, Apache Kafka, Apache Flink
Data Monitoring & Logging: AWS CloudWatch, Prometheus, Grafana, ELK Stack
Data Visualization: Power BI, Tableau, AWS QuickSight
Version Control: Git, GitHub, Bitbucket, GitLab
Automation: AWS Lambda, Python Scripting for Automation
Data Governance: Data Lineage, Quality Assurance, Data Cataloging
Data Modeling: Schema Design, Dimensional Modeling, OLAP
Collaboration Tools: JIRA, Confluence, Slack, MS Teams
Professional Experience:
Sr Data Engineer
PNC (Jun 2023 – Present)
Designed and implemented scalable data pipelines for high-volume banking transaction data, integrating AWS Kinesis for real-time streaming and low-latency processing.
Developed ELT pipelines to load transactional data into Snowflake, enabling seamless access to cleansed, structured data with minimal latency.
Utilized Snowflake Time Travel and Cloning for safe data recovery and version control, ensuring the integrity of critical banking datasets.
Leveraged Snowflake's multi-cluster compute and auto-scaling features to handle variable workloads, optimizing performance for real-time fraud analytics.
Integrated semi-structured data (JSON) from Kinesis streams into Snowflake, using VARIANT columns and Snowflake SQL for efficient downstream processing (see the Snowflake sketch after this role's bullets).
Built a robust data warehouse using AWS Redshift to enable efficient analytics and reporting on banking data.
Integrated NoSQL databases like MongoDB to store and manage customer interaction data, improving data retrieval speed and flexibility.
Integrated machine learning models into real-time data pipelines using Apache Kafka and AWS Lambda, enabling dynamic risk scoring and proactive fraud prevention based on historical and live transaction data (see the scoring sketch below).
Optimized complex data pipelines, reducing query execution times by 30% through performance tuning and efficient transformation logic.
Developed automated ETL workflows using Apache Airflow and AWS Glue, streamlining data processing and reducing manual intervention (see the Airflow sketch below).
Implemented data security measures to ensure compliance with financial regulations, including IAM roles, encryption techniques, and access controls.
Built real-time fraud detection and risk management pipelines using Apache Kafka, enabling proactive transaction monitoring.
Integrated Power BI dashboards with banking data, providing actionable insights for executive decision-making and business strategy.
Delivered a unified customer view by integrating data from multiple internal systems, supporting enhanced marketing and customer engagement strategies.
Implemented automated pipeline monitoring using AWS CloudWatch, ensuring high data reliability and minimizing downtime.
Optimized financial data reconciliation tasks, reducing manual intervention by 40% and improving operational efficiency.
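A minimal Python sketch of the VARIANT ingestion and Time Travel/Cloning patterns above, using the snowflake-connector-python client; the account, table, and field names are hypothetical stand-ins, not the production schema.

    import snowflake.connector  # pip install snowflake-connector-python

    # Hypothetical connection; real credentials would come from a secrets manager.
    conn = snowflake.connector.connect(
        account="example_account", user="etl_user", password="***",
        warehouse="ANALYTICS_WH", database="BANKING", schema="RAW",
    )
    cur = conn.cursor()

    # Land raw Kinesis events untouched in a VARIANT column.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS RAW_TRANSACTIONS (
            ingested_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
            payload     VARIANT
        )
    """)

    # Dot-path notation queries nested JSON without pre-flattening it.
    cur.execute("""
        SELECT payload:transaction_id::STRING AS txn_id,
               payload:amount::NUMBER(12,2)   AS amount
        FROM RAW_TRANSACTIONS
        WHERE payload:amount::NUMBER(12,2) > 10000
    """)
    print(cur.fetchall())

    # Time Travel: inspect yesterday's state, then clone it for safe debugging.
    cur.execute("SELECT COUNT(*) FROM RAW_TRANSACTIONS AT (OFFSET => -86400)")
    cur.execute(
        "CREATE TABLE RAW_TRANSACTIONS_DEBUG CLONE RAW_TRANSACTIONS "
        "AT (OFFSET => -86400)"
    )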
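A sketch of the real-time scoring path, assuming a Lambda function behind an MSK/Kafka event source mapping; the stand-in model and threshold are hypothetical, not the production scorer.

    import base64
    import json

    RISK_THRESHOLD = 0.8

    def load_model():
        # Stand-in scorer; production code would deserialize a trained model
        # (e.g., a scikit-learn pickle fetched from S3 at cold start).
        return lambda features: 0.9 if features["amount"] > 10_000 else 0.1

    MODEL = load_model()  # loaded once per container, reused across invocations

    def handler(event, context):
        """Entry point for an MSK/Kafka event source mapping."""
        flagged = []
        for records in event["records"].values():  # one list per topic-partition
            for record in records:
                txn = json.loads(base64.b64decode(record["value"]))
                score = MODEL({"amount": txn.get("amount", 0)})
                if score >= RISK_THRESHOLD:
                    flagged.append({"txn_id": txn.get("transaction_id"),
                                    "score": score})
        # Downstream handling (alert queue, case table) omitted for brevity.
        return {"flagged": flagged}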
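A minimal Airflow 2.x sketch of the automated ETL orchestration above; the DAG id, schedule, and task bodies are illustrative placeholders.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_transactions(**context):
        # Placeholder: pull the prior day's transactions from the source system.
        print("extracting batch for", context["ds"])

    def load_to_warehouse(**context):
        # Placeholder: load cleansed records into the warehouse.
        print("loading batch for", context["ds"])

    with DAG(
        dag_id="daily_transaction_etl",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # Airflow 2.4+; older releases use schedule_interval
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        extract = PythonOperator(task_id="extract",
                                 python_callable=extract_transactions)
        load = PythonOperator(task_id="load",
                              python_callable=load_to_warehouse)
        extract >> load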
Data Engineer
Kaiser Permanente (Jan 2020 – Dec 2022)
Designed and implemented end-to-end data pipelines using Azure Data Lake Storage, Azure Data Factory, and Azure Synapse Analytics, ensuring seamless data processing and scalable storage for healthcare datasets.
Built and managed data marts in Snowflake for EMR and claims data, enhancing analytical capabilities for healthcare operations and patient trend forecasting.
Implemented row-level security and role-based access control (RBAC) in Snowflake to ensure HIPAA-compliant data access and secure handling of sensitive patient data (see the row access policy sketch after this role's bullets).
Optimized query performance in Snowflake using materialized views, clustering keys, and data pruning strategies, improving data retrieval speed and efficiency.
Built and optimized data transformation workflows in PySpark, leveraging efficient join strategies, partitioning, and caching to enhance performance and reduce processing time (see the PySpark sketch below).
Implemented Slowly Changing Dimensions (SCD) Type 1 and Type 2 in Azure Synapse Analytics, enabling accurate historical tracking and reliable reporting (see the SCD sketch below).
Developed interactive Power BI dashboards, providing actionable insights into revenue trends, payment timelines, and accounts receivable, empowering data-driven decision-making for healthcare stakeholders.
Streamlined deployment processes by setting up CI/CD pipelines using Azure DevOps and Jenkins, ensuring smooth integration, testing, and automation of data solutions.
Applied machine learning techniques using Azure Machine Learning, TensorFlow, and PyTorch, optimizing models with Scikit-learn and NumPy to drive predictive analytics for patient care and operational forecasting.
Deployed containerized applications using Docker and Azure Kubernetes Service (AKS) to improve system scalability and support real-time data processing.
Designed and deployed data models for patient data, simplifying trend analysis for healthcare teams.
Monitored and optimized data pipeline performance using Azure Monitor, Splunk, and Prometheus, implementing alerts and dashboards to proactively address potential issues and enhance system reliability.
Increased overall data processing speed by 30% through query optimization, efficient ETL workflows, and parallel processing techniques, significantly improving system efficiency and performance.
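A sketch of the row-level security setup, assuming Snowflake's mapping-table row access policy pattern; the database, schema, table, and role names are hypothetical.

    import snowflake.connector  # same client as the earlier sketch

    conn = snowflake.connector.connect(
        account="example_account", user="de_admin", password="***",
        database="EMR", warehouse="SECURE_WH",
    )
    cur = conn.cursor()

    statements = [
        # Map each role to the regions it may see (populated separately).
        """CREATE TABLE IF NOT EXISTS policies.role_region_map (
               role_name STRING, region STRING)""",
        # The policy returns TRUE only for rows the current role may access.
        """CREATE OR REPLACE ROW ACCESS POLICY policies.region_policy
           AS (region STRING) RETURNS BOOLEAN ->
               EXISTS (SELECT 1 FROM policies.role_region_map m
                       WHERE m.role_name = CURRENT_ROLE()
                         AND m.region = region)""",
        # Attach the policy to the sensitive table and grant access via RBAC.
        """ALTER TABLE clinical.patient_claims
           ADD ROW ACCESS POLICY policies.region_policy ON (region)""",
        "GRANT SELECT ON TABLE clinical.patient_claims TO ROLE analyst_east",
    ]
    for sql in statements:
        cur.execute(sql)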
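A condensed PySpark sketch of the join, caching, and partitioning tactics above; the lake paths and column names are placeholders.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("claims_transform").getOrCreate()

    # Hypothetical inputs: a large claims fact and a small provider dimension.
    claims = spark.read.parquet("abfss://lake@acct.dfs.core.windows.net/claims/")
    providers = spark.read.parquet("abfss://lake@acct.dfs.core.windows.net/providers/")

    # Broadcasting the small dimension avoids a full shuffle on the join;
    # caching pays off because the result feeds multiple aggregations.
    enriched = claims.join(broadcast(providers), "provider_id").cache()

    monthly = (enriched
               .groupBy("provider_id",
                        F.date_trunc("month", F.col("service_date")).alias("month"))
               .agg(F.sum("billed_amount").alias("billed")))

    # Partitioned output lets downstream reads prune to the months they need.
    (monthly.write.mode("overwrite")
            .partitionBy("month")
            .parquet("abfss://lake@acct.dfs.core.windows.net/marts/monthly_billing/"))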
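The SCD Type 2 handling reduces to a close-then-insert pattern; below is a generic T-SQL sketch driven from Python via pyodbc, with hypothetical staging (stg.patient) and dimension (dim.patient) tables that simplify the production Synapse implementation.

    import pyodbc  # connects to the Synapse dedicated SQL pool

    CLOSE_CHANGED = """
    UPDATE d SET d.end_date = GETDATE(), d.is_current = 0
    FROM dim.patient AS d
    JOIN stg.patient AS s ON s.patient_id = d.patient_id
    WHERE d.is_current = 1
      AND (s.plan_code <> d.plan_code OR s.region <> d.region);
    """

    INSERT_CURRENT = """
    -- Changed rows were just closed, so they (and brand-new patients) lack a
    -- current version and are inserted here with fresh effective dates.
    INSERT INTO dim.patient (patient_id, plan_code, region,
                             start_date, end_date, is_current)
    SELECT s.patient_id, s.plan_code, s.region, GETDATE(), NULL, 1
    FROM stg.patient AS s
    LEFT JOIN dim.patient AS d
           ON d.patient_id = s.patient_id AND d.is_current = 1
    WHERE d.patient_id IS NULL;
    """

    conn = pyodbc.connect("DSN=synapse_dw")  # hypothetical ODBC DSN
    cur = conn.cursor()
    cur.execute(CLOSE_CHANGED)
    cur.execute(INSERT_CURRENT)
    conn.commit()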
Jr Data Engineer
Nokia (Jan 2017 – Dec 2019)
Developed scalable data pipelines for telecom and IoT sensor data, optimizing network performance and efficiently handling large-scale traffic.
Migrated legacy Hadoop datasets to Snowflake, improving query speed, scalability, and cost efficiency.
Created Snowflake-based dashboards in Tableau and QuickSight, delivering real-time insights into network traffic and IoT sensor anomalies for better decision-making.
Built and optimized ETL pipelines using Apache Spark and Hadoop HDFS, ensuring efficient data collection, transformation, and storage.
Designed real-time monitoring systems using Apache Kafka for network traffic analysis, anomaly detection, and performance tracking (see the consumer sketch after this role's bullets).
Assisted in building an automated system for predictive maintenance using IoT data and machine learning (see the modeling sketch below).
Developed cloud-based architecture using AWS S3, EC2, and QuickSight for scalable storage, data processing, and real-time reporting.
Deployed containerized applications using Docker and implemented encryption, access controls, and secure pipeline design to protect sensitive network data and comply with security best practices.
Tuned SQL queries and built interactive Tableau dashboards, improving data retrieval speeds and providing actionable insights for business decisions.
Conducted performance optimization for big data processing pipelines, reducing processing time by 40% and improving overall system efficiency.
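A minimal sketch of the Kafka-based traffic monitoring above, using the kafka-python client; the topic, brokers, and latency threshold are illustrative assumptions.

    import json

    from kafka import KafkaConsumer  # pip install kafka-python

    # Hypothetical topic, brokers, and consumer group.
    consumer = KafkaConsumer(
        "network.traffic.metrics",
        bootstrap_servers=["broker1:9092"],
        group_id="traffic-anomaly-monitor",
        auto_offset_reset="latest",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )

    LATENCY_SLA_MS = 250  # assumed SLA threshold

    for message in consumer:
        metric = message.value
        # Flag cells whose round-trip latency breaches the SLA.
        if metric.get("rtt_ms", 0) > LATENCY_SLA_MS:
            print(f"ANOMALY cell={metric.get('cell_id')} rtt={metric['rtt_ms']}ms")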
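A toy sketch of the predictive-maintenance modeling flow, trained on synthetic stand-in features; the real system used aggregated IoT telemetry rather than random data.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Synthetic stand-ins for per-device-hour features: temperature,
    # vibration, and error count. Labels mark devices that later failed.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(1000, 3))
    y = (X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=1000) > 1).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    print("holdout accuracy:", model.score(X_test, y_test))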
Education:
Wright State University, Dayton, United States - Master's, Computer Science
Certifications:
AWS Certified Data Engineer - Associate
AWS Certified Developer - Associate