Rohith Reddy N
Sunnyvale, CA, USA +1-352-***-**** *******************@*****.*** LinkedIn
SUMMARY
Senior Data Engineer with 5+ years of experience architecting and optimizing scalable data platforms on Azure and AWS. Expertise in designing efficient Python-driven ETL workflows and data pipelines using Azure Data Factory and Databricks alongside Azure Storage and Virtual Machines. Adept at implementing robust data governance frameworks, ensuring data quality and regulatory compliance while configuring effective monitoring for batch processes.
TECHNICAL SKILLS
• Data Engineering: Big Data, ETL/ELT, Data Lake, Medallion Architecture, Delta Lake, Apache Spark, PySpark, Apache Flink, Kafka, dbt (models, tests, macros), Airflow, Snowflake Streams & Tasks, Change Data Capture, Modular & Dynamic Pipelines, Metadata-driven Frameworks, Batch & Stream Processing, Schema Evolution, Solutions Architect
• Cloud Platforms: AWS (S3, Redshift, Athena, Lambda, Glue, EMR, IAM), Azure (Data Factory, Databricks, ADLS, Synapse, Key Vault, Monitor, Virtual Machines), Snowflake (Streams, Tasks, Snowpipe, Time Travel, Zero-Copy Cloning)
• DevOps & Infra: Docker, Kubernetes, CI/CD, Git, GitHub Actions, AWS CloudWatch, Azure Monitor, Terraform
• Programming, ML & Scripting: Python (Pandas, Matplotlib, Scikit-learn, XGBoost), SQL, Bash/Shell scripting, Python ETL
• Analytics & ML Engineering: Feature Engineering, Exploratory Data Analysis (EDA), A/B Testing, Time Series Forecasting, Predictive Modeling, Classification & Regression, Model Evaluation (ROC, AUC, Precision/Recall, Confusion Matrix), MLflow, Model Monitoring, SageMaker
• Data Modeling & Performance: Star/Snowflake Schema, Slowly Changing Dimensions (Type 1 & 2), Indexing, Schema Evolution, Materialized Views, Z-ordering, Clustering Keys, Query Optimization, Partitioning
• Data Quality: Great Expectations, PyTest, dbt tests, Data Validation Frameworks, Anomaly Detection Pipelines, Data Governance
• Data Warehouse & Databases: Snowflake, Redshift, Azure Synapse, PostgreSQL, SQL Server, CosmosDB, MongoDB, Oracle
• Data Visualization & BI Tools: Tableau, Power BI, DAX, Plotly, Seaborn, KPI Reporting
• Compliance & Standards: HIPAA, HL7, EDI (Healthcare), GDPR, SOX, AML (Financial), PCI-DSS (Retail), Banking Industry
PROFESSIONAL EXPERIENCE
Accenture Feb 2024 - Present
Sr. Data Analytics Engineer
• Developed and maintained scalable ELT pipelines using AWS Glue, Lambda, Step Functions, and Apache Spark on EMR, improving data processing efficiency and reducing execution time by 40%.
• Built real-time ingestion pipelines integrating Apache Kafka, Snowpipe & S3, enabling real-time data availability for reporting & analytics in Snowflake.
• Implemented stream processing jobs using Apache Flink for real-time data transformations, enabling sub-second latency analytics.
• Engineered stream and batch data flows following Lambda Architecture and Medallion Architecture to support structured, scalable analytics delivery and ML data preparation.
• Integrated Apache Iceberg on EMR to support schema evolution & ACID compliance, ensuring data governance across datasets.
• Orchestrated complex workflows using Apache Airflow, coordinating tasks across dbt, Snowflake, and AWS Glue for modular transformation and automated lineage tracking.
• Optimized Airflow DAG performance by refactoring task dependencies and leveraging dynamic task mapping, reducing pipeline execution time by 30% (a minimal sketch follows this section).
• Developed modular, reusable Airflow DAGs to support dynamic scheduling & dependency management for automated pipeline execution.
• Implemented a metadata-driven framework to dynamically generate ELT pipelines in AWS Glue and dbt, enabling scalable onboarding of new data sources with minimal code changes and improving development velocity by 60%.
• Built and maintained complex dbt models to automate data transformation processes, ensuring high data quality and reducing manual intervention by 50%.
• Built feature engineering pipelines using dbt & Python, powering downstream churn prediction & time series forecasting models.
• Managed critical Snowflake objects such as streams, tasks, stored procedures, and secure views to enable incremental processing, automation, and fine-grained data access control.
• Tuned Snowflake workloads using clustering keys, materialized views, and query caching, improving dashboard responsiveness and supporting concurrent analytics users.
• Developed and optimized Snowflake data marts for various departments (marketing, finance), enabling self-service analytics and reducing time-to-insight.
• Built Tableau dashboards using live and extract connections to Snowflake for real-time and cost-efficient KPI and model monitoring.
• Conducted exploratory data analysis (EDA) and collaborated with cross-functional stakeholders to deliver insightful, actionable reports that supported strategic decisions.
• Implemented data validation frameworks using dbt tests and Great Expectations to ensure consistency across development, QA, and production environments.
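Code sketch (illustrative, not production code): a minimal Airflow DAG using dynamic task mapping over a metadata-driven source list, as referenced above. The DAG, task, and table names are hypothetical placeholders; in the actual framework the source list would be read from a metadata store rather than hard-coded.

from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def metadata_driven_elt():
    @task
    def list_sources() -> list[str]:
        # Stand-in for a metadata lookup (e.g., a Snowflake control table)
        return ["orders", "customers", "payments"]

    @task
    def load_source(source: str) -> str:
        # One mapped task instance per source; instances run in parallel
        print(f"Loading {source} into the raw layer")
        return source

    # expand() fans out load_source over whatever list_sources() returns,
    # so onboarding a new source requires no DAG code change
    load_source.expand(source=list_sources())

metadata_driven_elt()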
Sherwin-Williams Dec 2022 - Jan 2024
Data Engineer
• Designed and automated both batch and streaming data pipelines using Data Factory and Apache Kafka, reducing ingestion latency to sub-minute intervals for near real-time analytics.
• Developed and maintained high-performance data pipelines in Azure Databricks using PySpark, ensuring 99.99% data accuracy and strict regulatory compliance, with detailed documentation for workflows.
• Architected and implemented enterprise-scale Medallion Architecture in Azure Data Lake Storage, optimizing data governance and supporting structured reporting processes.
• Refined Delta Lake optimization in Azure Databricks by applying advanced merge, optimize, and vacuum strategies, cutting storage costs by 50% and improving query performance (a minimal sketch follows this section).
• Implemented robust table formats with Delta Lake to enable ACID compliance and schema evolution, resulting in improved data integrity and flexibility in data management.
• Enhanced data storage efficiency in the Gold layer by leveraging Parquet file format, achieving 40% better compression and 25% improved query performance.
• Integrated Snowflake with Databricks to improve data access & collaboration, resulting in a 40% boost in cross-functional productivity.
• Constructed Snowflake data marts for operations and marketing, facilitating unified reporting and comprehensive customer insights.
• Implemented Snowflake features such as time travel, zero-copy cloning, and fail-safe to support efficient backup, recovery, and sandbox testing for new reports and pipelines.
• Optimized Snowflake query performance by fine-tuning clustering keys & leveraging result caching, improving query efficiency.
• Applied advanced data partitioning and indexing techniques, including predicate pushdown and Z-ordering, to reduce query execution times and improve overall performance.
• Developed resilient pipelines with schema inference and dynamic mapping, ensuring 99.9% uptime during frequent schema changes.
• Constructed comprehensive CDC pipelines with ADF and Delta Lake to replicate changes in near real-time to Snowflake.
• Integrated Airflow with Slack & Azure Monitor for real-time alerting, enhancing proactive monitoring & troubleshooting.
• Leveraged Azure Cosmos DB for building scalable NoSQL data stores, ensuring high throughput and system availability.
• Established an advanced operational monitoring framework with Azure Monitor and Log Analytics, reducing system downtime by 30% and improving overall data pipeline reliability.
• Developed parameterized Databricks notebooks for reusable data transformation routines, contributing to streamlined workflow automation.
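Code sketch (illustrative, not production code) of the Delta Lake merge/optimize/vacuum pattern referenced above; the table paths, key column, and retention window are hypothetical placeholders.

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # supplied by the Databricks runtime

# Incremental batch staged in the bronze layer
updates = spark.read.format("parquet").load("/mnt/bronze/orders_incremental")
target = DeltaTable.forPath(spark, "/mnt/silver/orders")

# MERGE applies updates and inserts atomically (ACID), keyed on order_id
(target.alias("t")
    .merge(updates.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())

# Compact small files and co-locate rows on a common filter column, then
# vacuum files older than the retention window to cut storage costs
spark.sql("OPTIMIZE delta.`/mnt/silver/orders` ZORDER BY (customer_id)")
spark.sql("VACUUM delta.`/mnt/silver/orders` RETAIN 168 HOURS")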
Perficient Inc Apr 2019 - Nov 2021
Data Engineer
• Designed and implemented ETL pipelines in Azure Data Factory (ADF) and Databricks, automating data ingestion, transformation, and processing for analytics use cases.
• Orchestrated workflows using Azure Data Factory, Apache Kafka, and Airflow, ensuring efficient scheduling, monitoring, and dependency management for batch and incremental data processing, which improved data workflow reliability and reduced processing times.
• Developed PySpark-based data pipelines to process large-scale datasets, optimizing query performance and reducing data latency.
• Utilized Delta Lake as a storage format to enable version control, time travel, and simplified upserts in ETL pipelines.
• Built and maintained data models in Snowflake, applying star schema design for optimized query performance and implementing Slowly Changing Dimensions (Type 1 & 2) to manage historical data, which improved data retrieval speed and accuracy.
• Created parameterized ETL pipelines and metadata-driven frameworks in ADF using Lookup Activity and Stored Procedures, enabling flexible and reusable data workflows.
• Utilized ADF Dataflows for dynamic schema mapping, automating data transformations across multiple datasets without manual intervention.
• Developed unit tests for ETL pipelines using PyTest and Great Expectations, ensuring data quality and reliability (a minimal sketch follows this section).
• Created materialized and non-materialized views in Azure Synapse Analytics, improving performance for business intelligence and visualization tools.
• Implemented data partitioning & indexing strategies in Synapse Analytics, improving query execution speed by 30%.
• Conducted data validation & integrity checks using dbt and SQL-based test cases, ensuring compliance with business rules.
• Assisted in the migration of legacy ETL processes from on-prem SSIS to Azure Data Factory, modernizing workflows.
• Automated data quality checks and anomaly detection using Python and integrated them with monitoring tools like Azure Monitor.
• Developed Kafka-based ingestion pipelines for capturing high-velocity data streams, supporting near real-time data processing and reducing overall data latency across reporting systems.
• Conducted exploratory data analysis (EDA) on ADLS, helping identify trends and patterns for business decision-making.
• Created and optimized Tableau dashboards for tracking KPIs, enhancing decision-making efficiency for sales and marketing teams.
• Performed A/B testing and statistical analysis using Python (SciPy, Statsmodels) to evaluate business strategies and optimize marketing campaigns.
• Integrated predictive models developed by data scientists into production-grade analytics pipelines, enhancing the accuracy and efficiency of data-driven decisions.
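Code sketch (illustrative, not production code) of the unit-testing approach referenced above, pairing PyTest with the legacy (pre-1.0) Great Expectations pandas API; the dataset and column names are hypothetical placeholders.

import great_expectations as ge
import pandas as pd
import pytest

@pytest.fixture
def orders_df():
    # Stand-in for a staged extract; in the pipeline this came from ADLS
    data = {"order_id": [1, 2, 3], "amount": [10.5, 20.0, 7.25]}
    return ge.from_pandas(pd.DataFrame(data))

def test_order_id_is_unique_and_not_null(orders_df):
    assert orders_df.expect_column_values_to_be_unique("order_id").success
    assert orders_df.expect_column_values_to_be_not_null("order_id").success

def test_amount_is_non_negative(orders_df):
    assert orders_df.expect_column_values_to_be_between("amount", min_value=0).success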
CERTIFICATIONS
• AWS Certified Solutions Architect – Associate, Microsoft Certified: Azure Data Engineer Associate