Rohith Reddy N
Santa Clara, CA, USA | +1-352-***-**** | *******************@*****.*** | linkedin.com/in/rohithreddyn
SUMMARY
Senior Data Analytics Engineer with 5+ years of experience building and optimizing scalable data pipelines and analytics solutions across AWS, Azure, and Snowflake platforms. Skilled in developing robust ETL/ELT workflows using Spark, Databricks, Airflow, and dbt to support real-time and batch processing. Proficient in Python and SQL with hands-on expertise in data modeling, pipeline orchestration, and performance tuning. Proven track record of enabling data-driven decision-making through advanced analytics, dashboarding, and collaboration with cross-functional teams. Passionate about integrating data engineering and analytics to drive operational impact and business value.
TECHNICAL SKILLS
• Data Engineering: Big Data, ETL/ELT, Data Lake, Medallion Architecture, Delta Lake, Apache Spark, PySpark, Apache Flink, Kafka, dbt, Airflow, Snowflake Streams & Tasks, Change Data Capture, Modular & Dynamic Pipelines, Metadata-driven Frameworks, Batch & Stream Processing, Schema Evolution
• Cloud & DevOps: AWS (S3, Redshift, Athena, Lambda, Glue, EMR, CloudWatch), Azure (Data Factory, Databricks, Data Lake Storage, Synapse, Monitor), Snowflake, Docker, Kubernetes, Terraform, Git, CI/CD
• Programming & Scripting: Python (Pandas, NumPy, Matplotlib, Scikit-learn, TensorFlow, XGBoost, LightGBM), SQL, Shell Scripting
• Data Quality & Testing: Great Expectations, PyTest, dbt tests, Data Validation Frameworks, Anomaly Detection Pipelines
• Data Warehouses & Databases: Snowflake, Amazon Redshift, Azure Synapse, PostgreSQL, Oracle, SQL Server, Cosmos DB, MongoDB
• Data Visualization: Tableau, Power BI, Looker, Plotly, Seaborn, KPI Reporting, Self-Service Analytics
• Analytics & ML Engineering: Feature Engineering, Exploratory Data Analysis (EDA), A/B Testing, Time Series Forecasting, Predictive Modeling, Classification & Regression, Model Evaluation (ROC, AUC, Precision/Recall, Confusion Matrix), ML Pipelines (MLflow, SageMaker), Model Monitoring
• Data Modeling & Performance: Star/Snowflake Schema Design, Slowly Changing Dimensions (Types 1 & 2), Indexing, Schema Evolution, Materialized Views, Z-ordering, Predicate Pushdown, Clustering Keys, Query Tuning, Performance Optimization
• Compliance & Standards: Regulatory Compliance (AML, GDPR, PCI-DSS, SOX); Healthcare Data Standards (HIPAA, HL7, EDI, HIE)
PROFESSIONAL EXPERIENCE
Accenture | Sr. Data Analytics Engineer | Feb 2024 - Present
• Developed and maintained scalable ELT pipelines using AWS Glue, Lambda, Step Functions, and Apache Spark on EMR, improving data processing efficiency and reducing execution time by 40%.
• Built real-time ingestion pipelines integrating Apache Kafka, Snowpipe, and S3, making fresh data available for reporting and analytics in Snowflake with minimal latency.
• Engineered stream and batch data flows following Lambda Architecture and Medallion Architecture (Bronze/Silver/Gold) to support structured, scalable analytics delivery and ML data preparation.
• Integrated Apache Iceberg on EMR to support schema evolution & ACID compliance, ensuring data governance across dynamic datasets.
• Orchestrated complex workflows using Apache Airflow, coordinating tasks across dbt, Snowflake, and AWS Glue for modular transformation and automated lineage tracking.
• Optimized Airflow DAG performance by refactoring task dependencies and leveraging dynamic task mapping, reducing pipeline execution time by 30% (see the task mapping sketch after this list).
• Built and maintained modular, reusable Airflow DAGs to support dynamic scheduling and dependency management for automated pipeline execution.
• Built and maintained complex dbt models to automate data transformation processes, ensuring high data quality and reducing manual intervention by 50%.
• Designed feature engineering pipelines using dbt and Python (Pandas), powering downstream churn prediction and time series forecasting models.
• Managed critical Snowflake objects such as streams, tasks, stored procedures, and secure views to enable incremental processing, automation, and fine-grained data access control (see the stream-and-task sketch after this list).
• Tuned Snowflake workloads using clustering keys, materialized views, and query caching, improving dashboard responsiveness and supporting concurrent analytics users.
• Developed and optimized Snowflake data marts for various departments (marketing, finance), enabling self-service analytics and reducing time-to-insight.
• Built interactive dashboards with Tableau, enabling teams to monitor KPIs, campaign results & model predictions in near real-time.
• Conducted exploratory data analysis (EDA) and collaborated with cross-functional stakeholders to deliver insightful, actionable reports that supported strategic decisions.
• Implemented data validation frameworks using dbt tests and Great Expectations to ensure consistency across development, QA, and production environments (see the validation sketch after this list).
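A minimal sketch of the dynamic task mapping pattern referenced above, assuming Airflow 2.4+; the DAG id, schedule, and partition values are illustrative placeholders:

```python
# Sketch of Airflow dynamic task mapping (Airflow 2.4+); names are placeholders.
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def elt_pipeline():
    @task
    def list_partitions():
        # In the real pipeline this would come from S3 or a metadata table
        return ["2024-06-01", "2024-06-02", "2024-06-03"]

    @task
    def load_partition(partition):
        print(f"Loading partition {partition}")

    # expand() fans out one mapped task instance per partition at runtime,
    # replacing a hand-maintained static set of parallel tasks
    load_partition.expand(partition=list_partitions())

elt_pipeline()
```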
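A sketch of the Snowflake stream-and-task pattern used for incremental processing; credentials and object names are hypothetical:

```python
# Sketch: Snowflake stream + scheduled task for incremental loads.
# Credentials and table names are placeholders, not real values.
import snowflake.connector

conn = snowflake.connector.connect(account="<account>", user="<user>", password="<password>")
cur = conn.cursor()

# The stream records row-level changes (CDC) on the raw table
cur.execute("CREATE OR REPLACE STREAM raw.orders_stream ON TABLE raw.orders")

# The task drains the stream every five minutes, but only when new data exists
cur.execute("""
    CREATE OR REPLACE TASK load_orders_task
      WAREHOUSE = transform_wh
      SCHEDULE = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('raw.orders_stream')
    AS
      INSERT INTO staging.orders SELECT * FROM raw.orders_stream
""")

# Tasks are created suspended; resume to start the schedule
cur.execute("ALTER TASK load_orders_task RESUME")
```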
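And a sketch of the validation checks wired into that framework, using the legacy pandas-backed Great Expectations API (pre-1.0); the column names and bounds are illustrative only:

```python
# Sketch: row-level checks with the legacy pandas-backed Great Expectations
# API (pre-1.0); the column names and bounds are illustrative only.
import great_expectations as ge
import pandas as pd

df = ge.from_pandas(pd.DataFrame({
    "order_id": [101, 102, 103],
    "amount": [25.0, 13.5, 99.9],
}))

results = [
    df.expect_column_values_to_not_be_null("order_id"),
    df.expect_column_values_to_be_unique("order_id"),
    df.expect_column_values_to_be_between("amount", min_value=0, max_value=10_000),
]

# Fail the run if any expectation is violated
assert all(r.success for r in results), "data validation failed"
```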
Sherwin-Williams | Data Engineer | Dec 2022 - Jan 2024
• Collaborated closely with data analysts and data scientists to refine and debug data pipelines, enhancing data quality and usability for downstream analytics and reporting.
• Architected and implemented enterprise-scale Medallion Architecture in Azure Data Lake Storage, optimizing data governance and supporting structured reporting processes.
• Enhanced data storage efficiency in the Gold layer by leveraging Parquet file format, achieving 40% better compression and 25% improved query performance.
• Developed and maintained high-performance data pipelines in Azure Databricks using PySpark, ensuring 99.99% data accuracy and strict regulatory compliance, with detailed documentation for workflows.
• Integrated Snowflake with Databricks to improve data access & collaboration, resulting in a 40% boost in cross-functional productivity.
• Constructed Snowflake data marts for operations and marketing, facilitating unified reporting and comprehensive customer insights.
• Implemented Snowflake features such as time travel, zero-copy cloning, and fail-safe to support efficient backup, recovery, and sandbox testing for new reports and pipelines.
• Optimized Snowflake query performance by fine-tuning clustering keys & leveraging result caching, improving query efficiency.
• Designed and automated both batch and streaming data pipelines using Data Factory and Apache Kafka, reducing ingestion latency to sub-minute intervals for near real-time analytics.
• Refined Delta Lake optimization in Azure Databricks by applying advanced merge, optimize, and vacuum strategies, cutting storage costs by 50% and improving query performance (see the maintenance sketch after this list).
• Implemented robust table formats with Delta Lake to enable ACID compliance and schema evolution, resulting in improved data integrity and flexibility in data management.
• Applied advanced data partitioning and indexing techniques, including predicate pushdown and Z-ordering, to reduce query execution times and improve overall performance.
• Developed resilient pipelines with schema inference and dynamic mapping, ensuring 99.9% uptime during frequent schema changes.
• Constructed comprehensive CDC pipelines with ADF and Delta Lake to replicate changes to Snowflake in near real-time (see the merge sketch after this list).
• Integrated Airflow with Slack & Azure Monitor for real-time alerting, enhancing proactive monitoring & troubleshooting.
• Leveraged Azure Cosmos DB for building scalable NoSQL data stores, ensuring high throughput and system availability.
• Established an advanced operational monitoring framework with Azure Monitor and Log Analytics, reducing system downtime by 30% and improving overall data pipeline reliability.
• Developed parameterized Databricks notebooks for reusable data transformation routines, contributing to streamlined workflow automation.
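A minimal sketch of the Delta Lake maintenance pattern referenced above; it assumes a Databricks notebook where `spark` is pre-defined, and the table name is a placeholder:

```python
# Sketch: Delta Lake maintenance on Databricks. `spark` is the session
# Databricks injects into notebooks; the table name is a placeholder.
from delta.tables import DeltaTable

# Compact small files and co-locate rows on a frequently filtered column,
# so predicate pushdown can skip irrelevant files at query time
spark.sql("OPTIMIZE gold.sales_orders ZORDER BY (customer_id)")

# Drop unreferenced files older than 168 hours (7 days) to reclaim storage
DeltaTable.forName(spark, "gold.sales_orders").vacuum(168)
```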
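And a sketch of the merge-based upsert behind those CDC flows; `updates_df` and the table name are hypothetical stand-ins for the real change feed:

```python
# Sketch: merge-based CDC upsert into a Delta table. `spark`, `updates_df`,
# and the table name stand in for the real pipeline inputs.
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "silver.customers")

(target.alias("t")
    .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()      # apply changes to existing keys
    .whenNotMatchedInsertAll()   # insert newly seen keys
    .execute())
```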
Perficient Inc. | Data Engineer | Apr 2019 - Nov 2021
• Designed and implemented ETL pipelines in Azure Data Factory (ADF) and Databricks, automating data ingestion, transformation, and processing for analytics use cases.
• Orchestrated data workflows using Data Factory, Apache Kafka and Airflow, ensuring efficient scheduling, monitoring, and dependency management for batch and incremental data processing.
• Developed PySpark-based data pipelines to process large-scale datasets, optimizing query performance and reducing data latency.
• Utilized Delta Lake as a storage format to enable version control, time travel, and simplified upserts in ETL pipelines.
• Built and maintained data models using Snowflake, applying star schema design for optimized query performance and implementing Slowly Changing Dimensions (Type 1 & 2) to manage historical data.
• Created parameterized ETL pipelines and metadata-driven frameworks in ADF using Lookup Activity and Stored Procedures, enabling flexible and reusable data workflows.
• Utilized ADF Dataflows for dynamic schema mapping, automating data transformations across multiple datasets without manual intervention.
• Developed unit tests for ETL pipelines using PyTest and Great Expectations, ensuring data quality and reliability (see the PyTest sketch after this list).
• Created materialized and non-materialized views in Azure Synapse Analytics, improving performance for business intelligence and visualization tools.
• Implemented data partitioning & indexing strategies in Synapse Analytics, improving query execution speed by 30%.
• Conducted data validation and integrity checks using dbt and SQL-based test cases, ensuring compliance with business rules.
• Assisted in the migration of legacy ETL processes from on-prem SSIS to Azure Data Factory, modernizing workflows.
• Automated data quality checks and anomaly detection using Python and integrated them with monitoring tools like Azure Monitor.
• Developed Kafka-based ingestion pipelines for capturing high-velocity data streams, supporting near real-time data processing and reducing overall data latency across reporting systems.
• Conducted exploratory data analysis (EDA) on datasets in ADLS, helping identify trends and patterns for business decision-making.
• Created and optimized Tableau dashboards for tracking KPIs, enhancing decision-making efficiency for sales and marketing teams.
• Performed A/B testing and statistical analysis using Python (SciPy, Statsmodels) to evaluate business strategies and optimize marketing campaigns (see the A/B sketch after this list).
• Collaborated with cross-functional teams, including data scientists and analysts, to integrate predictive models into analytics pipelines.
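A minimal sketch of the PyTest-style unit tests mentioned above; the dedupe transformation and its columns are illustrative, not the actual pipeline code:

```python
# Sketch: unit-testing a small ETL transformation with PyTest.
# The function and column names are illustrative placeholders.
import pandas as pd

def dedupe_latest(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the most recent row per business key."""
    return df.sort_values("updated_at").drop_duplicates("id", keep="last")

def test_dedupe_latest_keeps_most_recent_row():
    df = pd.DataFrame({
        "id": [1, 1, 2],
        "updated_at": ["2021-01-01", "2021-03-01", "2021-02-01"],
    })
    out = dedupe_latest(df)
    assert len(out) == 2
    assert out.loc[out["id"] == 1, "updated_at"].item() == "2021-03-01"
```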
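And a sketch of the proportions-based A/B significance test; the conversion counts are made-up illustration data:

```python
# Sketch: two-sample proportions z-test for an A/B experiment with
# statsmodels; the counts below are made-up illustration data.
from statsmodels.stats.proportion import proportions_ztest

conversions = [420, 385]   # variant, control
visitors = [5000, 5000]

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# Treat the variant as significant if p < 0.05
```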
CERTIFICATIONS
• AWS Solutions Architect Associate, DP-203 (Azure Data Engineer Associate)