Rohith Reddy N
Santa Clara, CA, USA | +1-352-***-**** | *******************@*****.*** | linkedin.com/in/rohithreddyn
SUMMARY
Senior Data Analytics Engineer with 5+ years of experience building and optimizing scalable data pipelines and analytics solutions across AWS, Azure, and Snowflake platforms. Skilled in developing robust ETL/ELT workflows using Spark, Databricks, Airflow, and dbt to support real-time and batch processing. Proficient in Python and SQL with hands-on expertise in data modeling, pipeline orchestration, and performance tuning. Proven track record of enabling data-driven decision-making through advanced analytics, dashboarding, and collaboration with cross-functional teams. Passionate about integrating data engineering and analytics to drive operational impact and business value.
TECHNICAL SKILLS
• Data Engineering: Big Data, ETL/ELT, Data Lake, Medallion Architecture, Delta Lake, Apache Spark, PySpark, Apache Flink, Kafka, dbt, Airflow, Snowflake Streams & Tasks, Change Data Capture, Modular & Dynamic Pipelines, Metadata-driven Frameworks, Batch & Stream Processing, Schema Evolution
• Cloud & DevOps: AWS (S3, Redshift, Athena, Lambda, Glue, EMR, CloudWatch), Azure (Data Factory, Databricks, Data Lake Storage, Synapse, Monitor), Snowflake, Docker, Kubernetes, Terraform, Git, CI/CD
• Programming & Scripting: Python (Pandas, NumPy, Matplotlib, Scikit-learn, TensorFlow, XGBoost, LightGBM), SQL, Shell Scripting
• Data Quality & Testing: Great Expectations, PyTest, dbt tests, Data Validation Frameworks, Anomaly Detection Pipelines
• Data Warehouses & Databases: Snowflake, Amazon Redshift, Azure Synapse, PostgreSQL, Oracle, SQL Server, Cosmos DB, MongoDB
• Data Visualization: Tableau, Power BI, Looker, Plotly, Seaborn, KPI Reporting, Self-Service Analytics
• Analytics & ML Engineering: Feature Engineering, Exploratory Data Analysis (EDA), A/B Testing, Time Series Forecasting, Predictive Modeling, Classification & Regression, Model Evaluation (ROC, AUC, Precision/Recall, Confusion Matrix), ML Pipelines (MLflow, SageMaker), Model Monitoring
• Data Modeling & Performance: Star/Snowflake Schema Design, Slowly Changing Dimensions (Types 1 & 2), Indexing, Schema Evolution, Materialized Views, Z-ordering, Predicate Pushdown, Clustering Keys, Query Tuning, Performance Optimization
• Compliance & Standards: Regulatory Compliance (AML, GDPR, PCI-DSS, SOX); Healthcare Data Standards (HIPAA, HL7, EDI, HIE)
PROFESSIONAL EXPERIENCE
Accenture | Sr. Data Analytics Engineer | Feb 2024 - Present
• Developed and maintained scalable ELT pipelines using AWS Glue, Lambda, Step Functions, and Apache Spark on EMR, improving data processing efficiency and reducing execution time by 40%.
• Built real-time ingestion pipelines integrating Apache Kafka, Snowpipe, and S3, making fresh data available for reporting and analytics in Snowflake with minimal latency.
• Engineered stream and batch data flows following Lambda Architecture and Medallion Architecture (Bronze/Silver/Gold) to support structured, scalable analytics delivery and ML data preparation.
• Integrated Apache Iceberg on EMR to support schema evolution & ACID compliance, ensuring data governance across dynamic datasets.
• Orchestrated complex workflows using Apache Airflow, coordinating tasks across dbt, Snowflake, and AWS Glue for modular transformation and automated lineage tracking.
• Optimized Airflow DAG performance by refactoring task dependencies and leveraging dynamic task mapping, reducing pipeline execution time by 30% (see the task mapping sketch after this list).
• Built and maintained modular, reusable Airflow DAGs to support dynamic scheduling and dependency management for automated pipeline execution.
• Built and maintained complex dbt models to automate data transformation processes, ensuring high data quality and reducing manual intervention by 50%.
• Designed feature engineering pipelines using dbt and Python (Pandas), powering downstream churn prediction and time series forecasting models.
• Managed critical Snowflake objects such as streams, tasks, stored procedures, and secure views to enable incremental processing, automation, and fine-grained data access control (see the stream-and-task sketch after this list).
• Tuned Snowflake workloads using clustering keys, materialized views, and query caching, improving dashboard responsiveness and supporting concurrent analytics users.
• Developed and optimized Snowflake data marts for various departments (marketing, finance), enabling self-service analytics and reducing time-to-insight.
• Built interactive dashboards with Tableau, enabling teams to monitor KPIs, campaign results & model predictions in near real-time.
• Conducted exploratory data analysis (EDA) and collaborated with cross-functional stakeholders to deliver insightful, actionable reports that supported strategic decisions.
• Implemented data validation frameworks using dbt tests and Great Expectations to ensure consistency across development, QA, and production environments (see the validation sketch after this list).
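A minimal sketch of the dynamic task mapping pattern referenced above, assuming Airflow 2.4+; the DAG id, schedule, and partition values are illustrative placeholders:

```python
# Sketch of Airflow dynamic task mapping (Airflow 2.4+); names are placeholders.
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def elt_pipeline():
    @task
    def list_partitions():
        # In the real pipeline this would come from S3 or a metadata table
        return ["2024-06-01", "2024-06-02", "2024-06-03"]

    @task
    def load_partition(partition):
        print(f"Loading partition {partition}")

    # expand() fans out one mapped task instance per partition at runtime,
    # replacing a hand-maintained static set of parallel tasks
    load_partition.expand(partition=list_partitions())

elt_pipeline()
```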
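A sketch of the Snowflake stream-and-task pattern used for incremental processing; credentials and object names are hypothetical:

```python
# Sketch: Snowflake stream + scheduled task for incremental loads.
# Credentials and table names are placeholders, not real values.
import snowflake.connector

conn = snowflake.connector.connect(account="<account>", user="<user>", password="<password>")
cur = conn.cursor()

# The stream records row-level changes (CDC) on the raw table
cur.execute("CREATE OR REPLACE STREAM raw.orders_stream ON TABLE raw.orders")

# The task drains the stream every five minutes, but only when new data exists
cur.execute("""
    CREATE OR REPLACE TASK load_orders_task
      WAREHOUSE = transform_wh
      SCHEDULE = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('raw.orders_stream')
    AS
      INSERT INTO staging.orders SELECT * FROM raw.orders_stream
""")

# Tasks are created suspended; resume to start the schedule
cur.execute("ALTER TASK load_orders_task RESUME")
```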
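And a sketch of the validation checks wired into that framework, using the legacy pandas-backed Great Expectations API (pre-1.0); the column names and bounds are illustrative only:

```python
# Sketch: row-level checks with the legacy pandas-backed Great Expectations
# API (pre-1.0); the column names and bounds are illustrative only.
import great_expectations as ge
import pandas as pd

df = ge.from_pandas(pd.DataFrame({
    "order_id": [101, 102, 103],
    "amount": [25.0, 13.5, 99.9],
}))

results = [
    df.expect_column_values_to_not_be_null("order_id"),
    df.expect_column_values_to_be_unique("order_id"),
    df.expect_column_values_to_be_between("amount", min_value=0, max_value=10_000),
]

# Fail the run if any expectation is violated
assert all(r.success for r in results), "data validation failed"
```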
Sherwin-Williams | Data Engineer | Dec 2022 - Jan 2024
• Collaborated closely with data analysts and data scientists to refine and debug data pipelines, enhancing data quality and usability for downstream analytics and reporting.
• Architected and implemented enterprise-scale Medallion Architecture in Azure Data Lake Storage, optimizing data governance and supporting structured reporting processes.
• Enhanced data storage efficiency in the Gold layer by leveraging Parquet file format, achieving 40% better compression and 25% improved query performance.
• Developed and maintained high-performance data pipelines in Azure Databricks using PySpark, ensuring 99.99% data accuracy and strict regulatory compliance, with detailed documentation for workflows.
• Integrated Snowflake with Databricks to improve data access & collaboration, resulting in a 40% boost in cross-functional productivity.
• Constructed Snowflake data marts for operations and marketing, facilitating unified reporting and comprehensive customer insights.
• Implemented Snowflake features such as time travel, zero-copy cloning, and fail-safe to support efficient backup, recovery, and sandbox testing for new reports and pipelines.
• Optimized Snowflake query performance by fine-tuning clustering keys & leveraging result caching, improving query efficiency.
• Designed and automated both batch and streaming data pipelines using Data Factory and Apache Kafka, reducing ingestion latency to sub-minute intervals for near real-time analytics.
• Refined Delta Lake optimization in Azure Databricks by applying advanced merge, optimize, and vacuum strategies, cutting storage costs by 50% and improving query performance (see the maintenance sketch after this list).
• Implemented robust table formats with Delta Lake to enable ACID compliance and schema evolution, resulting in improved data integrity and flexibility in data management.
• Applied advanced data partitioning and indexing techniques, including predicate pushdown and Z-ordering, to reduce query execution times and improve overall performance.
• Developed resilient pipelines with schema inference and dynamic mapping, ensuring 99.9% uptime during frequent schema changes.
• Constructed comprehensive CDC pipelines with ADF and Delta Lake to replicate changes to Snowflake in near real-time (see the merge sketch after this list).
• Integrated Airflow with Slack & Azure Monitor for real-time alerting, enhancing proactive monitoring & troubleshooting.
• Leveraged Azure Cosmos DB for building scalable NoSQL data stores, ensuring high throughput and system availability.
• Established an advanced operational monitoring framework with Azure Monitor and Log Analytics, reducing system downtime by 30% and improving overall data pipeline reliability.
• Developed parameterized Databricks notebooks for reusable data transformation routines, contributing to streamlined workflow automation.
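A minimal sketch of the Delta Lake maintenance pattern referenced above; it assumes a Databricks notebook where `spark` is pre-defined, and the table name is a placeholder:

```python
# Sketch: Delta Lake maintenance on Databricks. `spark` is the session
# Databricks injects into notebooks; the table name is a placeholder.
from delta.tables import DeltaTable

# Compact small files and co-locate rows on a frequently filtered column,
# so predicate pushdown can skip irrelevant files at query time
spark.sql("OPTIMIZE gold.sales_orders ZORDER BY (customer_id)")

# Drop unreferenced files older than 168 hours (7 days) to reclaim storage
DeltaTable.forName(spark, "gold.sales_orders").vacuum(168)
```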
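And a sketch of the merge-based upsert behind those CDC flows; `updates_df` and the table name are hypothetical stand-ins for the real change feed:

```python
# Sketch: merge-based CDC upsert into a Delta table. `spark`, `updates_df`,
# and the table name stand in for the real pipeline inputs.
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "silver.customers")

(target.alias("t")
    .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()      # apply changes to existing keys
    .whenNotMatchedInsertAll()   # insert newly seen keys
    .execute())
```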
Perficient Inc. | Data Engineer | Apr 2019 - Nov 2021
• Designed and implemented ETL pipelines in Azure Data Factory (ADF) and Databricks, automating data ingestion, transformation, and processing for analytics use cases.
• Orchestrated data workflows using Data Factory, Apache Kafka and Airflow, ensuring efficient scheduling, monitoring, and dependency management for batch and incremental data processing.
• Developed PySpark-based data pipelines to process large-scale datasets, optimizing query performance and reducing data latency.
• Utilized Delta Lake as a storage format to enable version control, time travel, and simplified upserts in ETL pipelines.
• Built and maintained data models using Snowflake, applying star schema design for optimized query performance and implementing Slowly Changing Dimensions (Type 1 & 2) to manage historical data.
• Created parameterized ETL pipelines and metadata-driven frameworks in ADF using Lookup Activity and Stored Procedures, enabling flexible and reusable data workflows.
• Utilized ADF Dataflows for dynamic schema mapping, automating data transformations across multiple datasets without manual intervention.
• Developed unit tests for ETL pipelines using PyTest and Great Expectations, ensuring data quality and reliability (see the PyTest sketch after this list).
• Created materialized and non-materialized views in Azure Synapse Analytics, improving performance for business intelligence and visualization tools.
• Implemented data partitioning & indexing strategies in Synapse Analytics, improving query execution speed by 30%.
• Conducted data validation and integrity checks using dbt and SQL-based test cases, ensuring compliance with business rules.
• Assisted in the migration of legacy ETL processes from on-prem SSIS to Azure Data Factory, modernizing workflows.
• Automated data quality checks and anomaly detection using Python and integrated them with monitoring tools like Azure Monitor.
• Developed Kafka-based ingestion pipelines for capturing high-velocity data streams, supporting near real-time data processing and reducing overall data latency across reporting systems.
• Conducted exploratory data analysis (EDA) on datasets in ADLS, helping identify trends and patterns for business decision-making.
• Created and optimized Tableau dashboards for tracking KPIs, enhancing decision-making efficiency for sales and marketing teams.
• Performed A/B testing and statistical analysis using Python (SciPy, Statsmodels) to evaluate business strategies and optimize marketing campaigns (see the A/B sketch after this list).
• Collaborated with cross-functional teams, including data scientists and analysts, to integrate predictive models into analytics pipelines.
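A minimal sketch of the PyTest-style unit tests mentioned above; the dedupe transformation and its columns are illustrative, not the actual pipeline code:

```python
# Sketch: unit-testing a small ETL transformation with PyTest.
# The function and column names are illustrative placeholders.
import pandas as pd

def dedupe_latest(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the most recent row per business key."""
    return df.sort_values("updated_at").drop_duplicates("id", keep="last")

def test_dedupe_latest_keeps_most_recent_row():
    df = pd.DataFrame({
        "id": [1, 1, 2],
        "updated_at": ["2021-01-01", "2021-03-01", "2021-02-01"],
    })
    out = dedupe_latest(df)
    assert len(out) == 2
    assert out.loc[out["id"] == 1, "updated_at"].item() == "2021-03-01"
```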
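And a sketch of the proportions-based A/B significance test; the conversion counts are made-up illustration data:

```python
# Sketch: two-sample proportions z-test for an A/B experiment with
# statsmodels; the counts below are made-up illustration data.
from statsmodels.stats.proportion import proportions_ztest

conversions = [420, 385]   # variant, control
visitors = [5000, 5000]

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# Treat the variant as significant if p < 0.05
```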
CERTIFICATIONS
• AWS Solutions Architect Associate, DP-203 (Azure Data Engineer Associate)