ROHIT RAJU
+1-972-***-**** *****.**@***********.*** LinkedIn
SUMMARY
Data professional with 6+ years of experience in cloud ETL, data engineering, and business analytics, delivering end-to-end solutions across enterprise, finance, and marketing domains.
Designed and implemented ETL pipelines, data lakes, and API integrations using Azure Data Factory, Logic Apps, AWS Glue, Snowflake, and Synapse Analytics, cutting data processing time by up to 70%.
Conducted data analysis, mining, and feature engineering using SQL, Python, PySpark, and R, supporting fraud detection, claims analytics, and NLP projects with 25–60% improved accuracy.
Built dashboards and reports in Power BI, Tableau, QuickSight, and Excel, providing actionable insights for risk assessment, campaign optimization, and executive decision-making.
Collaborated with cross-functional teams on requirements, business processes, and project delivery, implementing data validation, metadata orchestration, and governance frameworks to improve efficiency and reliability.
EXPERIENCE
SUBARU OF AMERICA May 2024 – Current
Cloud Data Engineer
Engineered ETL pipelines and API integrations to migrate enterprise datasets from Oracle Cloud Infrastructure (OCI) to Azure within MS Dynamics 365/CRM, cutting processing time by 70%.
Designed scalable ETL architectures in Azure Data Factory and Logic Apps, leading whiteboard sessions with analysts and stakeholders to align workflows; streamlined orchestration and cut manual integration effort by approximately 50%.
Implemented a metadata-driven orchestration layer that dynamically configured pipelines, enabling onboarding of new sources into Azure Data Lake Storage (ADLS) with minimal developer intervention.
Restructured partitioning strategy in Azure Synapse and tuned indexing in Azure SQL, reducing average query runtime from 9 minutes to under 3 minutes across analytics workloads.
Built curated datasets and interactive dashboards in Power BI, leveraging SQL and Python preprocessing scripts; improved business reporting cycles by providing self-service access to cleansed data.
Integrated an automated monitoring framework with Azure Functions, event-driven alerts, and logging systems; reduced ETL downtime incidents by 60% through proactive anomaly detection.
AIG Aug 2023 – Apr 2024
ETL Developer
Designed and deployed ETL pipelines in Azure Data Factory and Logic Apps to process policyholder and claims data, ensuring compliance with reporting standards and reducing manual reconciliation efforts by 35%.
Built automated validation checks for financial and claims datasets, using dynamic workflows in ADF and Python scripts; improved data accuracy and eliminated recurring discrepancies in actuarial and risk reporting.
Developed interactive dashboards in Power BI and Tableau to visualize claims trends, premium revenue, and loss ratios, enabling senior underwriters and actuaries to identify high-risk segments more effectively.
Consolidated historical claims and premium data using Excel Power Query and SQL-based transformations, supporting actuarial forecasting and pricing models with a unified, cleansed dataset.
Designed A/B testing frameworks using Azure Functions and Python to evaluate campaign effectiveness for customer retention initiatives; insights drove a 15% lift in policy renewals.
AMAZON Mar 2021 – Jun 2022
Data Engineer
Automated fraud detection ETL workflows using AWS Glue, Lambda, and Step Functions, reducing manual intervention by 40% and ensuring consistent daily reconciliation across millions of transactions.
Designed feature engineering pipelines in Python and SQL (Window Functions, CTEs) to enrich training datasets, which improved model detection accuracy by 25% and cut false positives significantly.
Deployed fraud scoring models into production using SageMaker and integrated outputs with QuickSight dashboards, giving risk teams real-time visibility into suspicious transaction patterns.
Built reusable data preprocessing modules in PySpark and SQL for ingestion into Amazon S3 and Redshift, reducing dataset preparation time from hours to minutes.
Partnered with compliance and risk stakeholders to design audit-ready data trails; implemented version-controlled transformations in GitHub for model logic and data lineage tracking.
Established automated performance monitoring using CloudWatch and custom Python scripts, enabling early anomaly detection and reducing fraud model downtime incidents by 30%.
BUILD MY BRAND Aug 2019 – Feb 2021
Cloud Data Architect Lead
Architected and delivered end-to-end AWS and Azure migration projects for 5 client portfolios, optimizing ETL workflows with Azure Data Factory and AWS Glue; improved data processing efficiency by 35%.
Designed and deployed enterprise data warehouses on Snowflake and SQL Server, integrating multi-channel customer data into a unified repository that supported downstream analytics and campaign ROI tracking.
Implemented scalable ingestion frameworks with REST API and third-party SaaS integrations, enabling near real-time customer engagement insights for digital marketing teams.
Optimized query performance and storage by tuning partitioning strategies and indexing within Azure Synapse and Snowflake, reducing report generation times from 20 minutes to under 5.
Built executive-facing dashboards in Power BI and Tableau to track campaign KPIs, ad spend efficiency, and client growth metrics, leading to more informed budget allocation decisions.
Established a data quality and governance framework with automated validation rules, anomaly alerts, and lineage tracking; reduced reconciliation issues across reporting layers by 40%.
Mentored a hybrid team of 7 engineers across onshore/offshore delivery, setting coding standards with GitLab CI/CD pipelines and agile sprint practices.
AMAZON May 2018 – Jul 2019
ML Data Analyst
Contributed to the Alexa Data Services pilot team’s success, designing data annotation and validation workflows, scaling the program from 40 to 250+ associates in 15 months and improving throughput by 60%.
Built feature engineering pipelines with SQL (Joins, CTEs, Window Functions) and Python, supporting anomaly detection models that saved 160+ human-hours per day across labeling operations.
Collaborated with ML engineers to deploy scalable training data pipelines on AWS S3 and Lambda, reducing manual preparation time by 30% and accelerating model deployment cycles.
Automated metadata tagging and preprocessing tasks using Python scripts, ensuring consistent input quality for NLP models and reducing error rates in training datasets.
Conducted root cause analysis on misclassified predictions, feeding insights into iterative model improvements that boosted classification accuracy by 20%.
Created monitoring scripts in Jupyter Notebook for tracking annotation consistency and model performance, providing data-driven feedback loops to improve labeling guidelines.
AMAZON Sep 2016 – Jan 2017
Data Associate
Analyzed large volumes of customer feedback data using Excel (Pivot Tables, VLOOKUP) and SQL queries to identify recurring delivery issues, driving actionable insights for the logistics team.
Created ad-hoc dashboards in Tableau to visualize delivery delays, order accuracy, and service metrics; insights helped regional managers prioritize operational bottlenecks.
Partnered with cross-functional stakeholders to translate feedback patterns into process improvements, contributing to a measurable increase in on-time delivery and customer satisfaction scores.
Conducted weekly reporting on customer sentiment trends, highlighting root causes of complaints and enabling data-backed adjustments in last-mile delivery strategies.
SKILLS
Cloud & Data Engineering: AWS, Azure, GCP, Oracle Cloud Infrastructure (OCI), Databricks, Snowflake, Synapse Analytics, Event Hubs, Kinesis, Data Lakes
Data Warehousing & ETL: Data Warehousing, Data Integration, ETL/ELT, SSIS, DataStage, Informatica, Dimensional Modeling, Star Schema, Data Vault, Data Migration, Data Modeling, Metadata Management, Data Lineage, Data Quality
Programming & Analytics: SQL, Python, PySpark, R, JavaScript, NumPy, Pandas, VBA, Excel, Regression Analysis, Time Series Analysis, Statistical Modeling, A/B Testing, Machine Learning, Model Deployment, Model Evaluation
Databases & Storage: SQL Server, MySQL, PostgreSQL, Cosmos DB, DynamoDB, Delta Lake, Azure Data Lake, Amazon S3, Data Backup, Archiving, Replication, Encryption, Data Masking
Visualization & Reporting: Power BI, Tableau, Amazon QuickSight, MS Excel (Power Query), Canva, Dashboard Design, Reporting Automation, Data Storytelling, Business Intelligence (BI), Data Presentation, Prompt Engineering
DevOps & Automation: Git, GitHub, GitLab, Azure DevOps, CI/CD, Terraform, Docker, Agile & Waterfall
Data Governance & Security: Data Validation, Metadata Orchestration, Data Governance, Compliance, Risk Assessment, Audit Trails, Access Control, Data Security, Data Encryption, Data Masking
EDUCATION
East Texas A&M University Dec 2023
Master of Science in Business Analytics Dallas, TX
Osmania University Apr 2018
Bachelor of Business Administration Hyderabad, India