SRIRAAGA SAKKIROLLA
Irving TX–***** (***) ***–***1 ********.**@*****.*** LinkedIn GitHub
SUMMARY
Data Engineer with 4+ years of experience building ETL pipelines and data warehouse solutions for analytics and reporting. Strong in SQL, Python, and Spark, with hands-on experience working with large datasets across AWS, Azure, and GCP. Experienced in data modeling, data validation, and delivering reliable datasets for business intelligence use cases.
PROFESSIONAL EXPERIENCE
Data Engineer – AI / ML Stanley Black & Decker TX, USA February 2024 – Present
● Optimized scalable ETL pipelines using AWS (S3, Redshift, Glue, EMR) and GCP (BigQuery) to process 5M–10M+ records per batch across 15+ heterogeneous data sources, supporting daily and near real-time ingestion.
● Built data warehouse solutions, enabling efficient ingestion and querying of 2TB+ structured data with support for 25+ reporting datasets.
● Maintained metadata management frameworks covering multiple datasets and data attributes, improving lineage tracking and governance across pipelines.
● Transformed raw datasets into structured, analytics-ready formats using SQL and PySpark, supporting 20+ downstream reporting and ML use cases.
● Developed data models (star and snowflake schemas) to support OLAP-based reporting and analytics.
● Implemented comprehensive data validation frameworks including null checks, deduplication, and reconciliation across 50+ pipeline stages, improving data reliability.
● Orchestrated workflows using Airflow, managing 40+ DAGs with interdependent tasks, automating scheduling and monitoring of production pipelines.
● Leveraged AWS Glue and EMR for distributed data processing across multi-node clusters handling 100M+ records/day, optimizing large-scale transformations.
● Collaborated with 5+ cross-functional teams to gather requirements and deliver 25+ curated datasets powering dashboards and analytics.
● Supported junior team members by reviewing SQL queries and sharing best practices for ETL pipeline development.
● Documented datasets, schemas, and lineage for 100+ tables, reducing onboarding time for new users and improving data accessibility.
● Developed pipelines supporting machine learning workflows, delivering feature-ready datasets for 10+ models used in production and experimentation.
ETL & Data Engineer Cognizant Bengaluru, India September 2020 – July 2022
● Engineered high-volume ETL pipelines using Azure Data Factory, Databricks, and Snowflake, processing 10M+ records daily across 10+ source systems.
● Designed scalable data warehouse architectures and optimized dimensional models supporting 20+ reporting dashboards and analytics workloads.
● Optimized distributed transformations using PySpark, SQL, and Hive, handling multi-GB datasets per job across clustered environments.
● Implemented metadata-driven pipeline designs covering 80+ datasets, improving reusability and maintainability of workflows.
● Integrated and processed data using cloud platforms and big data frameworks, enabling batch pipelines with hourly and daily refresh cycles.
● Queried and validated data from Oracle source systems for ETL processing and data quality checks.
● Coordinated workflow orchestration and maintained reliability across 30+ production jobs, ensuring consistent data availability.
● Provisioned curated datasets supporting 15+ BI dashboards, enabling business reporting and analysis.
● Documented data flows, transformations, and schema definitions for 50+ pipelines, improving collaboration and knowledge sharing.
Data Engineer Intern Cognizant Bengaluru, India January 2020 – August 2020
● Contributed to ETL workflows populating 10+ relational and dimensional models for reporting systems.
● Authored optimized SQL queries for data extraction, validation, and transformation across datasets exceeding 1M+ records.
● Analyzed data quality issues across 5+ integrated data sources, improving dataset consistency.
● Compiled documentation for 15+ data models and workflows, supporting maintainability and onboarding.
PROJECTS
Generative AI for Business Insights and Demand Forecasting Python, SQL, ETL, GCP (BigQuery), Generative AI (GPT), Power BI/Tableau
● Built ETL pipelines to consolidate and transform 500,000+ business transactions using Python, SQL, and GCP BigQuery, ensuring data integrity and delivering actionable analytics.
● Applied Generative AI to surface insights, identify anomalies, and forecast demand via interactive dashboards, driving data-driven decisions.
Cloud-Based Real-Time Fraud Detection & Analytics Platform SQL, Python, Kafka, PySpark, Tableau, Data Analytics, Fraud Detection
● Built a real-time streaming pipeline using Kafka and PySpark to process 100,000+ simulated daily transactions and detect fraud patterns.
● Developed rule-based fraud detection logic and interactive Tableau dashboards with KPIs (fraud rate, transaction trends) for real-time monitoring.
TECHNICAL SKILLS
Languages: Python, SQL, PySpark
Data Engineering: ETL/ELT Pipelines, Data Modeling (Star/Snowflake), Data Warehousing, OLAP Systems, Data Validation, Distributed Processing
Big Data: Spark, Hadoop (Hive, HDFS), Kafka
Cloud: AWS (S3, Redshift, Glue, EMR), Azure (ADF, ADLS, Databricks), GCP (BigQuery)
Databases: Snowflake, BigQuery, Oracle
Tools: Airflow, Terraform, Docker, Git
BI Tools: Power BI, Tableau
CERTIFICATIONS
Microsoft Certified: Azure Fundamentals (AZ-900)
Databricks Academy: Data Engineering with Databricks
EDUCATION
Southeast Missouri State University Cape Girardeau, MO August 2022 – December 2023
Master of Science in Computer Science GPA: 3.8/4.0