SRAVANI AKKIRAJU
********.***********@*****.***
https://www.linkedin.com/in/sravani-akkiraju-662b9422b/
PROFESSIONAL SYNOPSIS:
v Performance-driven IT professional with 7+ years of experience as a Data Engineer, specializing in building and optimizing scalable data pipelines across diverse cloud platforms like AWS and Azure. Proficient in leveraging agile methodologies to streamline workflows, enhance data delivery, and drive continuous improvement. Expertise in big data technologies, ETL processes, and data warehousing solutions ensures robust, secure, and efficient data infrastructure. Adept at automating complex processes and delivering actionable insights that empower data-driven decision-making and business growth.
v Programming Languages and Scripting
§ Expertise in a wide range of programming languages and tools including Python, R, Java, JavaScript, SQL, Tableau, and Power BI.
§ Holds a master’s degree in data science with advanced skills in SQL, Azure and AWS cloud services, data transformation, and analytics tools.
§ Skilled in managing complex data sets that meet both functional and non-functional business requirements across various industries.
v Data Engineering and ETL Tools
§ Experienced Data Engineer specializing in developing scalable data pipelines and optimizing data flow across SQL and Azure platforms.
§ Built scalable ETL infrastructure using AWS to improve processing times and data extraction efficiency.
§ Maintained optimal data pipeline architectures, improving data flow efficiency, and automating data processing tasks.
v Databases and Data Warehousing
§ Familiar with a variety of databases like Snowflake, Oracle, SQL Server, MySQL, and NoSQL.
§ Managed metadata repositories to streamline data transformation and workload management.
§ Facilitated the integration of new data sources to expand data functionality and analytical capabilities.
v Cloud Platforms and Big Data Technologies
§ Experienced in using big data technologies such as Hadoop, Kafka, and Spark, as well as cloud platforms like AWS and Azure.
§ Led the redevelopment of data infrastructure, achieving significant improvements in scalability and data volume handling.
§ Collaborated effectively with executive and product teams to address technical issues and enhance data infrastructure support.
v Data Analytics and Visualization
§ Proficient in performing root cause analysis and building analytics tools using Power BI that contribute to business performance metrics.
§ Developed analytics tools leveraging data pipelines to enhance insights into customer behavior and operational efficiency using Tableau.
§ Engineered advanced analytics tools using Power BI to provide actionable business insights and support customer acquisition strategies.
v DevOps and CI/CD Practices
§ Expertise in automating manual processes, enhancing data delivery, and redesigning infrastructure for improved scalability.
§ Implemented stringent security measures to ensure data protection and compliance with industry standards.
§ Adept at using project management tools like JIRA, emphasizing agile methodologies to enhance workflow efficiency.
ACADEMIC BACKGROUND:
Master’s in Data Science Aug 2021 - Dec 2022
University of North Texas
Bachelor’s in Electrical and Electronics Engineering Sep 2014 - May 2018
CVR College of Engineering
ACCREDITATIONS:
§ Amazon Web Services Associate Certified.
§ Tableau Desktop Specialist Certified.
§ Azure Data Engineer Associate Certified.
§ Google Analytics Certified Professional.
TECHNICAL EXPERTISE:
Operating Systems: Unix, Linux
Methodologies: Agile - Scrum, JIRA, Rally, Octane
Programming Languages and Scripting: Python, SQL, C++, Java, HTML5
Data Engineering and ETL Tools:
Big Data Ecosystem
• Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Cassandra, Kubernetes, Databricks, Apache Nifi, Talend
Spark Streaming Technologies
• Spark, PySpark, Spark Structured Streaming, Kafka, Storm, Kinesis
Tools:
AWS Tools
• EC2, S3 Bucket, AMI, RDS, Redshift, Glue, QuickSight, SageMaker, Lambda, CloudWatch, CloudTrail, ELB, CLI, Data Pipeline, Cloud Shell SDK, Databricks
Azure Tools
• Azure Data Factory, Azure Monitor
GCP Tools
• GCP, Big Query
Other Tools
• Apache Airflow, Jupyter
Databases and Data Warehousing:
Data Warehouses
• Teradata, Big Query
Databases
• DynamoDB, NoSQL, Oracle, Postgres, MySQL, SQL Server, SAP, DB2, Snowflake
Data Visualization: Tableau, Power BI
Machine Learning Skills: Data Visualization, Feature Extraction, Dimensionality Reduction, Model Evaluation, K-means, Regression Models, Decision Trees
DevOps and CI/CD Practices: Jenkins, Docker, Kubernetes, Git, CI/CD Pipelines
Automation Testing Tools: Python’s built-in unit test framework
PROFESSIONAL ACCOMPLISHMENTS:
LPL Financials, TX Nov 2021 - PRESENT
Data Engineer
Responsibilities:
§ Azure Data Factory for Initial ETL Workflows: Designed and implemented ETL pipelines using Azure Data Factory to extract and transform raw data, setting the foundation for migration to GCP Big Query.
§ Data Lake Optimization in Azure: Enhanced Azure Data Lake Storage performance for staging and pre-processing large datasets before transferring them to GCP.
§ Change Data Capture (CDC) in Azure: Developed incremental data load workflows in Azure Data Factory, ensuring consistent data replication during migration to GCP.
§ Azure Synapse Analytics for Data Transformation: Performed advanced data transformations in Azure Synapse Analytics, preparing datasets for compatibility with GCP's analytics tools.
§ Snowflake Optimization for Financial Data Processing: Designed and optimized Snowflake data models and queries to enhance data retrieval performance for financial reporting and analytics. Improved processing efficiency by 25% through query optimization and storage tuning.
§ Snowflake Integration with Azure and GCP: Developed seamless data integration pipelines between Azure Data Factory, Snowflake, and GCP BigQuery, enabling cross-platform data consistency and real-time financial insights.
§ PySpark for Data Processing on Azure Kubernetes: Leveraged PySpark to perform large-scale data transformations and processing within Azure Kubernetes Service (AKS), optimizing workloads for financial datasets.
§ PySpark for ETL Automation: Automated ETL pipelines using PySpark, reducing data ingestion latency and improving the efficiency of financial data migration to GCP BigQuery.
§ Azure Kubernetes for PySpark Optimization: Deployed PySpark jobs on Azure Kubernetes Service (AKS) to process and clean data efficiently before transferring to GCP for further analysis.
§ Integration with GCP Services: Built API connectors integrating Google Cloud Storage, Pub/Sub, and Big Query, facilitating real-time data flow and analytics in a cloud-native environment.
§ Automated API Data Pipelines: Developed GCP Cloud Functions to automate API calls, ensuring continuous data ingestion and processing from external financial data sources.
§ Backend API Development for Data Processing: Engineered backend APIs using Python and FastAPI for seamless integration with GCP Big Query, enabling efficient data retrieval and transformation processes.
§ Azure DevOps for Multi-Cloud CI/CD Pipelines: Established Azure DevOps pipelines to orchestrate and automate the deployment of migration scripts between Azure and GCP environments.
§ Azure Event Hub for Data Streaming: Streamlined real-time data ingestion with Azure Event Hub, enabling data replication to GCP Pub/Sub for concurrent processing.
§ Cross-Cloud Data Transfer Using Azure Blob Storage: Configured Azure Blob Storage for staging and exporting data to GCP Cloud Storage, ensuring reliable cross-cloud connectivity.
§ Azure Monitor for Migration Reliability: Implemented Azure Monitor to track migration pipeline performance, ensuring data consistency and timely issue resolution.
§ Azure SQL Database Optimization: Migrated structured datasets from Azure SQL Database to GCP Big Query, ensuring schema alignment and query optimization for analytics.
§ Hybrid Analytics with Azure Synapse and GCP Big Query: Enabled federated querying by integrating Azure Synapse with GCP Big Query for hybrid analytics across both cloud platforms.
§ Azure Logic Apps for Workflow Automation: Automated recurring tasks in the migration process using Azure Logic Apps, reducing manual intervention during data movement to GCP.
§ Azure Cost Management During Migration: Monitored and optimized Azure resource usage during the migration process, achieving a 20% reduction in operational costs.
§ Post-Migration Integration with GCP: Conducted post-migration validation using Azure Data Factory and GCP Dataflow to ensure data integrity and synchronization across both platforms.
Environment: Azure Data Factory, Azure Synapse Analytics, Azure Blob Storage, Azure SQL Database, Azure Monitor, GCP Big Query, Google Cloud Storage, Google Pub/Sub, Snowflake, SAP HANA, Python, SQL
Copart IT Solutions, Hyderabad, India Jan 2020 - June 2021
Data Engineer
Responsibilities:
§ GCP-Based ETL Processes: Used Google Cloud Dataflow to design and build ETL processes that optimized data intake and transformation across many GCP services.
§ GCP Cloud Functions for Serverless Architecture: Developed a serverless architecture using Google Cloud Functions and API Gateway to provide scalability and flexibility.
§ Data Ingestion into GCP Big Query: Managed Big Query data intake workflows, increasing the effectiveness of data analysis and reporting.
§ GCP Dataflow for Data Transformation: Streamlined processing workflows by transforming, verifying, and cleansing incoming data with Google Cloud Dataflow.
§ Change Data Capture on GCP: Created both synchronous and asynchronous CDC approaches using Google Pub/Sub for real-time data replication.
§ Real-Time Data Pipeline with GCP: Established a real-time data pipeline using Google Cloud Pub/Sub and Apache Kafka to manage semi-structured data.
§ Automated GCP Databricks Notebooks: Automated Databricks notebooks on GCP using Python and SQL, reducing manual effort in analytics and data processing.
§ Snowflake Data Warehousing for Automotive Analytics: Designed and managed Snowflake as a centralized data warehouse for automotive analytics, ensuring high-performance query execution for large-scale datasets.
§ Snowflake-Based Reporting and Data Processing: Implemented Snowflake-based ETL processes, reducing data processing time by 30% and streamlining real-time analytics for vehicle inventory and transaction data.
§ PySpark for Large-Scale Data Processing: Developed PySpark-based workflows to process structured and semi-structured vehicle data, ensuring optimized transformations and real-time analytics for fleet management.
§ PySpark for Machine Learning Pipelines: Built PySpark-powered machine learning pipelines for predictive analytics, improving vehicle auction forecasting and inventory predictions.
§ Datorama Utilization: Used Datorama to integrate marketing data from various platforms, providing real-time visibility into campaign performance and allowing data-driven decision-making for the marketing team.
§ GCP Storage Solutions: Supervised data backups on Google Cloud Storage and improved the storage plan for increased scalability.
§ Data Migration to Big Query: Managed the migration of Oracle databases to Big Query using Google Cloud Dataflow, which produced a 20% increase in performance.
§ GCP Airflow for Workflow Automation: Created and oversaw ETL process orchestration workflows with GCP Composer (Apache Airflow).
§ Data Warehouse Architecture on GCP: Enhanced data integration and reporting capabilities by designing the architecture for a data warehouse on Big Query.
§ GCP Security Compliance: Implemented security controls using Cloud Key Management and Google Cloud IAM in line with corporate guidelines.
§ GCP Logging and Monitoring: Ensured system reliability by tracking data pipelines in real time with Google Cloud Logging and Monitoring.
§ GCP CI/CD Pipelines: Integrated Jenkins with GCP to automate the deployment of data engineering solutions through CI/CD pipelines.
§ Cost Optimization in GCP: Conducted cost-saving optimizations for GCP services, reducing overall cloud expenditure by 18%.
Environment: Google Cloud Platform (GCP), Big Query, Google Cloud Dataflow, Google Cloud Functions, Google Cloud Storage, Google Pub/Sub, GCP Composer, Databricks, Jenkins, Python, SQL
Amazon, Hyderabad, India Oct 2017 - Dec 2019
Data Engineer
Responsibilities:
§ AWS-Based Data Transfer: Implemented data transfer pipelines using AWS Glue and S3 to ensure seamless loading of raw data into Amazon Redshift.
§ Real-Time Streaming with AWS Kinesis: Developed real-time streaming applications with AWS Kinesis that handle large volumes of streaming data with fault-tolerant processing.
§ AWS Lambda for ETL Automation: Used AWS Lambda to automate ETL procedures, decreasing the need for human involvement and increasing the effectiveness of the data pipeline.
§ AWS CloudFormation for Infrastructure: Automated resource provisioning and scaling using AWS CloudFormation for infrastructure deployment and management.
§ Data Lake Creation in AWS: Created a data lake with AWS S3 to combine data from several sources and allow for scalable data analysis.
§ AWS Glue for Data Cataloging: Improved data structure and discovery by using AWS Glue catalog and crawlers to retrieve and catalog metadata.
§ AWS Redshift for Data Analytics: Used Amazon Redshift to perform advanced analytics on large datasets and produce actionable insights.
§ AWS EC2 for Scalable Compute Resources: Managed EC2 instances to process data 20% faster while maintaining scalability and performance.
§ Data Cleaning and Preprocessing on AWS: Applied data cleaning and preprocessing techniques using Python and AWS Glue to ensure high-quality data for analysis.
§ AWS CloudWatch for Monitoring: Monitored resource utilization and application performance with AWS CloudWatch, ensuring high availability and reliability.
§ Automated Batch Processing with AWS Batch: Designed and ran batch processing jobs with AWS Batch, increasing efficiency for large datasets.
§ Security Management with AWS IAM: Implemented strong security measures using AWS IAM, ensuring secure data access and adherence to industry standards.
§ AWS RDS for Database Management: Optimized query performance for transactional data by managing relational databases with AWS RDS.
§ Cost Optimization in AWS: Right-sized EC2 instances and selected appropriate S3 storage classes, reducing AWS infrastructure costs by 15%.
§ AWS EMR for Big Data Processing: Used AWS EMR for distributed data processing with Spark and Hadoop, improving processing efficiency by 30%.
Environment: AWS Glue, AWS S3, AWS Redshift, AWS Lambda, AWS Kinesis, AWS EC2, AWS CloudWatch, AWS RDS, AWS CloudFormation, AWS Batch, AWS IAM, Python, Spark