AKHILA GANGA
Data Engineer | *************@*****.*** | +1-203-***-**** | LinkedIn | GitHub
SUMMARY
Highly skilled Data Engineer with 3+ years of hands-on experience designing, developing, and optimizing scalable data pipelines, modern data lakehouse architectures, and cloud-based analytics platforms. Proven expertise implementing real-time and batch data solutions using Apache Spark, Kafka, dbt, and managed services across AWS, Azure, and GCP. Adept at translating complex business requirements into robust, efficient, and maintainable data solutions that drive critical insights and operational efficiency.
TECHNICAL SKILLS
Programming & Query Languages: Python, SQL, Shell Scripting, PL/SQL
Cloud Platforms: AWS, Azure, GCP
Big Data & Processing: Apache Spark, Databricks, Delta Lake, Hadoop, Apache Storm, Apache Druid
Data Warehousing & Storage: Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse
Data Pipelines & ETL/ELT: dbt, Apache Airflow, AWS Glue, SSIS
Streaming & Messaging: Apache Kafka, Spark Streaming
Business Intelligence & Visualization: Power BI, Tableau, SSRS, Crystal Reports, OBIEE
DevOps & IaC: Terraform, Docker, Kubernetes, GitHub Actions, Jenkins
Data Governance: RBAC, Data Masking, Audit Logging, HIPAA/SOX Compliance
PROFESSIONAL EXPERIENCE
Data Engineer | Goldman Sachs – New York, NY | July 2024 – Present
Engineered and optimized real-time data ingestion pipelines using Spark Streaming and Kafka, persisting transformed data to GCP BigQuery for near real-time consumption.
Developed and maintained robust AWS Glue ETL scripts for complex data transformations and cleansing of large-scale datasets, ensuring data quality and integrity.
Managed end-to-end ETL pipelines for data warehouses and financial reporting using SQL and Snowflake, contributing to critical business insights.
Delivered multi-cloud data integration solutions combining AWS, Azure, and GCP services, enabling seamless cross-platform analytics and reducing latency for business-critical workloads.
Optimized PySpark jobs on Kubernetes clusters for accelerated data processing, significantly reducing execution times for large datasets.
Designed and deployed Tableau dashboards to translate complex financial datasets into actionable business visualizations for stakeholders, improving reporting efficiency by 25%.
Configured and managed Apache Airflow for S3-to-Snowflake data workflows, developing DAGs for automated workflow orchestration and dependency management.
Implemented caching mechanisms and query optimizations in Druid to minimize API response times when querying large datasets, enhancing application performance.
Executed a one-time multi-state data migration from SQL Server to Snowflake using Python and SQL, ensuring data accuracy and completeness.
Orchestrated ETL/ELT workflows using Apache Airflow, AWS Glue, dbt, SSIS, and WhereScape RED, ensuring automated, reliable, and scalable data pipelines.
Designed and implemented scalable data processing solutions on Google Compute Engine, leveraging its virtual machine infrastructure for efficient data transformation and analysis.
Performed data ingestion and modeling using Oracle Analytics Services (OAS) and optimized OAS Dashboards for improved performance.
Implemented automated unit tests for PySpark ETL pipelines using Python (pytest) to validate transformations and ensure data accuracy across large-scale datasets.
Contributed to ETL migration services by developing and deploying AWS Lambda functions for serverless data pipelines, integrating with Glue Catalog and Athena.
Developed a custom introductory Power BI course and taught over 100 peers, significantly enhancing company-wide data literacy.
Data Engineer | Ascent Health – Mumbai, India | May 2021 – July 2023
Developed complex T-SQL stored procedures, functions, and triggers to implement business logic and automate critical data processes.
Created interactive Tableau dashboards alongside SSRS reports, providing executives with self-service analytics for healthcare operations.
Designed and developed SSIS packages for robust ETL operations, extracting and loading data from diverse flat file sources into SQL Server databases.
Automated ETL jobs using SQL Server Agent, including scheduling, managing, and configuring alerts for job statuses and failures, ensuring operational continuity.
Built real-time monitoring and reporting dashboards for end-to-end pipeline visibility and operational transparency, enabling proactive issue resolution.
Developed complex SSRS reports, including parameterized, sub-report, drill-down, and summary formats, supporting data-driven decision-making.
Created and enhanced database objects (tables, indexes, views) and designed constraints to ensure data integrity across various systems.
Collaborated with cross-functional teams to define requirements for seamless data exchange between multiple systems.
Deployed SSIS packages and reports from development to production servers, ensuring smooth transitions and minimal downtime.
EDUCATION
Master of Computer Science & Information Technology | Sacred Heart University | GPA: 3.8/4.0
Integrated M.Tech in Software Engineering | Vellore Institute of Technology
CERTIFICATIONS
Microsoft Certified: Azure Developer Associate
AWS Certified Data Analytics - Specialty
ACADEMIC PROJECT
Data-Driven Health Insurance Management: Unifying Insights for Optimal Care
Objective: Build a comprehensive Business Intelligence (BI) solution using SSIS, Azure Data Factory, and Power BI to centralize and analyze health insurance data, enabling data-driven decision-making and strategic planning.
Key Contributions & Technologies:
Designed and implemented end-to-end ETL pipelines using SSIS and Azure Data Factory to ingest, clean, and transform health insurance data from Excel and on-prem SQL Server to Azure SQL Database.
Leveraged dbt for data transformation and modeling, creating a maintainable and version-controlled analytics layer for key performance indicators (KPIs) and business metrics.
Developed Azure Data Factory pipelines using lookup, filter, and copy activities to automate the transfer of dimension and fact data from on-premises SQL Server to Azure SQL Database.
Built and deployed SSIS packages to extract data from Excel sources and load it into MS-SQL Server, using truncate-and-load and validation steps to ensure data accuracy.
Created interactive dashboards using Power BI to visualize trends in claims, costs, and demographics, delivering insights for clinical decision-making and operational efficiency.
Implemented data quality checks and validation processes within the pipelines to ensure accuracy and reliability of the consolidated data.
Technologies Used: Azure Data Factory (ADF), SSIS, MS-SQL Server, Azure SQL Database, Power BI Desktop, Git, Excel