AKHIL NALABOLU
Email: ************@*****.*** Phone: +1-214-***-****
PROFESSIONAL SUMMARY
Results-driven Data Engineer with 4+ years of experience designing and optimizing cloud-based big data pipelines across AWS, Azure, and GCP. Expert in ETL/ELT workflows, real-time streaming architectures, and data lakehouse solutions using Apache Spark, Kafka, Snowflake, and Databricks. Skilled in Python, SQL, and PySpark for large-scale transformations and advanced analytics. Strong background in machine learning integration, orchestration, and performance tuning, delivering up to 30% faster processing speeds and enabling real-time, data-driven decision-making. Proven track record of cloud migrations, cost optimization, and compliance with GDPR and CCPA standards.
TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, Hive, Pig, YARN, Spark, Spark SQL, MapReduce, Kafka, Sqoop, Delta Lake, Iceberg
Data Processing & Analytics: Apache Spark, Spark MLlib, Spark Streaming, Spark GraphX, dbt (Data Build Tool), Apache Flink
Programming Languages: Python, Scala, SQL, PL/SQL, UNIX Shell Scripting, PySpark
Databases (RDBMS & NoSQL): Teradata, Oracle, DB2, SQL Server, MySQL, MongoDB, Cassandra, HBase, Elasticsearch
Cloud Databases: Amazon Redshift, Snowflake, PostgreSQL, Google BigQuery, Azure Synapse Analytics
Cloud Platforms: AWS (EC2, S3, Glue, Lambda, IAM), Azure (Data Factory, Databricks), GCP (BigQuery, Dataflow)
ETL & ELT Tools: IBM InfoSphere, SQL Server Integration Services (SSIS), Apache Sqoop, Fivetran, Matillion
Workflow Orchestration: Apache Airflow, AWS Glue, Azure Data Factory, Dagster
Machine Learning Models: Linear/Logistic Regression, Naïve Bayes, Decision Trees, Random Forest, KNN, SVM, Gradient Boosting, PCA, LDA, Time Series Analysis
Deep Learning & NLP: TensorFlow, Keras, CNN, RNN, NLP (SpaCy, Gensim), Hugging Face Transformers
Visualization & BI Tools: Tableau, Power BI, Matplotlib, Seaborn, Looker
Development Environments: PyCharm, Jupyter Notebook, Visual Studio, VS Code
Version Control & CI/CD: Git, GitHub, JIRA, GitLab, Jenkins, Bitbucket Pipelines
Containerization & Deployment: Docker, Kubernetes, Terraform
Operating Systems: Linux, Unix, Windows, macOS
PROFESSIONAL EXPERIENCE
Goldman Sachs Global Jan 2024 – Present
Data Engineer
Designed and deployed scalable ETL/ELT pipelines ingesting multi-terabyte datasets into Snowflake and Redshift, ensuring 99.9% data accuracy.
Built real-time streaming solutions using Apache Kafka and Spark Streaming, processing 2M+ daily transactions for fraud detection and anomaly monitoring.
Automated complex workflows in Apache Airflow and Azure Data Factory, reducing operational overhead by 40%.
Developed ML pipelines in Spark MLlib and TensorFlow, improving predictive model accuracy for customer churn by 12%.
Leveraged Azure Databricks Delta Lake to handle slowly changing dimensions, improving historical accuracy in BI reporting (see the merge sketch after this role).
Implemented dbt transformation layers in Snowflake, enabling self-service analytics for 20+ business teams.
Created CI/CD automation via Jenkins and Git, reducing deployment cycles by 25% and introducing automated testing gates.
Built real-time Power BI dashboards that aggregated KPIs across business lines, cutting decision-making time from hours to minutes.
Partnered with InfoSec to integrate GDPR/CCPA compliance measures into pipelines, achieving a 100% audit pass rate.
Optimized Spark cluster resource allocation, reducing compute costs by 20% through intelligent job parallelization.
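Below is a minimal PySpark + Delta Lake sketch of the Type 2 slowly-changing-dimension handling referenced in this role; the table paths, the customer_id/address columns, the is_current flag, and the one-update-row-per-customer assumption are illustrative, not the production schema.

```python
# Illustrative SCD Type 2 merge on a Delta-enabled Spark session (e.g. Databricks).
# Paths and column names are hypothetical placeholders.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

dim = DeltaTable.forPath(spark, "/mnt/gold/dim_customer")   # placeholder dimension path
updates = spark.read.parquet("/mnt/staging/customers")      # placeholder staging path

# Find updates whose tracked attribute changed against the current dimension rows.
current = dim.toDF().filter("is_current = true")
changed = (
    updates.alias("u")
    .join(current.alias("d"), "customer_id")
    .where("u.address <> d.address")
    .select("u.*")
)

# Stage every update with its key (to expire matching current rows) plus a NULL-keyed
# copy of the changed rows (so their new versions are inserted as fresh records).
staged = updates.selectExpr("customer_id AS merge_key", "*").unionByName(
    changed.selectExpr("CAST(NULL AS STRING) AS merge_key", "*")
)

(
    dim.alias("d")
    .merge(staged.alias("u"), "d.customer_id = u.merge_key AND d.is_current = true")
    .whenMatchedUpdate(                      # expire the superseded current row
        condition="d.address <> u.address",
        set={"is_current": "false", "end_date": "current_date()"},
    )
    .whenNotMatchedInsert(                   # insert new customers and new versions
        values={
            "customer_id": "u.customer_id",
            "address": "u.address",
            "is_current": "true",
            "start_date": "current_date()",
            "end_date": "CAST(NULL AS DATE)",
        },
    )
    .execute()
)
```

Coforge Aug 2019 – Jul 2022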
Data Engineer
Engineered AWS Redshift pipelines to integrate data from ERP, CRM, and streaming sources, improving BI data availability by 35%.
Designed and optimized T-SQL stored procedures and partitioned tables to handle billions of records efficiently.
Built data lakes on AWS S3 and Azure Data Lake, enabling scalable storage for unstructured, semi-structured, and structured datasets.
Integrated Apache Sqoop to migrate high-volume data from on-premises Oracle DB to cloud storage.
Collaborated with data scientists to operationalize ML models in Spark MLlib, reducing fraudulent transaction rates by 15%.
Led on-prem to AWS migration of 50+ data pipelines, achieving a 20% cost reduction and better scalability.
Tuned Spark jobs using Dynamic Resource Allocation, reducing job runtimes by 30%.
Implemented Airflow DAG monitoring with automated alerts, improving incident response time by 45% (see the DAG sketch after this role).
Established version-controlled SQL and ETL scripts in Git for better change tracking and rollback.
Configured AWS IAM roles and encryption to meet enterprise security standards and compliance frameworks.
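A minimal Airflow sketch of the DAG monitoring with automated failure alerts described in this role; the DAG name, schedule, task commands, and webhook endpoint are illustrative placeholders rather than the actual pipeline.

```python
# Illustrative Airflow 2.x DAG with an on-failure alert callback; names and the
# webhook endpoint are hypothetical placeholders.
from datetime import datetime, timedelta

import requests
from airflow import DAG
from airflow.operators.bash import BashOperator

ALERT_WEBHOOK = "https://hooks.example.com/etl-alerts"  # placeholder alert endpoint


def notify_failure(context):
    """Post a short alert with the failed DAG, task, and log URL."""
    ti = context["task_instance"]
    requests.post(
        ALERT_WEBHOOK,
        json={
            "dag": ti.dag_id,
            "task": ti.task_id,
            "run_id": context["run_id"],
            "log_url": ti.log_url,
        },
        timeout=10,
    )


default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
    "on_failure_callback": notify_failure,  # fires the alert when any task fails
}

with DAG(
    dag_id="nightly_redshift_load",     # hypothetical pipeline name
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 2 * * *",      # nightly at 02:00 UTC
    default_args=default_args,
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="python extract.py")
    load = BashOperator(task_id="load", bash_command="python load_redshift.py")
    extract >> load
```

EDUCATION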
Master of Science in Computer Science – University of Central Missouri, USA
Bachelor of Technology in Computer Science – Vignan’s Institute of Science and Technology, India
PROJECT HIGHLIGHTS
Real-Time Fraud Detection Platform
Architected a Kafka-Spark Streaming pipeline processing 500K+ transactions/hour with ML-driven risk scoring in under 2 seconds.
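A minimal PySpark Structured Streaming sketch of this Kafka-to-Spark scoring flow; the brokers, topic, transaction schema, and the simple rule standing in for the MLlib risk model are illustrative assumptions.

```python
# Illustrative Kafka -> Spark Structured Streaming scoring job (requires the
# spark-sql-kafka package); brokers, topic, and schema are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("fraud-scoring-sketch").getOrCreate()

txn_schema = StructType([
    StructField("txn_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("country", StringType()),
])

# Read raw transaction events from Kafka.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder brokers
    .option("subscribe", "transactions")                # placeholder topic
    .load()
)

# Parse the JSON payload and attach a toy risk score; in the real pipeline this
# step would invoke the trained MLlib model rather than a hand-written rule.
scored = (
    raw.select(F.from_json(F.col("value").cast("string"), txn_schema).alias("t"))
    .select("t.*")
    .withColumn(
        "risk_score",
        F.when((F.col("amount") > 10000) & (F.col("country") != "US"), 0.9).otherwise(0.1),
    )
)

# Stream high-risk transactions to a sink for alerting (console used here for brevity).
query = (
    scored.filter(F.col("risk_score") > 0.8)
    .writeStream.outputMode("append")
    .format("console")
    .start()
)
query.awaitTermination()
```

Cloud Data Lake Modernization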
Migrated 15 TB of structured and unstructured data to AWS S3 with Glue crawlers, reducing retrieval latency by 40%.
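A minimal boto3 sketch of cataloging the migrated S3 data with an AWS Glue crawler; the bucket, crawler name, IAM role, and catalog database are illustrative placeholders.

```python
# Illustrative Glue crawler setup for a migrated S3 data lake; all names, the
# role ARN, and the bucket path are hypothetical placeholders.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Register the raw zone of the data lake with the Glue Data Catalog.
glue.create_crawler(
    Name="datalake-raw-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="datalake_raw",
    Targets={"S3Targets": [{"Path": "s3://example-datalake/raw/"}]},
    SchemaChangePolicy={
        "UpdateBehavior": "UPDATE_IN_DATABASE",
        "DeleteBehavior": "LOG",
    },
)

# Run the crawler so downstream query engines (Athena, Redshift Spectrum) can
# discover the table definitions.
glue.start_crawler(Name="datalake-raw-crawler")
```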