Abhinay Bonthapally *******.***********@*****.*** 510-***-**** linkedin.com/in/abhinay14
SUMMARY: Results-driven Data Engineer with 9+ years of experience building scalable, cloud-native data platforms using Databricks, Azure, AWS, SQL, and Python. Expert in distributed data systems that process terabytes of data to accelerate analytics and decision-making. Proven track record in optimizing ETL pipelines, building real-time streaming solutions, and delivering reporting-ready datasets for executive dashboards.
CERTIFICATIONS
- AWS Certified Developer
TECHNICAL SKILLS
- Cloud/Data Platforms: Databricks, Azure Data Factory, AWS (S3, Redshift, EC2), Snowflake, Dataiku, Matillion.
- Big Data & ETL: Apache Spark, Talend, Hive, Redshift Spectrum.
- Programming Languages: SQL, Python (PySpark, Pandas), Scala, Shell Scripting.
- Databases: Snowflake, Redshift, Hive, BigQuery, Delta Lake, MongoDB, MySQL, NoSQL.
- Data Architecture: Star/Snowflake Schema Design, Dimensional Modeling, Data Lake Design.
- Data Warehousing: Dimensional Modeling, Slowly Changing Dimensions (SCDs), Redshift, MySQL.
- Streaming: Spark Structured Streaming.
- Business Intelligence/Visualization: Tableau, Power BI.
- Automation & DevOps: Apache Airflow, CI/CD, Git, Bitbucket, Bash, Linux, Docker, Kubernetes.
- Machine Learning: TensorFlow, PyTorch.
PROFESSIONAL EXPERIENCE
Data Engineer – Manager, Axtria Inc, New Jersey, March 2017 – Present
Tech Stack: Databricks, SQL, Python, Apache Spark, Pandas, Airflow, AWS, Matillion, Dataiku
- Built and optimized scalable Databricks data pipelines processing over 5 TB/day, including cluster tuning and migrating legacy ETL to PySpark notebooks, reducing processing time by 40%.
- Designed real-time streaming pipelines on Databricks using Structured Streaming and Delta Lake, enabling sub-minute-latency analytics for key operational dashboards.
- Integrated Azure Data Factory with Databricks to orchestrate complex data workflows across batch and real-time use cases.
- Implemented a unified data lake architecture using Azure Data Lake, supporting both real-time and batch processing needs.
- Automated data ingestion from S3 into Redshift using Python and Talend, achieving 99.9% SLA uptime and consistent pipeline reliability.
- Authored and optimized 100+ complex SQL queries using window functions, CTEs, and joins, cutting dashboard latency 3x and powering executive reports.
- Tuned long-running Redshift queries, achieving up to 5x performance gains by redesigning schemas and applying sort keys, distribution keys, and data distribution strategies.
- Diagnosed and resolved critical performance issues in large-scale SQL transformations (over 2B rows), improving SLA compliance by 40%.
- Developed modular PySpark job frameworks with reusable components for ingestion, transformation, and audit logging, improving developer velocity and code maintainability.
- Built optimized, reporting-ready datasets for Tableau and Power BI, accelerating KPI delivery and executive decision-making.
- Led cloud cost optimization initiatives across Azure and AWS by implementing autoscaling, partitioning strategies, and query tuning.
- Mentored a cross-functional team of 5 engineers, driving engineering excellence and modern cloud-first practices.
- Automated end-to-end data workflows in Dataiku, boosting cross-team collaboration and reducing manual processing time by 30%.
ETL Big Data Developer – 1 Teq, Wisconsin, September 2016 – February 2017
Tech Stack: Talend, AWS (S3, Redshift), HDFS, Oracle, Hortonworks
- Designed Talend ETL pipelines to load data from diverse sources (Oracle, HDFS) into Redshift and AWS S3.
- Developed reusable Talend components using connectors such as tS3Put, tRedshiftOutput, and tHiveInput.
- Enhanced job parallelism and error handling, reducing average ETL duration by 25%.
Programmer Analyst – Saras Inc, California, May 2015 – September 2016
Tech Stack: Talend, Oracle
- Built ETL jobs supporting a data warehouse architecture with SCD Type 2 and fact/dimension models.
- Developed data pipelines from Oracle and flat files to SQL Server, with performance tuning for large datasets.
- Participated in data modeling and schema design for analytics and reporting systems.
EDUCATION
Master of Science in Computer Science, Northwestern Polytechnic University, California, April 2014 – April 2015
Bachelor of Technology in Computer Science, Malla Reddy Engineering College, Hyderabad, India, July 2009 – May 2013