Senior Data Engineer Cloud Data Platform Expert

Location:

Piscataway, NJ

Salary:

155000

Posted:

March 04, 2026

Resume:

Abhinay Bonthapally *******.***********@*****.*** 510-***-**** linkedin.com/in/abhinay14

SUMMARY: Results-driven Data Engineer with 9+ years of experience building scalable, cloud-native data platforms using Databricks, Azure, AWS, SQL and Python. Expert in distributed data systems that process terabytes at scale to accelerate analytics and decision-making. Proven track record in optimizing ETL pipelines, building real-time streaming solutions, and delivering reporting-ready datasets for executive dashboards.

CERTIFICATIONS

- AWS Certified Developer

TECHNICAL SKILLS

- Cloud/Data Platforms: Databricks, Azure Data Factory, AWS (S3, Redshift, EC2), Snowflake, Dataiku, Matillion.

- Big Data & ETL: Apache Spark, Talend, Hive, Redshift Spectrum.

- Programming Languages: SQL, Python (PySpark, Pandas), Scala, Shell Scripting.

- Databases: Snowflake, Redshift, Hive, BigQuery, Delta Lake, MongoDB, MySQL, NoSQL.

- Data Architecture: Star/Snowflake Schema design, Dimensional Modeling, Data Lake design.

- Data Warehousing: Dimensional Modeling, Slowly Changing Dimensions (SCDs), Redshift, MySQL.

- Streaming: Spark Structured Streaming.

- Business Intelligence/Visualization: Tableau, Power BI.

- Automation & DevOps: Apache Airflow, CI/CD, Git, Bitbucket, Bash, Linux, Docker, Kubernetes.

- Machine Learning: TensorFlow, PyTorch.

PROFESSIONAL EXPERIENCE

Data Engineer – Manager, Axtria Inc, New Jersey March 2017 – Present Tech Stack: Databricks, SQL, Python, Apache Spark, Pandas, Airflow, AWS, Matillion, Dataiku

- Built and optimized scalable Databricks data pipelines processing over 5 TB/day, including cluster tuning and migrating legacy ETL to PySpark notebooks, reducing processing time by 40%.

- Designed real-time streaming pipelines on Databricks using Structured Streaming and Delta Lake, enabling sub-minute latency analytics for key operational dashboards.

- Integrated Azure Data Factory with Databricks for orchestrating complex data workflows across batch and real-time use cases.

- Implemented a unified data lake architecture using Azure Data Lake, supporting both real-time and batch processing needs.

- Automated data ingestion into Redshift from S3 using Python and Talend, achieving 99.9% SLA uptime and seamless pipeline reliability.

- Authored and optimized 100+ complex SQL queries involving window functions, CTEs, and joins, reducing dashboard latency by 3x and powering executive reports.

- Enhanced long-running Redshift queries, achieving up to 5x performance gains by redesigning the schema and implementing sort/dist keys and data distribution strategies.

- Diagnosed and resolved critical performance issues in large-scale SQL transformations (over 2B rows), improving SLA compliance by 40%.

- Developed modular PySpark job frameworks with reusable components for ingestion, transformation, and audit logging, improving developer velocity and code maintainability.

- Built optimized, reporting-ready datasets for Tableau and Power BI, accelerating KPI delivery and executive decision-making.

- Led cloud cost optimization initiatives across Azure and AWS by implementing autoscaling, partition strategies and query tuning.

- Mentored a cross-functional team of 5 engineers, driving engineering excellence and modern cloud-first practices.

- Automated end-to-end data workflows in Dataiku, boosting cross-team collaboration and reducing manual processing time by 30%.

ETL Big Data Developer – 1 Teq, Wisconsin September 2016 – February 2017 Tech Stack: Talend, AWS – S3, Redshift, HDFS, Oracle, Hortonworks

- Designed ETL pipelines using Talend to load data into Redshift and AWS S3 from diverse sources (Oracle, HDFS).

- Developed reusable components leveraging Talend’s AWS connectors: tS3Put, tRedshiftOutput, tHiveInput.

- Enhanced job parallelism and error handling, reducing average ETL duration by 25%.

Programmer Analyst – Saras Inc, California May 2015 – September 2016 Tech Stack: Talend, Oracle

- Built ETL jobs to support warehouse architecture with SCD Type 2 and fact/dimension models.

- Worked on data pipeline development from Oracle/flat files to SQL Server with performance tuning for large datasets.

- Participated in data modeling and schema design for analytics and reporting systems.

EDUCATION:

Master of Science in Computer Science, Northwestern Polytechnic University, California. April 2014 – April 2015

Bachelor of Technology in Computer Science, Malla Reddy Engineering College, Hyderabad, India. July 2009 – May 2013
