Data Engineer ETL or ELT Developer Big Data Engineer Cloud DE

Location:

United States

Posted:

September 18, 2025


Resume:

SUSHMA PALANCHA

Data Engineer

****************@*****.*** +1-330-***-**** LinkedIn

SUMMARY

Analytical and results-oriented Data Engineer with 3+ years of experience in designing large-scale data pipelines, building cloud-native ETL solutions, and supporting AI/ML initiatives. Skilled in developing modular, high-performance data frameworks that reduce latency, improve reliability, and increase insight velocity across business units. Experienced in translating business problems into technical solutions using a diverse toolset across Python, SQL, Spark, AWS, Azure, and Databricks. Adept at working across teams including analysts, data scientists, and business stakeholders to deliver measurable outcomes. Strong advocate for automation, documentation, and data governance.

PROFESSIONAL EXPERIENCE

Data Engineer, Databricks 07/2024 – Present USA

•Reduced data latency by 3.2 hours daily by engineering scalable and reusable ETL pipelines using PySpark and DBT to process over 50 million records per day in the Databricks Lakehouse architecture.

•Improved near-real-time analytics by building a Kafka and Delta Live Tables streaming ingestion pipeline, increasing data availability by 90 minutes/day and enabling operational dashboards to refresh every 5 minutes.

•Automated documentation workflows using Hugging Face and LangChain to summarize and tag metadata across data pipelines, saving analysts and data stewards an estimated 12 hours per week.

•Enabled seamless cross-cloud integration by connecting data from AWS, Azure, and GCP using REST APIs orchestrated through Apache Airflow, reducing deployment and onboarding time for new data sources.

•Accelerated ML model training by 62% by collaborating with data scientists to deliver optimized, scalable feature engineering pipelines that reduced model prep time from hours to under 90 minutes.

•Implemented end-to-end data quality checks using Unity Catalog and Great Expectations, significantly improving audit traceability and compliance reporting for four business-critical systems.

•Saved $36,000 in annual cloud costs by tuning Spark cluster usage, auto-scaling configurations, and refactoring redundant compute jobs.

Data Engineer, Cognizant 07/2021 – 07/2023 India

•Ingested and processed over 200GB/day from Salesforce, Oracle ERP, and e-commerce platforms using Spark and SQL, supporting downstream analytics for multiple departments.

•Reduced infrastructure costs by $8,000/year by migrating legacy batch processes to serverless architecture using AWS Glue, S3, and Azure Data Factory.

•Improved data quality and reliability by building a modular validation framework that included data profiling, null checks, outlier detection, and schema consistency across ingestion pipelines.

•Created interactive dashboards in Power BI and Tableau for operations, finance, and marketing teams, enabling self-service analytics and reducing report turnaround times from 3 days to a few hours.

•Designed Kafka-based ingestion for customer feedback, transforming unstructured inputs into structured insights with real-time sentiment scoring displayed on executive dashboards.

•Optimized legacy SQL scripts and Spark transformations, bringing average job execution time down to 18 minutes and boosting daily SLA compliance rates.

•Contributed to Agile delivery cycles, completing over 20 sprint stories on time, leading daily stand-ups, and collaborating closely with 6+ cross-functional team members including PMs, testers, and architects.

EDUCATION

Master of Science in Business Analytics, Kent State University 12/2024 Ohio, USA

Relevant Coursework: Data Warehousing, Machine Learning, Predictive Analytics, Cloud Computing

Bachelor of Science in Mathematics, Statistics & Computer Science, Osmania University 07/2019 – 08/2022 Hyderabad, India

CORE SKILLS & TOOLS

Languages: Python, SQL, R

ETL & Big Data Frameworks: Apache Spark, PySpark, Databricks, Airflow, Kafka, Hadoop, Informatica

Databases & Warehousing: Snowflake, Redshift, SQL Server, MySQL, PostgreSQL, MongoDB

Other Tools: Git, GitHub, JIRA, Confluence, Postman, REST APIs, Google Analytics, GTM

Data Ops: Data Modeling, Data Validation (Great Expectations), Unity Catalog, CI/CD

Cloud Platforms: AWS (S3, Glue, Redshift, Lambda), Azure (Data Lake, Synapse), GCP (BigQuery)

AI/ML & NLP Tools: Scikit-learn, Hugging Face, LangChain, TensorFlow, XGBoost, SHAP, Feature Engineering

BI & Visualization: Power BI (DAX), Tableau, Looker, Excel (VLOOKUP, Pivot Tables), Plotly, Seaborn

Project Methodologies: Agile (Scrum, Kanban), Waterfall

PROJECTS

Amazon Stock Price Prediction 05/2024 – 08/2024

•Developed time series forecasting models (LSTM, SVR, Linear Regression) using 3,500+ days of Amazon stock data to predict future price trends.

•Applied moving average smoothing, MinMaxScaler normalization, and used a sliding window mechanism to prepare input sequences.

•Achieved R² of 0.9995 and RMSE of 0.0076, demonstrating the model’s ability to capture long-term trends and short-term volatility.

Customer Churn Prediction with Explainable AI 01/2024 – 04/2024

•Built churn prediction models on telecom customer data using XGBoost, Logistic Regression, and Decision Trees.

•Integrated SHAP for model interpretability to explain churn drivers such as contract type, service usage, and billing.

•Deployed insights through an interactive Tableau dashboard to support the customer retention team.

Retail Sales Forecasting 07/2023 – 10/2023

•Analyzed multi-store, multi-product retail data and implemented ARIMA and Prophet models for weekly demand forecasting.

•Captured seasonal patterns, holidays, and promotional periods to improve sales predictability for top SKUs.

•Created interactive Power BI reports with model confidence intervals and scenario projections.

CERTIFICATIONS

Microsoft Azure Fundamentals (AZ-900)

Deep Learning in Python – DataCamp

Advanced Time Series Forecasting – DataCamp

AWS: Data Analytics Fundamentals – LinkedIn Learning

Image Modeling with Keras – DataCamp

Introduction to Python – DataCamp

Power BI Essential Training – LinkedIn Learning

Introduction to TensorFlow – DataCamp
