Pavankumar Pendela
Data Engineer
*******************@*****.*** +1-313-***-**** LinkedIn Centerton, AR

Professional Summary
Results-driven Data Engineer and Analyst with 5+ years of experience designing, building, and optimizing scalable data pipelines, cloud-based data warehouses, and advanced analytics solutions. Skilled in Python, SQL, Apache Spark, AWS, and Snowflake, with proven success in data migration, ETL automation, and big data processing across finance, product, marketing, and sales domains. Adept at developing business intelligence dashboards (Power BI, Tableau), implementing data governance frameworks, and applying statistical modeling and machine learning techniques to deliver actionable insights. Recognized for improving data quality, reducing processing time by 30%, and cutting post-deployment issues by 90%, while ensuring compliance with GDPR and CCPA standards. Strong collaborator with cross-functional teams, delivering data-driven strategies that enhance decision-making, streamline operations, and drive measurable business outcomes.

Professional Experience
Data Engineer Northern Trust Jan 2023 – Present
• Built and optimized real-time data pipelines using Apache Spark, Kafka, and Cassandra, enabling efficient ingestion, transformation, and processing of diverse streaming datasets.
• Migrated multiple on-premises legacy applications to AWS EMR and Redshift, reducing infrastructure costs and improving cloud scalability and system reliability.
• Designed and implemented automated Snowflake pipelines (Snowpipe, AWS S3, AWS Glue) for seamless enterprise-level data ingestion, transformation, and advanced business analytics.
• Developed PySpark-based ETL jobs for structured and unstructured datasets, cutting data processing time by nearly 30% for critical workloads.
• Created and maintained Kibana dashboards with Elasticsearch pipelines for real-time log analytics, enhancing operational monitoring, alerting, and troubleshooting efficiency.
• Automated large-scale data workflows using AWS Step Functions and integrated them with SageMaker ML models for end-to-end deployment and accurate predictive analytics.

Data Analyst Paisabazaar Dec 2018 – May 2021
• Designed, developed, and maintained enterprise-scale data warehouses on Snowflake Cloud, supporting large-scale business analytics and reporting requirements across multiple departments.
• Conducted A/B testing and advanced statistical modeling, improving customer engagement strategies and reducing operational errors by 25% across systems.
• Built and optimized Power BI dashboards and interactive Excel reports, providing actionable insights and enabling faster decision-making for multiple business teams.
• Automated data cleaning and transformation pipelines using Python (Pandas, NumPy, SciPy), reducing manual processing effort and enhancing data accuracy.
• Led comprehensive quality assurance and testing procedures, ensuring accurate deployments and reducing post-deployment production issues by nearly 90%.
• Collaborated with cross-functional stakeholders to define KPIs, gather requirements, and design ETL workflows, aligning analytics solutions with evolving business needs.
Education
Master of Science in Business Analytics – Kent State University, OH, USA Aug 2021 – Dec 2022
Bachelor of Science in Management – PRIST University, India Jun 2012 – May 2015

Technical Skills
• Programming & Scripting: Python (Pandas, NumPy, Scikit-learn, PySpark, SciPy, Matplotlib, Seaborn), R, SQL (T-SQL, PL/SQL), Java, Scala, C, Shell Scripting, Bash, YAML
• Big Data & Cloud Technologies: Apache Spark, Hadoop, Kafka, Hive, Pig, HBase, Flink, Storm, Cassandra, MongoDB, AWS (S3, EMR, Glue, Lambda, Redshift, Step Functions, SageMaker, Athena, RDS, Kinesis), Snowflake, GCP (BigQuery, DataProc, Pub/Sub), Azure (Data Lake, Synapse, Databricks)
• Databases: MySQL, SQL Server, PostgreSQL, Oracle, Teradata, Amazon Redshift, Cassandra, MongoDB, DynamoDB
• ETL & Data Integration Tools: SSIS, Informatica PowerCenter, Talend, Apache NiFi, AWS Glue, Airflow, dbt (Data Build Tool)
• Data Warehousing & Modeling: Star Schema, Snowflake Schema, Fact & Dimension Tables, Kimball & Inmon Methodologies, OLTP/OLAP, Data Vault Modeling
• Data Visualization & Reporting: Tableau, Power BI, QlikView, Looker Studio (formerly Google Data Studio), Excel (Pivot Tables, Power Query, Power Pivot)
• Machine Learning & Statistics: Regression, Classification, Clustering, Random Forest, Gradient Boosting, XGBoost, SVM, Decision Trees, K-Means, Time Series Forecasting, NLP, A/B Testing, Hypothesis Testing, Statistical Modeling
• DevOps & Version Control: Git, GitHub, GitLab, Jenkins, CI/CD Pipelines, Docker, Kubernetes, Terraform
• Workflow Orchestration & Automation: Apache Airflow, Oozie, AWS Step Functions, Luigi, Control-M
• Collaboration & Project Management Tools: Jira, Confluence, Trello, Asana, Slack, MS Teams, ServiceNow, Lucidchart, MS Visio
• Other Core Skills: Data Cleaning, Data Wrangling, Data Governance, Data Quality Frameworks, Metadata Management, Data Security & Compliance (GDPR, CCPA), API Integration, Business Intelligence, Problem Solving, Cross-Functional Collaboration