Data Engineer Modeling

Location: Avon Lake, OH

Posted: June 18, 2025


Resume:

Dinky Mishra

Ohio, USA ***********@*****.*** +1-440-***-**** linkedin.com/in/mishradinky GitHub

SUMMARY

Data Engineer with 3+ years of experience building secure, scalable data pipelines on cloud platforms using Spark, PySpark, Java, and Python. Strong proficiency in data modeling, ETL orchestration, and CI/CD best practices. Applied domain-driven design to translate business requirements into robust data solutions in client/finance contexts. Lifelong learner who uses AI responsibly to optimize workflows and enhance data delivery.

TECHNICAL SKILLS

Programming & Query Languages: Python, PySpark, Scala, SQL, HiveQL, SparkSQL, Java, Shell scripting, JavaScript, R, GraphQL

Cloud & Big Data: AWS (Glue, EMR, S3, Redshift, Lambda, Kinesis, IAM, CloudWatch, Athena, SNS, SQS), Azure (Synapse Analytics, ADLS Gen2, Data Factory, Functions), GCP (DataProc), Snowflake, Databricks, Hadoop, Hive, YARN

ETL & Orchestration: Apache Airflow, AWS Glue workflows, Azure Data Factory, AWS Step Functions

Data Modeling & Warehousing: Legend data modeling, star/snowflake schemas, partitioning, clustering, schema design (Redshift, Synapse, Snowflake), SQL query tuning, ER diagrams, data cataloging

CI/CD & DevOps: Jenkins, Git/GitHub, Docker, Kubernetes, Terraform, CloudFormation, Azure DevOps, Agile/Scrum/Kanban

Machine Learning & AI: Model development and deployment, MLOps, GenAI, LLMOps, PyTorch, TensorFlow, scikit-learn, Hugging Face Transformers, OpenAI API

Visualization & Reporting: Tableau, QlikSense, SSRS

Other: Linux, MongoDB, Elasticsearch, Node.js, API development

WORK EXPERIENCE

Data Engineer, LTIMindtree Aug 2021 – Aug 2023

Project: Connected Data Platform, Navistar Inc., USA (AWS, Azure, Snowflake, PySpark, Synapse, Data Factory, Python, Scala, SQL)

• Designed and implemented end-to-end data solutions, including logical and physical data models, ER diagrams, and data flow diagrams to support business needs.

• Developed and optimized SQL queries and Snowflake pipelines for large-scale data warehousing, ensuring accuracy and accessibility of data.

• Built and automated CI/CD pipelines using Azure DevOps and GitHub, enabling rapid and reliable deployment of data and ML models.

• Implemented security, backup & recovery procedures for cloud data platforms, achieving 100% compliance & zero unauthorized access incidents.

• Enhanced pipeline throughput by 30% and reduced workflow failures by 75% through automation and robust testing.

• Collaborated with cross-functional teams to ensure data solutions were consistent, scalable, and met business requirements.

• Documented data architectures and transformations throughout the data lifecycle, supporting data cataloging and governance.

• Mentored junior engineers on best practices in data engineering and cloud technologies.

Project: Fin-Tech, LTIMindtree, India (GCP, Azure, AWS, Airflow, Python, Java, SQL, MongoDB)

• Cut pipeline runtime by 40% by refactoring Java microservices into PySpark on Azure Synapse.

• Automated CI/CD for DataStage, Informatica, and Spark jobs, increasing deployment frequency and reducing errors.

• Improved data quality by 30% by integrating real-time streaming checks with Kafka and Kinesis and batch cleansing with PySpark.

• Enabled self-service analytics by developing 20+ dashboards in Tableau and QlikSense connected to Redshift and Hive.

• Accelerated schema releases by 20% by modeling star/snowflake schemas in Snowflake and documenting metadata catalogs.

• Orchestrated timely financial data deliveries using Agile methodologies and Apache Airflow.

Graduate Assistant, Cleveland State University (Data Engineer & Teaching Assistant) April 2024 – May 2025

• Automated grading analytics for 150+ students, enabling targeted academic support.

• Mentored 40+ students in programming languages and data engineering concepts.

• Co-created teaching resources and prototyped a multi-agent website framework.

PROJECTS

• Banking ETL & BI Platform [AWS S3, Glue, EMR, Redshift, Databricks, PySpark, Airflow] Built a cloud-native ETL pipeline to ingest, transform, and visualize transaction data for compliance dashboards.

• Bitcoin Price Prediction Using ML & LLM [Google Colab, Python, scikit-learn, Hugging Face Transformers, FinBERT, OpenAI API] Forecasted daily Bitcoin prices by combining time-series regression models with transformer-based sentiment signals from social media.

• Big Data-Driven Search & Retrieval System [Node.js, Puppeteer, MongoDB, MySQL, Elasticsearch, TF-IDF] Engineered a scalable web crawler and NLP-powered search engine to index and rank financial literature with relevance scoring.

• Object Recognition System using AI [Jupyter Notebook, TensorFlow, PyTorch, OpenCV, Docker] Developed an end-to-end computer vision pipeline for detecting and classifying objects in live video streams.

EDUCATION

Cleveland State University, Cleveland, USA

Master's in Computer Science (M.S.), 3.91/4.0, Aug 2023 – May 2025

University of Mumbai, Mumbai, India

Bachelor's in Computer Engineering (B.E.), 3.80/4.0, July 2016 – July 2021

CERTIFICATIONS / EXTRACURRICULAR ACTIVITIES / VOLUNTEER

• Databricks Certified Machine Learning Associate

• Databricks Certified Data Engineer Associate

• Databricks Certified Generative AI Fundamentals

• Microsoft Certified Azure Fundamentals

• Google Certified Professional Data Engineer Certification by CloudGuru

• Director at Association for Computing Machinery for Women at Cleveland State University
