Dinky Mishra
Ohio, USA | ***********@*****.*** | +1-440-***-**** | linkedin.com/in/mishradinky | GitHub
SUMMARY
Data Engineer with 3+ years of experience building secure, scalable data pipelines on cloud platforms using Spark, PySpark, Java, and Python. Strong proficiency in data modeling, ETL orchestration, and CI/CD best practices. Applied domain-driven design to translate business requirements into robust data solutions in client and finance contexts. Lifelong learner who responsibly leverages AI to optimize workflows and enhance data delivery.
TECHNICAL SKILLS
Programming & Query Languages: Python, PySpark, Scala, SQL, HiveQL, Spark SQL, Java, Shell scripting, JavaScript, R, GraphQL
Cloud & Big Data: AWS (Glue, EMR, S3, Redshift, Lambda, Kinesis, IAM, CloudWatch, Athena, SNS, SQS), Azure (Synapse Analytics, ADLS Gen2, Data Factory, Functions), GCP (Dataproc), Snowflake, Databricks, Hadoop, Hive, YARN
ETL & Orchestration: Apache Airflow, AWS Glue workflows, Azure Data Factory, AWS Step Functions
Data Modeling & Warehousing: Legend data modeling, star/snowflake schemas, partitioning, clustering, schema design (Redshift, Synapse, Snowflake), SQL query tuning, ER diagrams, data cataloging
CI/CD & DevOps: Jenkins, Git/GitHub, Docker, Kubernetes, Terraform, CloudFormation, Azure DevOps, Agile/Scrum/Kanban
Machine Learning & AI: Model development and deployment, MLOps, GenAI, LLMOps, PyTorch, TensorFlow, scikit-learn, Hugging Face Transformers, OpenAI API
Visualization & Reporting: Tableau, Qlik Sense, SSRS
Other: Linux, MongoDB, Elasticsearch, Node.js, API development
WORK EXPERIENCE
Data Engineer, LTIMindtree Aug 2021 – Aug 2023
Project: Connected Data Platform, Navistar Inc., USA (AWS, Azure, Snowflake, PySpark, Synapse, Data Factory, Python, Scala, SQL)
• Designed and implemented end-to-end data solutions, including logical and physical data models, ER diagrams, and data flow diagrams to support business needs.
• Developed and optimized SQL queries and Snowflake pipelines for large-scale data warehousing, ensuring accuracy and accessibility of data.
• Built and automated CI/CD pipelines using Azure DevOps and GitHub, enabling rapid and reliable deployment of data workflows and ML models.
• Implemented security, backup & recovery procedures for cloud data platforms, achieving 100% compliance & zero unauthorized access incidents.
• Enhanced pipeline throughput by 30% and reduced workflow failures by 75% through automation and robust testing.
• Collaborated with cross-functional teams to ensure data solutions were consistent, scalable, and aligned with business requirements.
• Documented data architectures and transformations throughout the data lifecycle, supporting data cataloging and governance.
• Mentored junior engineers on best practices in data engineering and cloud technologies.
Project: Fin-Tech, LTIMindtree, India (GCP, Azure, AWS, Airflow, Python, Java, SQL, MongoDB)
• Cut pipeline runtime by 40% by refactoring Java microservices into PySpark on Azure Synapse.
• Automated CI/CD for DataStage, Informatica, and Spark jobs, increasing deployment frequency and reducing errors.
• Improved data quality by 30% by combining real-time streaming checks in Kafka and Kinesis with batch cleansing in PySpark.
• Enabled self-service analytics by developing 20+ dashboards in Tableau and Qlik Sense connected to Redshift and Hive.
• Accelerated schema releases by 20% by modeling star/snowflake schemas in Snowflake and documenting metadata catalogs.
• Orchestrated timely financial data deliveries using Agile methodologies and Apache Airflow.
Graduate Assistant, Cleveland State University (Data Engineer & Teaching Assistant) April 2024 – May 2025
• Automated grading analytics for 150+ students, enabling targeted academic support.
• Mentored 40+ students in programming languages and data engineering concepts.
• Co-created teaching resources and prototyped a multi-agent website framework.
PROJECTS
• Banking ETL & BI Platform [AWS S3, Glue, EMR, Redshift, Databricks, PySpark, Airflow] Built a cloud-native ETL pipeline to ingest, transform, and visualize transaction data for compliance dashboards.
• Bitcoin Price Prediction Using ML & LLM [Google Colab, Python, scikit-learn, Hugging Face Transformers, FinBERT, OpenAI API] Forecasted daily Bitcoin prices by combining time-series regression models with transformer-based sentiment signals from social media.
• Big Data-Driven Search & Retrieval System [Node.js, Puppeteer, MongoDB, MySQL, Elasticsearch, TF-IDF] Engineered a scalable web crawler and NLP-powered search engine to index and rank financial literature with relevance scoring.
• Object Recognition System using AI [Jupyter Notebook, TensorFlow, PyTorch, OpenCV, Docker] Developed an end-to-end computer vision pipeline for detecting and classifying objects in live video streams.
EDUCATION
Cleveland State University Cleveland, USA
Master's in Computer Science (M.S.) GPA: 3.91/4.0 Aug 2023 – May 2025
University of Mumbai Mumbai, India
Bachelor's in Computer Engineering (B.E.) GPA: 3.80/4.0 July 2016 – July 2021
CERTIFICATIONS / EXTRACURRICULAR ACTIVITIES / VOLUNTEER
• Databricks Certified Machine Learning Associate
• Databricks Certified Data Engineer Associate
• Databricks Certified Generative AI Fundamentals
• Microsoft Certified Azure Fundamentals
• Google Professional Data Engineer Certification by A Cloud Guru
• Director, Association for Computing Machinery for Women (ACM-W), Cleveland State University