Dilip Kumar
Data Engineer
Contact Information:
Email: ******************@*****.***
LinkedIn: www.linkedin.com/in/dilip-kumar1703
Phone: +1-475-***-****
Summary:
Highly motivated and detail-oriented mid-level Data Engineer with around three years of experience in big data technologies, cloud services, and data processing frameworks.
Proficient in Scala, Python, and SQL, with hands-on experience building and optimizing data pipelines. Adept at ensuring data integrity, performing data transformations, and leveraging cloud-based solutions for scalable, efficient data management. Eager to contribute to a dynamic team and support data-driven decision-making.
Key strengths include:
Expertise in designing and implementing end-to-end data processing pipelines.
Experience with real-time data ingestion and streaming using Apache Flink, Kafka, and AWS Kinesis.
Proficiency with AWS services such as S3, Lambda, Glue, Kinesis Data Analytics, Athena, CloudFormation, CloudWatch, MSK, SNS, RDS, and Redshift.
Strong SQL skills for querying and managing relational databases like PostgreSQL and MySQL.
Skilled in data transformation, denormalization, and aggregation.
Hands-on experience with CI/CD tools such as TeamCity and Harness.
Effective use of monitoring tools like Datadog and AWS CloudWatch.
Knowledge of version control using Git and automation through CloudFormation.
Capable of developing disaster recovery plans and optimizing cloud costs.
Experienced with the SDLC and both Agile and Waterfall methodologies.
Excellent communication, interpersonal, and problem-solving skills; a team player able to quickly adapt to new environments and technologies.
Strong collaboration skills with cross-functional teams to achieve project goals.
TECHNICAL SKILLS:
Programming Languages: Python, Scala, Java, Shell scripting
Databases: PostgreSQL, MySQL, DynamoDB (NoSQL)
Web Technologies: HTML, CSS, JavaScript, jQuery, JSON, Bootstrap
Cloud Services: AWS (S3, Lambda, Step Functions, CloudFormation, Glue, RDS, Athena, MSK, CloudWatch, Kinesis, DynamoDB, SNS)
Data Streaming: Apache Kafka, AWS Kinesis Data Analytics
IDEs: Visual Studio Code, Jupyter Notebook, IntelliJ IDEA
Monitoring: Datadog, AWS CloudWatch
Version Control: Git, GitHub
Methodologies: Agile, Scrum
Experience:
Data Engineer
United Airlines, Houston, Texas
Jun 2022 – Jan 2025
Responsibilities:
Designed and implemented end-to-end data processing pipelines using Scala and Apache Flink.
Managed CDC data flow from on-premises systems through Kafka, ensuring seamless data integration into AWS S3 and PostgreSQL (see the consumer sketch below).
Utilized AWS Kinesis Analytics to process streaming data and push it to S3 and Postgres.
Ensured data integrity and consistency across multiple data sources.
Created and optimized SQL queries to extract insights and generate reports for stakeholders.
Utilized Athena to query and analyze large datasets stored in S3, enabling data-driven decision-making.
Developed and managed AWS CloudFormation stacks to automate infrastructure provisioning.
Implemented CI/CD pipelines using TeamCity and Harness for automated deployments, reducing manual intervention and errors.
Monitored system performance using Datadog and configured custom metrics and alerts.
Led initiatives to optimize AWS costs by analyzing usage patterns and recommending cost-saving strategies.
Developed and implemented disaster recovery plans using Step Functions, Lambda, CloudFormation templates, and SNS to ensure business continuity.
Created Scala-based data tasks to transform raw data into meaningful models, ensuring they met business requirements and were optimized for performance.
Participated in the architectural design and development of data pipelines, ensuring scalability and efficiency.
Worked closely with cross-functional teams to integrate data across internal and external sources.
Implemented data quality checks and validation processes to ensure compliance with specifications.
Utilized AWS Glue to create ETL jobs for migrating data from various formats (ORC/Parquet/Text) into S3.
Developed Python scripts using the Boto3 library to configure and manage AWS services.
Automated tasks such as launching EC2 instances, managing S3 buckets, and interacting with DynamoDB (see the Boto3 sketch below).
Set up CloudWatch to monitor AWS resources and applications, configuring custom metrics and alarms to track system health and performance (see the CloudWatch sketch below).
Managed NoSQL databases such as DynamoDB, ensuring data was stored and retrieved efficiently.
Implemented PySpark scripts within AWS Lambda to perform ETL on incoming data streams: reading raw data from sources such as S3 or Kinesis, applying the necessary transformations, and writing the processed data to Snowflake for downstream analytics and reporting (a simplified handler sketch appears below).
Utilized Git for version control, ensuring code changes were tracked and managed effectively.
Environment: Apache Flink, AWS Glue, S3, Athena, CloudFormation, CloudWatch, SNS, MSK, Kinesis Data Analytics, Lambda, DynamoDB, AWS RDS, PostgreSQL, Apache Kafka, Python, PySpark, Scala, Shell scripting, MySQL, Harness, TeamCity, Snowflake, Git, Datadog.
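Illustrative sketch of the CDC consumer pattern referenced above: a minimal Python consumer (using the kafka-python, boto3, and psycopg2 packages) that archives each change event to S3 and upserts it into PostgreSQL. The topic, bucket, and table names are hypothetical placeholders, not the production values.

import json
import os

import boto3
import psycopg2
from kafka import KafkaConsumer

# Hypothetical placeholders; the real topic, bucket, and table names differ.
TOPIC = "orders.cdc"
BUCKET = "cdc-landing-bucket"

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=os.environ["KAFKA_BROKERS"].split(","),
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
s3 = boto3.client("s3")
pg = psycopg2.connect(
    host=os.environ["PG_HOST"],
    dbname="analytics",
    user=os.environ["PG_USER"],
    password=os.environ["PG_PASSWORD"],
)

for msg in consumer:
    event = msg.value
    # Land the raw change event in S3, keyed by partition/offset for replayability.
    key = f"raw/{TOPIC}/{msg.partition}-{msg.offset}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(event).encode("utf-8"))
    # Upsert the latest row state into PostgreSQL.
    with pg.cursor() as cur:
        cur.execute(
            "INSERT INTO orders (id, payload) VALUES (%s, %s) "
            "ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload",
            (event["id"], json.dumps(event)),
        )
    pg.commit()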
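A minimal Boto3 automation sketch for the EC2/S3/DynamoDB tasks above; the region, AMI ID, and table name are illustrative assumptions.

import boto3

# Hypothetical region, AMI, and table name.
ec2 = boto3.client("ec2", region_name="us-east-1")
s3 = boto3.client("s3")
jobs = boto3.resource("dynamodb", region_name="us-east-1").Table("pipeline-jobs")

# Launch a worker instance.
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
instance_id = resp["Instances"][0]["InstanceId"]

# List buckets to confirm S3 access.
buckets = [b["Name"] for b in s3.list_buckets()["Buckets"]]

# Record the launch in DynamoDB for auditing.
jobs.put_item(Item={"job_id": instance_id, "status": "launched"})
print(instance_id, buckets)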
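Sketch of publishing a custom CloudWatch metric and wiring an alarm to an SNS topic, as mentioned above; the namespace, metric name, and SNS ARN are hypothetical.

import boto3

cw = boto3.client("cloudwatch")

# Publish a custom metric (hypothetical namespace and metric name).
cw.put_metric_data(
    Namespace="PipelineHealth",
    MetricData=[{"MetricName": "RecordsProcessed", "Value": 1250, "Unit": "Count"}],
)

# Alarm if processing stalls; notify via SNS (placeholder ARN).
cw.put_metric_alarm(
    AlarmName="records-processed-stalled",
    Namespace="PipelineHealth",
    MetricName="RecordsProcessed",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="LessThanThreshold",
    TreatMissingData="breaching",
    AlarmActions=["arn:aws:sns:us-east-1:111111111111:pipeline-alerts"],
)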
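A simplified version of the Lambda ETL handler described above. For brevity this sketch substitutes plain Python for PySpark and loads via the snowflake-connector-python package; the event schema, warehouse, and table are hypothetical.

import base64
import json
import os

import snowflake.connector


def handler(event, context):
    # Decode and transform incoming Kinesis records (hypothetical schema).
    rows = []
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        rows.append((payload["id"], payload.get("amount", 0.0)))

    # Load the transformed rows into Snowflake for downstream analytics.
    conn = snowflake.connector.connect(
        user=os.environ["SF_USER"],
        password=os.environ["SF_PASSWORD"],
        account=os.environ["SF_ACCOUNT"],
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="PUBLIC",
    )
    try:
        with conn.cursor() as cur:
            cur.executemany("INSERT INTO events (id, amount) VALUES (%s, %s)", rows)
    finally:
        conn.close()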
ACADEMIC PROJECT:
Inventory Management System for Small Retail Store
Developed a desktop-based inventory management system using Python and Visual Studio to help small retail stores manage stock levels, sales, and restocking activities.
Designed and implemented a user-friendly GUI using Tkinter, allowing non-technical users to easily interact with the system.
Built and managed a SQL Server database to handle product information, stock movement, and transactional data.
Utilized SQL queries to perform CRUD operations, generate real-time inventory reports, and trigger low-stock alerts (a minimal sketch follows this project).
Applied Pandas for data cleaning and manipulation to support accurate inventory tracking and reporting.
Created interactive dashboards in Power BI to visualize sales trends, stock status, and historical purchase behavior for better decision-making.
Focused on data accuracy, efficiency, and usability to ensure the system could support real-world small business needs.
Environment: Python, SQL Server, Visual Studio, Tkinter, Pandas, Power BI
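A minimal sketch of the low-stock alert flow from this project, assuming a products(name, quantity) table; the connection string, table schema, and threshold are illustrative, not the project's actual values.

import pyodbc
import tkinter as tk
from tkinter import messagebox

# Hypothetical connection string and schema.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=Inventory;Trusted_Connection=yes;"
)


def show_low_stock(threshold=10):
    # Query items below the restock threshold and surface them in a popup.
    cur = conn.cursor()
    cur.execute("SELECT name, quantity FROM products WHERE quantity < ?", threshold)
    items = cur.fetchall()
    if items:
        messagebox.showwarning(
            "Low stock", "\n".join(f"{name}: {qty}" for name, qty in items)
        )
    else:
        messagebox.showinfo("Low stock", "All items above threshold.")


root = tk.Tk()
root.title("Inventory Management")
tk.Button(root, text="Check low stock", command=show_low_stock).pack(padx=24, pady=24)
root.mainloop()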
EDUCATION:
Sacred Heart University, Fairfield, Connecticut (Jan 2021 – Mar 2022)
Master of Science in Computer Science and Information Technology
Coursework: Data Structures, Structured Programming, JavaScript, Visual Basic, Data Visualization, Big Data Analytics, Python, Advanced Database Design, Java, Information Analysis and System Design, Artificial Intelligence.