Data Engineer Machine Learning

Location:

O'Fallon, MO

Salary:

80,000

Posted:

April 30, 2025


Resume:

Aravind P

Data Engineer

O’Fallon, MO +1-314-***-**** ************@*****.*** LinkedIn

SUMMARY

Data Engineer with around 4 years of experience in designing, developing, and optimizing scalable data pipelines and systems to deliver actionable insights and drive efficiency.

Proficient in SDLC, Agile, and Waterfall methodologies, ensuring effective project execution and timely delivery.

Skilled in data processing, statistical analysis, and machine learning with tools like Pandas, SciPy, etc.

Expertise in big data ecosystems, including Apache Spark, PySpark, Hadoop (HDFS, MapReduce), Kafka, Hive, Sqoop, Airflow, and Apache Flink, enabling large-scale data processing and real-time analytics.

Adept at using ETL tools like SSIS, Apache NiFi, Informatica, and Apache Kafka to optimize data extraction, transformation, and loading.

Skilled in data visualization with Tableau, Power BI, and Excel, creating clear and insightful dashboards for decision-making.

Experienced with cloud platforms like AWS, Azure, GCP, and Databricks to build scalable and cost-effective data solutions.

Well-versed in managing both relational (MySQL, PostgreSQL, SQL Server, Oracle) and NoSQL (MongoDB) databases, ensuring efficient data storage, retrieval, and performance optimization.

SKILLS

Methodologies: SDLC, Agile, Waterfall

Languages: Python, R, SQL, SAS

Big Data Ecosystem: Apache Spark, PySpark, Hadoop (HDFS, MapReduce), Kafka, Hive, Sqoop, Airflow, Apache Flink

ETL Tools: SSIS, Apache NiFi, Apache Kafka, Talend, Apache Airflow, Informatica

IDEs: Visual Studio Code, PyCharm, Jupyter Notebook

Machine Learning Algorithms: LDA, Naive Bayes, Random Forests, Decision Trees, Linear/Logistic Regression, SVM, Clustering, Neural Networks, Principal Component Analysis

Packages: NumPy, Pandas, Matplotlib, Seaborn, ggplot2, SciPy, scikit-learn

Visualization Tools: Tableau, Power BI, Advanced Excel

Cloud Technologies: AWS, Azure, GCP, Databricks

DevOps Tools: Docker, Kubernetes, Jenkins, CI/CD

Databases: MySQL, PostgreSQL, SQL Server, Oracle, MongoDB

Version Control Tools: Git, GitHub, GitLab

Operating Systems: Windows, Linux, macOS

EXPERIENCE

Data Engineer Kroger, USA Jan 2024 - Present

Designed and optimized data pipelines using PySpark and Hadoop, processing datasets exceeding 10TB, which improved analytics efficiency by 30%.

Conducted advanced data analysis using Python and SQL, applying packages such as Pandas, NumPy, and SciPy to extract insights and support predictive modeling initiatives.

Developed and deployed ETL workflows with SSIS, Apache NiFi, and Talend, reducing data processing time by 30% and ensuring smooth integration across multiple systems.

Applied machine learning algorithms (LDA, Naive Bayes, Random Forests, Decision Trees, Linear/Logistic Regression, SVM, Clustering, Neural Networks, and PCA) to develop predictive models and extract actionable insights from complex datasets.

Created Tableau dashboards, enhancing decision-making by 20% for cross-functional teams with actionable, real-time insights.

Optimized scalable data pipelines on GCP using Apache Spark, ensuring high-performance data processing and transformation.

Migrated and maintained databases, including PostgreSQL and Oracle, achieving a 40% improvement in query performance through optimization techniques.

Data Engineer Metasystems, India Jan 2020 - Jul 2022

Delivered high-quality data engineering solutions within Agile frameworks, ensuring seamless alignment with business objectives and project timelines.

Designed and optimized ETL pipelines using SSIS and Informatica, integrating complex datasets across multiple systems and reducing processing times by 30%.

Implemented scalable data workflows using Hive, Sqoop, and Apache Airflow, improving data extraction and processing efficiency by 40% for structured and unstructured data.

Designed efficient data models to optimize data processing and storage, ensuring scalability and high performance in analytics workloads.

Developed interactive dashboards in Power BI and Advanced Excel, enhancing decision-making efficiency by 25%.

Deployed and managed data solutions on Azure, leveraging cloud-native services to improve scalability, security, and cost efficiency, reducing operational costs by 25%.

Streamlined deployment processes by containerizing applications with Docker and Kubernetes, minimizing infrastructure downtime by 40%.

Optimized MySQL and SQL Server databases through indexing and partitioning strategies, achieving a 35% improvement in query performance.

EDUCATION

Master's in Data Analytics, Webster University, Webster Groves, MO, May 2024

Bachelor's in Data Science and Analytical Engineering, Loyola Academy, Hyderabad, India, Jun 2021
