Adithya Bandi
Data Engineer
Email: *******@*******.*** | Contact: +1-405-***-**** | Location: Dallas, TX, USA | LinkedIn
Summary
Dynamic Data Engineer with 4 years of experience planning and implementing efficient data pipelines and robust storage solutions across diverse environments and sources. Specialized expertise in database technologies, including MySQL and PostgreSQL. Proven ability to architect scalable data processing workflows using frameworks such as Apache Spark and Django, significantly improving operational efficiency. Experienced in data encryption, access control, and data masking to maintain data security and integrity. Proficient in integrating cloud services within Microsoft Azure and AWS to enhance business intelligence initiatives. Committed to data governance and to designing interactive dashboards in Power BI and Tableau that deliver actionable insights. Well-versed in data modeling, data architecture, and DevOps practices, including containerization with Docker and orchestration with Apache Airflow, to optimize data workflows and ensure seamless integration across systems.
Core Skills
Programming Languages: Python, SQL, Scala.
Databases: MySQL, PostgreSQL, MS SQL Server, NoSQL (Apache Cassandra, Azure Cosmos DB).
Data Engineering & Frameworks: Django, Flask, Apache Hadoop, Apache Spark.
Big Data Tools: Apache HBase, Apache Cassandra, Apache Kafka, Azure Event Hub, Apache Sqoop.
Machine Learning: PyTorch, Pandas, NumPy, Scikit-Learn, TensorFlow, Keras, AWS SageMaker, LLMs (Large Language Models).
Data Visualization: Power BI, Tableau, MS Visio.
Project Management: Agile methodologies (Scrum, Kanban), Jira, Asana.
ETL Tools: Apache NiFi, Talend, Alteryx.
Cloud Services: Amazon Web Services (AWS): Glue, Redshift, S3, Kinesis, Bedrock; Microsoft Azure: Data Factory, Data Lake, Synapse, Databricks; Google Cloud Platform (GCP).
Other Competencies & Tools: Data Encryption, Access Control, Data Masking, Compliance Regulations (GDPR, HIPAA), Data Modeling, Data Architecture, Data Governance, DevOps Practices, Containerization (Docker), Orchestration (Apache Airflow).
Experience
McKinsey & Co., USA. Data Engineer. Jan 2024 – Present
Leveraged Apache Hadoop and Sqoop to process over 10 TB of data, while using Python and Scikit-Learn to automate data preparation, cutting processing time by 20%.
Designed and implemented data storage solutions with MySQL and Azure Cosmos DB for 4 TB of data, while developing 20+ pipelines using Apache NiFi, Talend, and Alteryx, improving data processing throughput.
Developed data models and architecture for 15 TB+ datasets, ensuring data quality and consistency, while applying governance policies to strengthen security, access control, and GDPR compliance, reducing the risk of data breaches.
Deployed machine learning models with TensorFlow and AWS SageMaker and wrote Python scripts using Pandas and Scikit-Learn, improving data processing efficiency and model accuracy across large datasets.
Adopted Docker for containerization and Apache Airflow for orchestration to optimize data processing for over 5,000 transactions monthly, while implementing DevOps practices that reduced deployment time.
Applied Agile methodologies, including Scrum and Kanban, to manage data engineering projects, using Jira and Asana for tracking and collaboration, which improved delivery times across multiple projects.
Avenir, India. Data Engineer. Sep 2019 – Jul 2022
Employed Apache Hadoop and Spark to process large datasets, achieving a 40% reduction in processing time, while developing Python scripts with Pandas and NumPy to prepare over 50,000 records for analysis.
Designed and maintained data storage solutions with MySQL and MS SQL Server to manage records, while developing data pipelines with Talend and Alteryx to ingest data from 10+ sources into both relational and NoSQL databases.
Implemented AWS solutions like Glue and S3 to process over 10 TB of data, utilizing Docker for containerization and Apache Airflow for orchestration, achieving a reduction in processing time across 10+ data pipelines.
Established DevOps practices for continuous integration and delivery of 20+ data pipelines and applications, while using Python, Pandas, and NumPy to analyze large datasets, improving processing efficiency.
Applied machine learning models with TensorFlow and AWS SageMaker to analyze data trends while managing 10 projects using Agile and Kanban, reducing delivery times by 30%.
Utilized Jira to track progress and collaborate on 10+ projects while ensuring compliance with GDPR through necessary controls, leading to an improvement in project delivery timelines.
Participated in code reviews to ensure scalable, efficient data solutions that meet business requirements, collaborating with data scientists and analysts to develop high-performance data pipelines, improving processing efficiency.
Education: Master of Science, Lindsey Wilson College, KY, USA. May 2024
Certifications: Azure Data Engineer Associate (DP-203), Power BI Data Analyst Associate (PL-300).