
Data Engineer Machine Learning

Location:
Columbus, OH, 43214
Posted:
January 23, 2025


JAHNAVI KATLA

DATA ENGINEER

Columbus, OH | ************@*****.*** | +1-419-***-**** | www.linkedin.com/in/jahnavi-katla

SUMMARY:

4+ years of experience as a Data Engineer, specializing in building and optimizing data pipelines using Hadoop, Apache Spark, PySpark, Java, and Python across cloud platforms (AWS and Azure).

Expertise in designing and implementing scalable data architectures and ETL frameworks in AWS, including using EMR, S3, Glue, and Redshift, to process and analyze multi-terabyte datasets efficiently.

Proficient in leveraging Azure services (Databricks, Data Factory, Data Lake, Blob Storage) for large-scale data processing and real-time data ingestion, enabling advanced data analytics and machine learning workflows.

Developed and optimized real-time data ingestion systems using AWS Lambda, Kinesis, and S3, reducing latency and enhancing data accessibility for business intelligence.
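The Lambda-plus-Kinesis ingestion pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: the event shape follows the standard Kinesis-to-Lambda payload, and the S3 write is stubbed out (`s3_put` stands in for a boto3 `put_object` call) so the logic stays testable offline.

```python
import base64
import json

def parse_kinesis_records(event):
    """Decode the base64-encoded payloads Kinesis delivers to a Lambda handler."""
    rows = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        rows.append(json.loads(payload))
    return rows

def handler(event, context=None, s3_put=None):
    """Hypothetical Lambda entry point: parse the batch, then persist it as JSON.

    In production, s3_put would be something like:
        lambda body: s3.put_object(Bucket=..., Key=..., Body=body)
    """
    rows = parse_kinesis_records(event)
    if s3_put is not None:
        s3_put(json.dumps(rows))
    return len(rows)
```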

Extensive experience in Snowflake Data Warehouse and Redshift, creating ELT pipelines, managing data modeling, and optimizing query performance for high-volume data analytics.

Strong skills in orchestrating big data solutions on AWS using Databricks and Spark for processing and analyzing data to support machine learning initiatives and real-time analytics.

Successfully led cloud migration projects, ensuring data quality and integrity during the transition from on-premises systems to AWS and Azure environments.

Developed automated monitoring and alerting systems using AWS CloudWatch and Lambda, ensuring proactive resource management and real-time scaling based on data processing workloads.

Implemented complex data transformations using PySpark and Scala on AWS Databricks, enabling efficient processing of financial and transactional data for strategic insights.

Strong background in data quality management, creating automated Python scripts for data validation and implementing testing frameworks across multiple data environments.

Expertise in SQL, PL/SQL, and database optimization techniques for Oracle, Redshift, and Snowflake, enhancing query performance and improving data retrieval times.

Extensive experience in handling performance tuning of Spark jobs, SQL queries, and ETL processes to optimize data processing pipelines.

Designed and implemented data lakes and data warehousing solutions on cloud platforms to meet compliance, reporting, and business analytics requirements.

Demonstrated leadership in managing cross-functional teams, coordinating efforts between data scientists, IT specialists, and analysts to deliver data-driven solutions.

Experienced in creating technical documentation and conducting knowledge-sharing sessions to upskill teams and ensure best practices in data engineering and cloud migrations.

Strong understanding of data governance and security protocols, particularly in cloud environments, ensuring compliance with industry standards and data protection regulations.

TECHNICAL SKILLS:

Big Data Ecosystems: HDFS, MapReduce, Pig (Pig Latin), Hive, Apache Spark, Apache Kafka, Databricks

Cloud Technologies: AWS, Azure, GCP

Scripting Languages: Python, Visual Basic Script, Windows PowerShell

Programming Languages: Scala, Java, J2EE, JDK 1.4-1.8, JDBC, XML

Databases: MongoDB, Microsoft SQL Server 2008/2010/2012, MySQL 4.x/5.x, Oracle 11g/12c, DB2, Snowflake, Teradata, Cassandra

IDEs / Tools: Eclipse, Anaconda Navigator, Maven, MS Visual Studio

PROFESSIONAL EXPERIENCE:

Data Engineer Feb 2023 - Present

Medtronic, Irving, TX

Architected and implemented scalable data pipelines using AWS EMR, EC2, and Glue to efficiently process multi-terabyte datasets, achieving significant reductions in processing time.

Enhanced financial data analytics by designing and developing data transformations in PySpark on AWS Databricks, improving overall analytical capabilities.

Built a real-time data ingestion pipeline utilizing AWS Lambda and S3, enabling seamless data storage and processing workflows.

Developed automated Python scripts for data quality validation, integrating with AWS services to ensure data integrity and compliance with industry standards.
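A data-quality validation script of the kind described above might reduce to a rule-checking function like this (a sketch only; the rule names and field-based checks are illustrative, not taken from the actual pipeline):

```python
def validate_rows(rows, required_fields, not_null=()):
    """Count rule violations across a batch of records; returns per-rule totals."""
    failures = {"missing_field": 0, "null_value": 0}
    for row in rows:
        if any(field not in row for field in required_fields):
            failures["missing_field"] += 1   # schema violation: key absent entirely
        elif any(row.get(field) is None for field in not_null):
            failures["null_value"] += 1      # key present but value is null
    return failures
```

In a pipeline, a non-zero failure count would typically fail the job or route bad records to a quarantine location before the load proceeds.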

Designed and implemented AWS-based data lake solutions leveraging PySpark and Scala for data aggregation to meet compliance and reporting needs.

Created data visualization tools using Scala on AWS QuickSight, delivering actionable insights for credit risk management.

Developed an automated resource monitoring system with AWS CloudWatch and Lambda, enabling dynamic scaling of data processing resources based on real-time analytics workloads, optimizing performance and cost.
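The scaling rule behind such a monitoring system might reduce to a decision function like the one below. The thresholds and node increments are purely illustrative; in production the metric would come from a CloudWatch alarm and the returned count would drive an EMR or Databricks cluster resize.

```python
def scale_decision(avg_cpu_pct, current_nodes, min_nodes=2, max_nodes=20):
    """Return the target node count for a cluster given average CPU utilisation."""
    if avg_cpu_pct > 80 and current_nodes < max_nodes:
        return min(current_nodes + 2, max_nodes)  # scale out under sustained load
    if avg_cpu_pct < 20 and current_nodes > min_nodes:
        return current_nodes - 1                  # scale in gradually when idle
    return current_nodes                          # within band: no change
```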

Executed complex SQL queries on AWS Redshift to drive data analysis and reporting, providing key insights into customer behavior and market trends for strategic decision-making.

Developed a Python-based automated testing framework to validate data integrity and accuracy across various data pipelines, enhancing the reliability of data transformations and loading processes.

Data Engineer Mar 2021 - May 2022

Anthem Inc, Bangalore, India

Proficient in working with Azure cloud platforms, including HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, Synapse, SQL DB, SQL DWH, and Data Storage Explorer.

Built an Enterprise Data Lake using Azure Data Factory and Blob Storage, enabling teams to handle complex scenarios and implement machine learning solutions.

Integrated data from MongoDB, MS SQL, and various cloud services (Blob, Azure SQL DB) using Azure Data Factory, SQL API, and Mongo API for streamlined data workflows.

Developed PySpark scripts to mine and transform large datasets, providing real-time insights and reports for business decision-making.

Supported the analytical platform by ensuring data quality and optimizing performance with Python's advanced features like higher-order functions, lambda expressions, and collections.

Applied data cleansing and transformations using Databricks and Spark for large-scale data analysis, and leveraged Azure Synapse to manage workloads, facilitating BI and predictive analysis.

Designed and automated custom-built input adapters using Spark, Sqoop, and Airflow to ingest and analyze data from RDBMS into Azure Data Lake.

Optimized data access by refactoring data models, query optimization, and implementing Redis cache to enhance Snowflake performance.
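The cache-aside pattern implied here can be sketched as follows. A plain dict stands in for the Redis client so the example runs offline; a real deployment would use `redis.Redis()` with `get`/`set` and a TTL, keyed by a hash of the query text.

```python
class QueryCache:
    """Cache-aside wrapper: serve repeated query results from a fast store."""

    def __init__(self, store=None):
        # `store` is a stand-in for a Redis client in this sketch
        self.store = {} if store is None else store

    def get_or_run(self, key, run_query):
        if key in self.store:
            return self.store[key]  # cache hit: skip the warehouse round trip
        result = run_query()        # cache miss: run against Snowflake
        self.store[key] = result
        return result
```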

Developed automated workflows for daily incremental data loads, migrating data from RDBMS to Azure Data Lake.
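An incremental daily load typically hinges on a high-water-mark filter like the one below (the `updated_at` field name is illustrative; the watermark would be persisted between runs, e.g. in a control table):

```python
def incremental_load(rows, last_watermark):
    """Select only rows changed since the previous load and advance the watermark."""
    fresh = [r for r in rows if r["updated_at"] > last_watermark]
    new_watermark = max((r["updated_at"] for r in fresh), default=last_watermark)
    return fresh, new_watermark
```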

Monitored Spark clusters using Azure Log Analytics and Ambari Web UI, transitioning log storage from MS SQL to CosmosDB to improve query performance.

Created automated ETL jobs using Talend to push data into the Snowflake data warehouse, managing resources and scheduling with Azure Kubernetes Service.

Utilized Azure DevOps for continuous integration and deployment (CI/CD), debugging, and monitoring jobs and applications, while ensuring security with Azure Active Directory and Ranger.

Collaborated with data science teams for preprocessing and feature engineering to support machine learning algorithms in production environments.

Fine-tuned Spark NLP applications by optimizing parameters such as batch interval times, parallelism, and memory settings to enhance processing efficiency.

Provided data for interactive Power BI dashboards, enabling advanced reporting and data-driven insights.

AWS Data Engineer Mar 2020 - Feb 2021

ADP Technologies, Hyderabad, India

Designed and implemented a multi-tier data architecture on AWS, utilizing S3, Redshift, and RDS to support high-volume data analytics, enabling scalable data processing.

Developed ETL frameworks in Python, integrated with AWS Lambda to automate data ingestion and processing from multiple sources, improving data pipeline efficiency.

Built advanced analytics models on AWS EMR using Spark to provide deep insights into transportation patterns and customer behavior.

Implemented AWS Databricks for data aggregation, enhancing data quality and preparation for machine learning applications.

Designed real-time data ingestion systems using AWS Kinesis and Lambda, optimizing data flows for immediate analysis and faster decision-making.

Developed a real-time recommendation engine using Scala and AWS technologies to optimize ride-sharing matches and improve customer experiences.

Created a data reconciliation framework with AWS Glue and Python, ensuring data accuracy and consistency across storage platforms, reducing reporting discrepancies and improving analytics reliability.
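At its core, such a reconciliation framework compares key sets between the source and target extracts. A minimal sketch (the `id` key and return shape are illustrative; a Glue job would feed in the two extracts):

```python
def reconcile(source_rows, target_rows, key="id"):
    """Compare source and target extracts by key and report discrepancies."""
    src = {r[key] for r in source_rows}
    tgt = {r[key] for r in target_rows}
    return {
        "missing_in_target": sorted(src - tgt),  # extracted but never loaded
        "extra_in_target": sorted(tgt - src),    # loaded with no matching source row
        "matched": len(src & tgt),
    }
```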

Automated ETL processes with Python scripts, reducing extraction, transformation, and loading times by 30%, while ensuring consistent data quality and reliability.

Utilized SQL for complex querying and database management tasks, optimizing performance and improving the efficiency of cloud-based and on-premises data analysis and reporting.

EDUCATION:

Master’s in Health Informatics

University of Findlay, Ohio

Bachelor of Computer Science

Osmania University, India

CERTIFICATIONS:

Microsoft Certified Azure Data Engineer Associate

AWS Certified Data Engineer Associate

Certified Big Data Hadoop and Spark Developer


