
Data Engineer Processing

Location: Irving, TX
Posted: March 14, 2025


Resume:

BHARATH SAI KATARI

Data Engineer

+1-972-***-**** ****************@*****.***

PROFESSIONAL SUMMARY

Highly motivated Data Engineer with over 4 years of experience designing, optimizing, and managing scalable data pipelines. Experienced in cloud-native solutions, big data processing, and ETL automation, with a strong background in AWS, Azure, GCP, Kubernetes, and Spark. Skilled at integrating real-time data streaming systems and transforming raw data into actionable insights, with a strong interest in data-driven decision-making, automation, and AI-powered analytics. Oversaw enterprise data migration projects, improving ETL efficiency by 40% and strengthening data governance procedures. Consistently met tight deadlines while managing large data engineering initiatives with accuracy and rigor.

EDUCATION

Master's in Business Analytics, East Texas A&M University | January 2023 - May 2024 | GPA: 4.0

Bachelor's in Computer Science, Vellore Institute of Technology | August 2018 - May 2022 | GPA: 3.5

SKILLS

Programming/Scripting Languages & Development: Python, Java, SQL, Scala, Bash, R

Big Data & Cloud Technologies: AWS (Glue, Redshift, EMR, Lambda), Azure (Data Factory, Databricks), Google Cloud (BigQuery, Dataflow, Pub/Sub)

Data Processing & ETL: Apache Spark, Hadoop, Kafka, Snowflake, Airflow, SnapLogic, Pentaho

Databases & Storage: MySQL, PostgreSQL, MongoDB, IBM Db2, Oracle, HDFS, S3

Data Modeling & Governance: Data Warehouse Design, Data Lake, Governance, Compliance

DevOps & CI/CD: Jenkins, Git, Docker, Kubernetes, Terraform

Business Intelligence & Visualization: Tableau, Power BI, Seaborn, NumPy, Pandas

Agile & Project Management: JIRA, Scrum, Kanban

PROFESSIONAL EXPERIENCE

AWS DATA ENGINEER

Baylor Scott & White, Waco, Texas, USA | July 2024 - Present

Created and executed high-performance Spark applications using Python, DataFrames, and Spark SQL for efficient big data processing.

Led end-to-end data pipeline development, integrating AWS Glue, Redshift, and Snowflake for seamless data transformation and analytics.

Migrated an on-premises system to AWS, improving scalability and reducing costs by 25%.

Created real-time data streaming solutions using Apache Kafka and Spark Streaming, enhancing decision-making speed.

Created scalable RDBMS solutions using PostgreSQL, MySQL, and SQL Server to support multi-tenant applications.

Developed and deployed containerized microservices using Docker and Kubernetes, ensuring scalable and modular data processing.

Optimized ETL workflows in Alteryx to automate data cleansing, transformation, and reporting, reducing processing time by 40%.

Developed Alteryx Macros and custom workflows to streamline repetitive data processing tasks, improving efficiency across teams.

Automated data ingestion workflows from multiple sources (S3, ORC, Parquet, Text Files) using AWS Glue and Snowflake.

Optimized SQL queries and data extracts for Tableau dashboards, reducing load times by 30%.

Integrated Tableau with cloud databases like Snowflake and AWS Redshift, enabling seamless data visualization.

Enforced data governance best practices, ensuring compliance with industry standards.

Utilized CI/CD with Jenkins and Git, improving deployment efficiency.

DATA ENGINEER

Southern California Edison, California, USA | June 2023 - June 2024

Architected cloud-based data solutions using AWS services like Glue, Athena, Redshift, and DynamoDB.

Designed RESTful microservices to enable seamless data exchange between distributed systems.

Designed and optimized data pipelines for large-scale ETL operations, reducing processing time by 40%.

Developed automated migration tools using AWS Glue to extract, transform, and load data from Amazon RDS to S3 in JSON format.

Built predictive analytics models in Alteryx Designer, supporting machine learning algorithms for data-driven decision-making.

Optimized SQL Server databases to support high-volume transactional data processing.

Configured Kafka for real-time data ingestion, ensuring high availability and scalability.

Developed interactive dashboards in Tableau, providing real-time insights into business KPIs and trends.

Improved data consistency and integrity by implementing validation scripts in SQL and Python.

Built and optimized scalable data pipelines using GCP services like BigQuery, Dataflow, and Pub/Sub.

Integrated Snowflake with BI tools like Tableau and Power BI, enabling dynamic analytics and reporting.

AZURE DATA ENGINEER

Bosch Global Software Technologies, Bangalore, India | January 2020 - December 2022

Built and optimized Azure-based data pipelines, integrating Data Lake, Data Factory, and Databricks.

Automated ETL workflows with PySpark and SQL, and scripted data transformations in Python and Bash to ensure efficient, repeatable processing.

Optimized SQL queries to retrieve and process large datasets efficiently in data warehousing solutions.

Developed and deployed ML models for predictive analytics and anomaly detection using Scikit-learn and TensorFlow.

Designed and implemented scalable data models to support analytics and business intelligence using Snowflake and Redshift.

Executed unit and integration testing for ETL pipelines to ensure data accuracy and reliability.

Trained deep neural networks for image recognition and text classification using TensorFlow and PyTorch.

Deployed JSON scripts for Azure Data Factory pipeline automation, enhancing scalability.

Used Agile methodologies, participating in sprint planning and daily stand-ups.

DATA ANALYST

Dell Technologies, Bangalore, India | May 2020 - December 2020

Managed large-scale datasets in IBM Db2, Oracle, MySQL, and Snowflake.

Conducted exploratory data analysis (EDA) using Python and Tableau, uncovering business insights.

Automated data validation processes using Excel macros, reducing manual errors by 50%.

Automated database maintenance and cloud infrastructure provisioning using PowerShell scripting.

Developed optimized SQL queries for efficient data retrieval and analytics.

PROJECTS

Predictive Maintenance Using IoT & Big Data

Developed a real-time predictive maintenance system using Kafka, Spark Streaming, and AWS Lambda, reducing system downtime by 35%.

Automated Data Pipeline for E-Commerce Analytics

Designed a scalable ETL pipeline on AWS Glue and Snowflake for customer insights, reducing processing time by 50%.

Real-time Stock Market Data Processing

Built a fault-tolerant data pipeline integrating Apache Kafka, Spark Streaming, and AWS Redshift, ensuring low-latency analytics.

CERTIFICATIONS

AWS Certified Solutions Architect – Associate

IBM Machine Learning with Python

Microsoft Certified: Azure Data Engineer

Microsoft Azure DevOps Engineer


