
Data Engineer Machine Learning

Location:
Hyderabad, Telangana, India
Posted:
October 21, 2024


Resume:

SREEJA BOGGULA

PH: +1-845-***-****

Email: ****************@*****.***

SUMMARY

Data Engineer with 5 years of experience in cloud platforms (AWS, Azure, GCP), SQL, Python, and big data technologies. Expertise in ETL processes, machine learning, and data visualization. Skilled in designing and implementing data pipelines, optimizing data workflows, and creating insightful visualizations. Proficient in leveraging cloud resources to develop scalable and efficient data solutions.

PROFESSIONAL EXPERIENCE

Data Engineer Agiliti Dallas, TX Sep 2023 – Present

Leveraged Azure Data Factory, Data Lake, HDInsight, Synapse Analytics, Cosmos DB, and Databricks for comprehensive data analysis and management within a Big Data framework.

Collaborated with analysis and management teams, supporting them based on their requirements.

Migrated SQL databases to Azure platforms using DDL statements and configured Linked Servers for data transfer between SQL servers.
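The migration pattern above can be sketched in miniature, using SQLite as a stand-in for the source and target SQL servers (the table and column names are illustrative):

```python
import sqlite3

# Minimal sketch of a DDL-based table migration between two databases;
# SQLite stands in for the source and target SQL servers.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")

src.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT)")
src.executemany("INSERT INTO patients VALUES (?, ?)", [(1, "A"), (2, "B")])

# Read the table's DDL from the source catalog and replay it on the target.
ddl = src.execute(
    "SELECT sql FROM sqlite_master WHERE type='table' AND name='patients'"
).fetchone()[0]
dst.execute(ddl)

# Copy the rows across (a Linked Server would do this server-side).
rows = src.execute("SELECT id, name FROM patients").fetchall()
dst.executemany("INSERT INTO patients VALUES (?, ?)", rows)

migrated = dst.execute("SELECT COUNT(*) FROM patients").fetchone()[0]
```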

Developed Spark applications in Databricks for data extraction, transformation, and aggregation, enhancing ETL processes.

Utilized Python, PySpark and Linux Shell scripting for data loading, transformation, and integration efforts, ensuring seamless workflow.

Automated cluster creation in Azure HDInsight with PowerShell scripts, streamlining deployment processes.

Designed, developed, and maintained scalable data pipelines using AWS services such as EMR, Glue, and Redshift to support real-time analytics.

Implemented data lake solutions on AWS S3, integrating data from multiple sources for big data analytics and machine learning models.

Orchestrated batch and real-time data processing using Apache Spark on AWS EMR, improving data processing speeds by 40%.

Developed serverless data processing solutions using AWS Lambda and Kinesis, enabling near real-time data streaming and analytics.
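A minimal sketch of this Lambda-on-Kinesis pattern follows; Kinesis delivers each record base64-encoded under Records[*].kinesis.data, while the payload schema (a JSON object with a "value" field) is illustrative:

```python
import base64
import json

def handler(event, context):
    """Sketch of a Kinesis-triggered AWS Lambda handler."""
    total = 0
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded; decode then parse JSON.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        total += payload["value"]  # aggregate as records stream in
    return {"processed": len(event["Records"]), "sum": total}

# Synthetic event in the shape Lambda receives from a Kinesis trigger.
event = {
    "Records": [
        {"kinesis": {"data": base64.b64encode(
            json.dumps({"value": v}).encode()).decode()}}
        for v in (10, 32)
    ]
}
result = handler(event, None)
```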

Continuously monitored and managed data pipeline (CI/CD) performance alongside applications from a single console with Azure Monitor.

Designed and deployed production-grade data solutions on platforms like Snowflake Data Warehouse and Azure Data Lake, enhancing data analytics capabilities.

Data Engineer Accenture Ltd Hyderabad, India Oct 2020 – July 2022

Utilized GCP with Python for Big Data Analytics and Machine Learning, implementing Spark ML for enhanced insights.

Designed SQL database structure with Django Framework using agile methodology, ensuring scalability and flexibility.

Built dataflow pipelines to migrate HEDIS medical data from multiple sources to target data platforms efficiently.

Developed complex SSIS packages using SQL Server and Teradata, optimizing data integration processes.

Utilized AWS Athena for querying large datasets directly from S3, reducing the need for complex data warehousing solutions.

Implemented CI/CD pipelines for data engineering projects using AWS CodePipeline and CloudFormation, ensuring seamless deployment and version control.

Configured and optimized AWS Redshift clusters, ensuring cost-effective and high-performance query execution for complex datasets.

Involved in database design and development with Business Intelligence using SQL Server 2014/2016, Integration Services (SSIS), DTS Packages, SQL Server Analysis Services (SSAS), DAX, OLAP, RDBMS Cubes, Star and Snowflake Schemas, and Informatica.

Managed Microsoft SQL Servers and implemented maintenance jobs for 10 instances, ensuring data reliability and performance.

Used Python, PySpark to extract weekly information from XML files.
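Weekly extraction from XML can be sketched with the standard library; the <week>/<record> elements and their attributes are illustrative:

```python
import xml.etree.ElementTree as ET

# Minimal sketch of pulling weekly figures out of an XML feed.
doc = """
<report>
  <week ending="2020-01-10">
    <record region="north" units="120"/>
    <record region="south" units="80"/>
  </week>
</report>
"""

root = ET.fromstring(doc)
weekly = {}
for week in root.findall("week"):
    # Sum the units reported for each week-ending date.
    weekly[week.get("ending")] = sum(
        int(rec.get("units")) for rec in week.findall("record")
    )
```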

Implemented real-time data ingestion using Apache Kafka.

Developed Spark SQL jobs to load tables into HDFS and run select queries on them.

Utilized Git for version control, JIRA for project management, and Jenkins for continuous integration and deployment (CI/CD) processes.

Wrote Shell scripts in Linux and integrated them with other solutions.

Implemented ETL processes from various sources like Kafka, NiFi, Teradata, and DB2 using Spark on Hadoop, enhancing data processing capabilities.

Data Analyst Silicon Labs Hyderabad, India Jan 2019 – Oct 2020

Worked as a Data Modeller/Analyst, generating data models using Erwin and deploying them to Enterprise Data Warehouses.

Performed data accuracy, quality checks, and data analysis before and after data loading.
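Pre-load quality checks of the kind described above can be sketched as simple null and range counts; the column names and valid range are illustrative:

```python
# Minimal sketch of pre-load data quality checks (null and range checks).
rows = [
    {"id": 1, "age": 34, "state": "TX"},
    {"id": 2, "age": None, "state": "TX"},
    {"id": 3, "age": 212, "state": "CA"},
]

def quality_report(rows, column, lo, hi):
    """Count nulls and out-of-range values for one numeric column."""
    nulls = sum(1 for r in rows if r[column] is None)
    out_of_range = sum(
        1 for r in rows
        if r[column] is not None and not (lo <= r[column] <= hi)
    )
    return {"rows": len(rows), "nulls": nulls, "out_of_range": out_of_range}

report = quality_report(rows, "age", 0, 120)
```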

Collaborated with business analysts and architects to translate business requirements into logical and physical data models for Data Warehouses, Data Marts, and OLAP applications using E/R Studio.

Conducted JAD sessions for data modeling and established data-related standards, utilizing forward engineering to create tables, views, SQL scripts, and mapping documents.

Wrote PL/SQL statements, stored procedures, and triggers in DB2 for extracting and writing data.

Used SQL Server and T-SQL to construct tables and apply normalization and de-normalization techniques on database tables.

Designed a Star Schema for the detailed data marts and Plan data marts involving conformed Dimensions.
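The star-schema layout above can be sketched in SQLite: one fact table keyed to conformed dimensions, rolled up through a star join (table and column names are illustrative):

```python
import sqlite3

# Minimal sketch of a star schema built in SQLite.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, cal_date TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    units       INTEGER
);
INSERT INTO dim_date    VALUES (20200101, '2020-01-01');
INSERT INTO dim_product VALUES (1, 'widget');
INSERT INTO fact_sales  VALUES (20200101, 1, 5), (20200101, 1, 7);
""")

# A typical star join: roll the fact up through its dimensions.
total = db.execute("""
    SELECT d.cal_date, p.name, SUM(f.units)
    FROM fact_sales f
    JOIN dim_date d    ON d.date_key = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY d.cal_date, p.name
""").fetchone()
```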

Produced 3NF data models for OLTP designs using data modelling best practices and modelling skills.

Designed and developed Use Cases, Activity Diagrams, and Sequence Diagrams using Unified Modelling Language.

Created Data Mapping document after each assignment and wrote the transformation rules for each field as applicable.
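Field-level transformation rules from a data-mapping document can be sketched as a rename-plus-transform table; the source fields and rules here are illustrative:

```python
# Minimal sketch of per-field transformation rules from a mapping document:
# each source field maps to a target name and a transformation function.
rules = {
    "CUST_NM":  ("customer_name", str.strip),
    "ST_CD":    ("state",         str.upper),
    "SALE_AMT": ("amount",        float),
}

def apply_mapping(source_row, rules):
    """Rename each source field and apply its transformation rule."""
    return {target: fn(source_row[src]) for src, (target, fn) in rules.items()}

mapped = apply_mapping(
    {"CUST_NM": "  Acme Corp ", "ST_CD": "tx", "SALE_AMT": "19.99"},
    rules,
)
```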

Developed and maintained sales reporting using MS Excel queries, SQL in Teradata, and MS Access.

SKILLS

R

PYTHON

SQL

Java

Amazon Web Services (AWS)

Google Cloud Platform (GCP)

Snowflake

Azure

Kafka

Spark

Flink

Postgres

Hadoop

Flume

Hive

Apache Nifi

MapReduce

Metastore

Presto

ClickHouse

Big Data

GitHub

DynamoDB

Git

System Architecture

Business Intelligence

Informatica

Talend

Amazon Redshift

Teradata

Great Expectations

Docker

Kubernetes

Jenkins

Tableau

ETL Development

API Integration

Data Mining

Data Pipeline

Data warehousing

Data Architecture

Data Visualization

Data Streaming

Data Analysis

Data Processing

Data Quality

Statistical Analysis

Data Modelling

Data Ingestion

Data Integration

ETL Process

PySpark

NoSQL

EDUCATION

Master's in Information Systems - Business Analytics

Marist College

Bachelor's in Electronics and Communications Engineering

Sreyas Institute of Engineering and Technology

CERTIFICATIONS

AWS Certified Data Engineer - Associate

Databricks Lakehouse Fundamentals
