
Big Data Engineer

Location:
Columbus, OH
Salary:
115000
Posted:
November 14, 2024


Vivek Pabbathi

LinkedIn

475-***-****

************@*****.***

Objective: A driven data engineer with in-depth knowledge of big data technologies and programming languages, seeking a position at a growth-focused firm where I can apply my abilities to the company's advantage while continuing to further my own education.

SUMMARY

• 4+ years of professional experience specializing in Big Data and cloud-based platforms, covering storage, querying, processing, and analysis of data.

• Excellent understanding of Hadoop architecture, including HDFS, YARN, Hive, Sqoop, Oozie, and the MapReduce programming paradigm.

• Experience setting up the AWS data platform: AWS CloudFormation, development endpoints, Amazon Kinesis, AWS Glue, EMR, Jupyter/SageMaker notebooks, Redshift, S3, and EC2 instances.

• Experience implementing scalable data solutions on the Microsoft Azure platform, utilizing services such as Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Active Directory security groups, and related Azure resources.

• Designed and developed ETL/ELT processes to extract data from various sources, transform it into the desired format, and load it into the target system.

• Very good knowledge of SQL, PL/SQL, and T-SQL, especially writing queries with joins, subqueries, ROLLUP, CUBE, and set operators.

• Designed and created data extracts supporting Power BI (including interactions and DAX), Tableau, and other visualization and reporting tools.

• Excellent communication and strong problem-solving skills, with the ability to troubleshoot complex data issues; collaborated closely with cross-functional teams to understand business requirements and provide data-driven insights.
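The SQL skills listed above (joins, subqueries, set-style aggregation) can be sketched with a small self-contained example. The tables and data here are hypothetical, purely for illustration; SQLite is used so the snippet runs anywhere, though it lacks T-SQL's ROLLUP/CUBE extensions.

```python
import sqlite3

# Hypothetical customers/orders tables (illustrative only).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'East'), (2, 'West');
INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);
""")

# A join plus a subquery: per-region totals, keeping only regions
# whose total exceeds the overall average order amount.
rows = cur.execute("""
SELECT c.region, SUM(o.amount) AS total
FROM orders o
JOIN customers c ON c.id = o.customer_id
GROUP BY c.region
HAVING SUM(o.amount) > (SELECT AVG(amount) FROM orders)
ORDER BY c.region
""").fetchall()
```

In T-SQL the same aggregation could add `GROUP BY ROLLUP (c.region)` to append a grand-total row.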

EXPERIENCE

NTT DATA GROUP, Dallas, TX — DATA ENGINEER

February 2023 - Present

Cogzent Infotech Pvt Ltd, Hyderabad, India — DATA ENGINEER

May 2020 – December 2022

SKILLS

Big Data Technologies:

Hadoop, HDFS, Spark, Hive, Sqoop

Cloud Platforms: AWS, Azure

Programming Languages: Python, Java, C, Scala, PySpark

Scripting Languages: HTML, JavaScript, Unix Shell Scripting

Database: MySQL, PostgreSQL, MongoDB, Cassandra, Oracle

Data Visualization Tools: Tableau, Power BI

Version Control Tools: Git, GitHub

EDUCATION

Trine University, Angola, IN — Master's Degree, Business Analytics

GPA: 3.8

Sri Indu College of Engineering and Technology, Hyderabad, TS — Bachelor of Technology (BTech), Civil Engineering

GPA: 3.5

PROJECTS

NTT DATA GROUP, Dallas, TX — DATA ENGINEER — Cloud Data Migration & Analytics

Utilized various AWS services for cloud-based data processing and storage; analyzed data, prepared datasets, performed the ETL process, and migrated data into Snowflake.

Utilized PySpark, Python, and Spark SQL for ETL across on-premises systems and AWS cloud platforms such as S3 and Redshift, leveraging the NoSQL database MongoDB.
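The extract-transform-load pattern behind work like this can be sketched with the standard library alone (PySpark, S3, and MongoDB specifics omitted). The inline CSV source, field names, and validation rules are hypothetical.

```python
import csv
import io
import json

# Extract: a CSV source (inline here; in practice an S3 object or on-prem file).
raw = "id,amount\n1,100.5\n2,bad\n3,42.0\n"

def extract(text):
    # Parse CSV rows into dicts, one per record.
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    # Cast types and drop rows that fail validation (the "T" of ETL).
    out = []
    for r in records:
        try:
            out.append({"id": int(r["id"]), "amount": float(r["amount"])})
        except ValueError:
            continue  # a real pipeline would quarantine bad rows instead
    return out

def load(records):
    # Load: serialize for a downstream target (Redshift/Snowflake in practice).
    return json.dumps(records)

loaded = load(transform(extract(raw)))
```

In PySpark the same shape would be `spark.read.csv(...)`, a `withColumn`/`filter` chain, then a `write` to the target; the staged functions above keep each step independently testable.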

Utilized Informatica for ETL jobs, simplifying the data preparation process; used AWS EMR for scalable big data processing; proficient in Apache Flink, a stream-processing framework for real-time data analytics.

Conducted comprehensive data analysis and profiling to identify data quality issues and anomalies using SQL queries. Developed workflows and scheduled automated jobs using Apache Airflow and NiFi.
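Two of the most common SQL profiling checks described here, null rates and duplicate business keys, can be shown on a toy staging table. The table and its deliberately dirty rows are hypothetical; SQLite stands in for the production warehouse.

```python
import sqlite3

# Hypothetical staging table seeded with known quality issues (illustrative).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE staging (id INTEGER, email TEXT);
INSERT INTO staging VALUES
    (1, 'a@x.com'), (2, NULL), (3, 'a@x.com'), (3, 'b@x.com');
""")

# Check 1: null count for a column that should be populated.
null_emails = cur.execute(
    "SELECT COUNT(*) FROM staging WHERE email IS NULL").fetchone()[0]

# Check 2: business keys that appear more than once.
dup_ids = cur.execute("""
SELECT id, COUNT(*) AS n
FROM staging
GROUP BY id
HAVING COUNT(*) > 1
""").fetchall()
```

Checks like these are typically wrapped as tasks in an Airflow DAG so the pipeline fails fast when thresholds are breached.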

Implemented DevOps practices, including version control with Git, containerization with Docker, and infrastructure as code with Terraform.

Implemented RESTful APIs to enable seamless communication and integration between different software systems.
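A minimal sketch of the RESTful-API idea, using only the standard library: a JSON resource served over HTTP with path-based routing. The `jobs` resource, its fields, and the route shape are hypothetical, not the actual APIs built on the job.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory resource collection (illustrative only).
JOBS = {"1": {"id": "1", "status": "done"}}

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Route GET /jobs/<id> to a JSON response; 404 for anything else.
        parts = self.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "jobs" and parts[1] in JOBS:
            body = json.dumps(JOBS[parts[1]]).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep test output quiet

# Bind to an ephemeral port and serve from a background thread.
server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/jobs/1"
with urllib.request.urlopen(url) as resp:
    payload = json.loads(resp.read())
server.shutdown()
```

A production service would use a framework (Flask, FastAPI) for routing, validation, and auth; the request/response contract is the same.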

Automated testing and continuous integration practices within the Agile workflow to maintain data quality and expedite delivery.

Implemented Snowflake, including data modeling, semantic model definition, and star schema construction, enhancing scalability and performance in data analytics; created reports using QlikView and Power BI.
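The star-schema construction mentioned above follows a standard shape: a central fact table keyed to surrounding dimension tables. This sketch uses hypothetical table names and SQLite in place of Snowflake; the DDL and join pattern carry over.

```python
import sqlite3

# Minimal star schema: one fact table, two dimensions (illustrative names).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount REAL
);
INSERT INTO dim_date VALUES (20240101, 2024);
INSERT INTO dim_product VALUES (1, 'widget');
INSERT INTO fact_sales VALUES (20240101, 1, 10.0), (20240101, 1, 5.0);
""")

# The typical star join: aggregate facts by dimension attributes.
result = cur.execute("""
SELECT d.year, p.name, SUM(f.amount)
FROM fact_sales f
JOIN dim_date d ON d.date_key = f.date_key
JOIN dim_product p ON p.product_key = f.product_key
GROUP BY d.year, p.name
""").fetchall()
```

Keeping descriptive attributes in the dimensions and only keys plus measures in the fact table is what lets BI tools like Power BI build a semantic model over the schema.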

Cogzent Infotech Pvt Ltd, Hyderabad, India — DATA ENGINEER — Azure Data Engineering & Analytics

Utilized Azure services such as Azure SQL Database and Azure Synapse Analytics for data warehousing and analytics solutions; expertise in RDF (Resource Description Framework) for representing and structuring data.

Designed and implemented end-to-end ETL and ELT solutions to extract, transform, and load data from various sources into Azure Data Lake Storage using Azure Data Factory.

Utilized Databricks for serverless ETL jobs, simplifying the data preparation process, and enhancing scalability and performance in data analytics.

Leveraged Python, MySQL, and Kafka to develop data processing scripts and optimize query performance for large-scale datasets.
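One common optimization behind large-dataset work like this is batched loading: streaming records in fixed-size chunks instead of row-at-a-time inserts. Kafka and MySQL are out of scope here; the chunking helper and table below are a hypothetical stdlib sketch using SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE events (id INTEGER, payload TEXT)")

def batched(rows, size):
    # Yield fixed-size chunks so memory stays bounded on large streams.
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

# A generator stands in for a Kafka consumer or large query cursor.
rows = ((i, f"msg-{i}") for i in range(10_000))
for chunk in batched(rows, 1_000):
    # executemany amortizes per-statement overhead across the batch.
    cur.executemany("INSERT INTO events VALUES (?, ?)", chunk)
conn.commit()

count = cur.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

With MySQL the same pattern applies via `cursor.executemany` or multi-row `INSERT ... VALUES` statements, with batch size tuned against packet limits.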

Implemented DevOps practices, including version control with Git, containerization with Docker, and infrastructure as code with Terraform.

Utilized PowerShell scripting for troubleshooting and diagnostic tasks, including system log analysis, network connectivity testing, and performance tuning.

Handled importing of data from various sources, performed transformations using Hive and MapReduce, and loaded data into HDFS; moved data between the NoSQL database Cosmos DB and HDFS, and vice versa, using Sqoop, with Airflow for workflow orchestration.

Worked with Kubernetes for container orchestration and Jenkins for continuous integration and deployment (CI/CD) pipelines.

Developed and designed interactive and visually appealing reports and dashboards using Power BI and Tableau to present data-driven insights to business stakeholders.


