
Data Engineer Senior

Location:
Frisco, TX
Posted:
November 08, 2023


Resume:

Vasanthi Venkatesh

Plano, Texas, United States

ad0yb2@r.postjobfree.com 945-***-****

linkedin.com/in/vasanthi-venkatesh-3335091a1

Summary

Experienced Data Engineer with 5 years in the field, specializing in the design, development, and maintenance of data infrastructure and pipelines. Proficient in data collection, transformation, storage, and processing, with a strong track record of enabling data-driven decision-making by ensuring the availability and reliability of high-quality data. Skilled in a range of data engineering technologies and tools, with a proven ability to collaborate with cross-functional teams and implement robust data solutions that support business objectives.

Experience

Senior Data Engineer

Mastercard

2022 - Present (1 year)

• Experience working with Hadoop ecosystem, including HDFS, MapReduce, Hive, Pig, and Spark.

• Strong understanding of distributed computing concepts and parallel processing frameworks.

• Proficient in setting up and configuring Hadoop clusters, monitoring cluster performance, and optimizing data processing workflows.

• Hands-on experience with writing MapReduce jobs, Hive queries, Pig scripts, and Spark applications to process and analyze large datasets.

• Familiarity with data ingestion and integration techniques using tools like Sqoop and Flume.

• Developed ELT processes for files from Ab Initio and Google Sheets in GCP, using Dataprep, Dataproc (PySpark), and BigQuery for compute.

• Developed workflow in Oozie to automate loading the data into HDFS and pre-processing, analyzing, and training the classifier using MapReduce jobs, Pig jobs, and Hive jobs.

• Worked on Python scripts to import data from sources such as MS SQL Server, SQLite, and Oracle DB.

• Worked with GCP using Cloud Storage, Dataproc, Dataflow, BigQuery, Cloud Functions, Cloud Composer, Pub/Sub, Cloud Shell, the gsutil and bq command-line utilities, and Stackdriver.

• Involved in building a data lake, loading and transforming large sets of structured and semi-structured data, and created data pipelines per business requirements, scheduled using Airflow and Oozie coordinators.

• Assisted in the design and development of data processing workflows using Apache NiFi, enabling efficient data ingestion from various sources into the data lake.

• Contributed to the migration of on-premises Hadoop infrastructure to cloud-based solutions on AWS, reducing infrastructure costs by 25%.

Senior Data Engineer

CVS Health

Dec 2020 - Dec 2021 (1 year 1 month)

• Responsible for the design and implementation of a Business Intelligence solution for the entire company, including a data warehouse, automated reports and user interface for ad hoc reports and analytics.


• Hands-on experience with ETL process using DB2, including data extraction and transformation.

• Worked on Hadoop services such as HDFS, YARN, Pig, Hive, HBase, Kafka, MapReduce, Sqoop, Oozie, ZooKeeper, NiFi, and Airflow, and was involved in analyzing log data to predict errors using Apache Spark.

• Involved in various phases of development; analyzed and developed the system using Agile and Scrum.

• Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.

• Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.

• Involved in writing UNIX shell scripts and automating the ETL processes using UNIX shell scripting.

• Implemented Spark jobs in Scala using the Spark SQL API, creating RDDs, DataFrames, Datasets, and pair RDDs for faster data processing.

• Developed a NiFi workflow to pick up data from the REST API server, the data lake, and the SFTP server and send it to the Kafka broker.

• Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

• Implemented a one-time migration of multi-state data from SQL Server to Snowflake using Python and SnowSQL.

• As part of MLOps, worked on migrating various PySpark models from on-prem Hadoop clusters to GCP using Dataproc and Oozie workflows.

• Worked with Spark to improve performance and optimize existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.

• Effective communication and presentation skills, with the ability to convey complex concepts to both technical and non-technical audiences.

Data Engineer

Tech Mahindra

May 2017 - 2019 (2 years)

• Experience working with Hadoop ecosystem, including HDFS, MapReduce, Hive, Pig, and Spark.

• Strong understanding of distributed computing concepts and parallel processing frameworks.

• Proficient in setting up and configuring Hadoop clusters, monitoring cluster performance, and optimizing data processing workflows.

• Hands-on experience with writing MapReduce jobs, Hive queries, Pig scripts, and Spark applications to process and analyze large datasets.

• Familiarity with data ingestion and integration techniques using tools like Sqoop and Flume.

• Experience working with various cloud platforms, including Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).

• Experience in project management, overseeing and delivering projects on time and within budget.

• Strong understanding of project management methodologies, such as Agile, Waterfall, and Scrum.

• Proficient in project planning, scheduling, and resource management.

• Extensive experience in leading cross-functional project teams, coordinating efforts, and driving collaboration.

• Experience in data analysis, business intelligence, and decision making.

• Strong analytical and problem-solving skills, with the ability to collect, analyze, and interpret large and complex datasets.

• Proficient in using statistical analysis tools and techniques to identify patterns, trends, and correlations.

• Experience in data visualization and reporting to effectively communicate insights to stakeholders.

• Expertise in making data-driven decisions by translating analytical findings into actionable recommendations.


• Strong understanding of decision-making frameworks and methodologies.

• Excellent attention to detail, ensuring accuracy and quality in data analysis and decision-making processes.

• Effective communication and presentation skills, with the ability to convey complex concepts to both technical and non-technical audiences.

Education

University of North Texas

Master's degree, Information Technology

2020 - 2021

Related Coursework: Website Development, Data Modelling, Project Management, Data Analysis and Knowledge Discovery, Cyber Security, Data Visualization, System Analysis and Design, Information Organization and Access, Information Access and Knowledge Inquiry.

Jawaharlal Nehru Technological University

Bachelor's degree, Computer Science

Prominent courses: C, Java, Data Structures, Database Management Systems, Mathematics, Operating Systems, PHP, CSS, HTML.

Led two academic projects, meeting expectations.

Licenses & Certifications

Unix Essential Training - LinkedIn

AWS Certified Cloud Practitioner - Amazon Web Services (AWS) Issued Jul 2023 - Expires Jul 2025

AWS Solutions Architect Associate - Amazon Web Services (AWS) Issued Oct 2023 - Expires Oct 2026

Skills

Oracle Database • Statistical Data Analysis • Analytical Skills • Customer Relationship Management (CRM) • Communication • Data Science • Big Data Analytics • DevOps • C (Programming Language) • SQL Azure
