Data Analyst Engineer

Location: United States
Posted: March 27, 2024

Priya Reddy

ad4l8f@r.postjobfree.com

360-***-****

Overall 6+ years of experience as a Big Data Engineer, Data Analyst, and Python Developer, comprising the design, development, and implementation of data models for enterprise-level applications.

Hands-on experience with major components of the Hadoop ecosystem, including Spark (PySpark), Hive, HBase-Hive integration, Pig, Sqoop, Flume, the MapReduce framework, and HDFS.

Experience in using various AWS services such as EMR, S3, CloudWatch, Lambda, and Glue to run and monitor Hadoop and Spark jobs in an AWS environment.

Responsible for transforming and loading large sets of structured, semi-structured, and unstructured data.

Expertise in developing multiple Kafka Producers and Consumers as per the requirements.

Proficient in developing Spark applications using Scala and Python to transform data with DataFrames, Datasets, and RDDs (see the sketch at the end of this summary).

Performed Data Profiling and Data Analysis using SQL on different extracts.

Used the REPL (Spark shell) for developing Spark scripts.

Experience in developing a pipeline using Kafka to store data into HDFS.

Experience in creating databases, users, tables, triggers, macros, views, stored procedures, functions, packages, and hash indexes in Teradata database.

Experience with Snowflake integrations, stages, external tables, and views.

Built and maintained data pipelines to process daily events into the Spark and Snowflake data warehouse.

Managed deployment automation using Terraform to automate system operations.

Experience in provisioning AWS resources using Terraform.

Good experience in using Apache NiFi for automation of the data movement between various Hadoop systems.

Experienced in designing different time driven and data driven automated workflows using Oozie.

Strong knowledge in working with ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data Warehouse tools for reporting and data analysis.

Experience in configuring the Zookeeper to coordinate the servers in clusters and to maintain the data consistency.

Experienced in working with Amazon Web Services (AWS) using EC2 for computing and S3 as storage.

Experience in importing and exporting the data using Sqoop from HDFS to Relational Database Systems and from Relational Database Systems to HDFS.

Extensive experience in development of Bash scripting and PL/SQL scripts.

Proficient in relational databases like Oracle, MySQL and SQL Server.

Knowledge in using Integrated Development environments like Eclipse, NetBeans, IntelliJ, STS.

Capable of working with SDLC, Agile, and Waterfall methodologies.

Excellent communication, interpersonal, and problem-solving skills; a team player with the ability to quickly adapt to new environments and technologies.

Performed data profiling, identified and communicated data quality issues, and worked with other teams as needed to resolve them.

Converted Oracle table components to Teradata table components in Ab Initio graphs.

Set up standards and processes for Hadoop based application design and implementation of various Hadoop based applications.

Extensive experience on various version controls like Git, SVN.

Extremely organized with demonstrated skills to perform several tasks and assignments simultaneously within the scheduled time.

Ability to work with managers and executives to understand business objectives and deliver on business needs; a firm believer in teamwork.

Excellent Problem solving, programming, active thinking and communication skills.

Experience working both independently and collaboratively to solve problems.
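
The following is a minimal PySpark sketch of the kind of DataFrame transformation work summarized above; the input path, column names, and output location are hypothetical placeholders rather than details from any specific engagement.

```python
# Minimal PySpark sketch of a DataFrame transformation job.
# The input path, column names, and output location are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-transform").getOrCreate()

# Read semi-structured JSON events from HDFS (hypothetical path).
events = spark.read.json("hdfs:///data/raw/events/")

# Basic cleansing and aggregation: drop rows without a user id,
# then count events per user per day.
daily_counts = (
    events
    .filter(F.col("user_id").isNotNull())
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("user_id", "event_date")
    .count()
)

# Write the result back to HDFS as Parquet for downstream Hive/Snowflake loads.
daily_counts.write.mode("overwrite").parquet("hdfs:///data/curated/daily_counts/")

spark.stop()
```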

Big Data Technologies

Hadoop (HDFS, MapReduce), Spark, PySpark, Hive, Kafka-Storm, Pig, Sqoop, Oozie, Cassandra.

Bigdata distribution

Cloudera, Hortonworks, Amazon EMR

Programming Languages

Python, Scala, Unix Shell scripting, Spark SQL, HiveQL, C.

Databases

MySQL, MS-SQL Server, NoSQL (HBase, MongoDB)

Java Technologies

JSP, Servlets, JDBC, Junit.

Web Technologies

HTML, XML, JavaScript, jQuery, CSS, Python.

Operating Systems

Windows, Linux (Ubuntu)

Web Services/Provisioning Tools

AWS, Terraform.

Reporting/ETL Tools

Tableau, QlikView, Informatica, Pentaho.

Servers

Apache Tomcat, WebSphere, JBoss

IDE's

Eclipse, IntelliJ IDEA, NetBeans.

EDUCATION DETAILS:

Master of Computer Science: Southeast Missouri State University, Cape Girardeau, MO.

Bachelor of Technology: Jawaharlal Nehru Technology University, Hyderabad, India.

WORK EXPERIENCE:

Client: Premier Inc, Charlotte, NC July 2021 – Till Date

Role: Big Data Engineer

Responsibilities:

Developed data transition programs from S3 to Snowflake using AWS Lambda by creating functions for certain events based on use cases (see the sketch at the end of this list).

Worked on developing a POC project based on the requirements and designed detailed documentation for the implemented process.

Completely owned the process development for reading files from S3 and surfacing the data in Snowflake tables and views by creating an integration and stage in Snowflake.

Designed the end-to-end process for data transfer from S3 to Snowflake.

Experienced in creating external tables and managing views in Snowflake.

Worked with data from multiple databases and moved it to Snowflake using the Qlik replication tool.

Worked on CI/CD pipelines using Git, Jenkins, and Docker to set up and configure big data architecture on the AWS cloud platform.

Worked on AWS Lambda to run code without managing servers, triggered by S3 and SNS events.

Created a Secrets Manager secret to store Snowflake user credentials.

Used the Snowflake connector package to keep the login credentials confidential.

Experience in creating multiple resources in AWS Cloud in multiple environments using Terraform.

Worked on creating resources such as Lambda, S3, IAM roles, policies, Secrets Manager, SNS, and Glue.

Worked with CI/CD using Jenkins to deploy infrastructure.
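
A minimal sketch of the S3-to-Snowflake Lambda pattern described in this section, assuming the snowflake-connector-python package is bundled with the function; the secret name, stage, and table identifiers are hypothetical.

```python
# Hypothetical sketch of an S3-triggered Lambda that copies new files into Snowflake.
# Secret name, database/schema/table, and stage names are illustrative assumptions.
import json
import boto3
import snowflake.connector

SECRET_NAME = "snowflake/etl_user"            # hypothetical Secrets Manager secret
TARGET_TABLE = "ANALYTICS.PUBLIC.EVENTS"      # hypothetical target table
EXTERNAL_STAGE = "ANALYTICS.PUBLIC.S3_STAGE"  # hypothetical external stage over the bucket


def get_snowflake_credentials():
    """Fetch Snowflake user credentials from AWS Secrets Manager."""
    client = boto3.client("secretsmanager")
    secret = client.get_secret_value(SecretId=SECRET_NAME)
    return json.loads(secret["SecretString"])


def handler(event, context):
    """Triggered by an S3 put event; loads each new object into Snowflake via COPY INTO."""
    creds = get_snowflake_credentials()
    conn = snowflake.connector.connect(
        account=creds["account"],
        user=creds["user"],
        password=creds["password"],
        warehouse=creds.get("warehouse", "ETL_WH"),
    )
    try:
        for record in event.get("Records", []):
            key = record["s3"]["object"]["key"]
            # COPY only the object that triggered this invocation.
            conn.cursor().execute(
                f"COPY INTO {TARGET_TABLE} FROM @{EXTERNAL_STAGE}/{key} "
                "FILE_FORMAT = (TYPE = 'JSON')"
            )
    finally:
        conn.close()
    return {"status": "ok"}
```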

Environment: Spark, Python, Snowflake, Terraform, Docker, Tableau, Qlik, AWS, S3, Lambda, Secrets Manager, CloudWatch, SQL, GitLab.

Client: Edward Jones, MO Oct 2020 – June 2021

Role: Big Data Engineer

Project Description:

Edward Jones is a large investment firm and a leader in the financial services industry, serving 8 million clients and building long-term client relationships through quality investments. Investment services include stocks, mutual funds, and insurance. With respect to data analysis, the company mainly works on historical policy and claims data to manage risk factors.

Responsibilities:

Participated with the technical staff, business managers, and practitioners in the business unit to determine the requirements and functionality needed in a project.

Developed data transition programs from DynamoDB to AWS Redshift (ETL process) using AWS Lambda by creating functions in Python for certain events based on use cases.

Implemented the AWS cloud computing platform using RDS, Python, DynamoDB, S3, and Redshift.

Moved data from traditional databases like MS SQL Server, MySQL, and Oracle into Hadoop using Sqoop.

Worked on developing workflows in Oozie to automate the tasks of loading data into HDFS and preprocessing it with Pig.

Worked on creating Oozie workflows and coordinator jobs to schedule and retire jobs in time with data availability.

Experienced in importing/exporting data into HDFS/Hive from relational databases and Teradata using Sqoop.

Involved in the creation of Hive tables, loading them, and analyzing the data using Hive queries.

Worked on creating and configuring EC2 instances on AWS (Amazon Web Services) to establish clusters in the cloud.

Experience in creating resources using Terraform.

Worked on a CI/CD solution using Git, Jenkins, and Docker to set up and configure big data architecture on the AWS cloud platform.

Worked on AWS Lambda to run code without managing servers, triggered by S3 and SNS events.

Worked on integrating a Kafka producer into Spark jobs to capture errors from the Spark application and push them into a database.

Involved in processing log files generated from various sources into HDFS for further processing through Elasticsearch, Kafka, Flume, and Talend, and processed the files using Piggybank.

Wrote custom Kafka consumer code and modified existing producer code in Python to push data to Spark Streaming jobs (see the sketch at the end of this list).

Exposed to all aspects of the software development life cycle (SDLC), including analysis, planning, development, testing, implementation, and post-production analysis of projects. Worked with Waterfall and Scrum/Agile methodologies.

Configured a MongoDB cluster with high availability and load balancing, and performed CRUD operations.

Loaded data from flat files into the target database using ETL processes, applying business logic to insert and update records during loading.

Performed day-to-day Database Maintenance tasks including Database Monitoring, Backups, Space, and Resource Utilization.

Used the Tableau data visualization tool for reports and integrated Tableau with Alteryx for data and analytics.

Built NiFi data pipelines in a Docker container environment during the development phase.

Created Docker images to support models in different formats that were developed using different languages.

Used Maven to build and deploy code on the YARN cluster.

Used JIRA as the Scrum tool for the Scrum task board and worked on user stories.

Experience in building scripts using Maven and performing continuous integration with systems like Jenkins.
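
A minimal sketch of the Kafka-to-Spark-Streaming flow described above, assuming the kafka-python package and the Spark Kafka connector are available; the broker address, topic name, and record fields are hypothetical.

```python
# Hypothetical sketch: a Python Kafka producer plus a Spark Structured Streaming reader.
# Broker address, topic name, and record fields are illustrative assumptions.
import json
from kafka import KafkaProducer

BROKERS = "broker1:9092"
TOPIC = "app-errors"   # hypothetical topic used to collect Spark application errors

producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_error(job_name, message):
    """Push an error record from a Spark job into the Kafka topic."""
    producer.send(TOPIC, {"job": job_name, "error": message})
    producer.flush()


# --- Spark Structured Streaming side: consume the topic and land it for analysis ---
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("error-stream").getOrCreate()

errors = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", BROKERS)
    .option("subscribe", TOPIC)
    .load()
    .selectExpr("CAST(value AS STRING) AS value")
    .select(
        F.get_json_object("value", "$.job").alias("job"),
        F.get_json_object("value", "$.error").alias("error"),
    )
)

query = (
    errors.writeStream.format("parquet")
    .option("path", "hdfs:///data/errors/")            # hypothetical sink
    .option("checkpointLocation", "hdfs:///chk/errors/")
    .start()
)
```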

Environment: Hadoop, Spark, Hive, HDFS, Kafka, UNIX, Shell, AWS Services, Python, Scala, Glue, Oozie, SQL, AWS SageMaker, NiFi, Docker, Kubernetes, Tableau.

Client: InfoWorks, CA Sep 2019 – Sep 2020

Role: Hadoop Developer

Project Description:

Automates operations based on big data by creating data pipelines from source to consumption. Big data solutions run on-premises and in the cloud as automated workflows for Business Intelligence and Analytics. The main aim of the project is to gather data from different sources, apply ETL methodologies, and deliver data as per the requirements.

Responsibilities:

●Developed a data set process for data modeling and recommended ways to improve data quality, efficiency, and reliability.

●Performed Extract, Transform, and Load (ETL) and data cleansing of data from various sources like XML files, flat files, and databases; involved in UAT, batch testing, and test plans.

●Responsible for writing Hive Queries to analyze the data in Hive warehouse using Hive Query Language (HQL). Involved in developing Hive DDLs to create, drop, and alter tables.

●Extracted data and loaded it into HDFS using Sqoop import from various sources like Oracle, Teradata, SQL Server, etc.

●Created Hive staging tables and external tables and joined the tables as required.

●Implemented dynamic partitioning, static partitioning, and bucketing (see the sketch at the end of this list).

●Installed and configured Hadoop MapReduce, Hive, HDFS, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.

●Implemented Sqoop jobs for data ingestion from Oracle to Hive.

●Managed Azure infrastructure: Azure Web Roles, Worker Roles, VM Role, Azure SQL, Azure Storage, Azure AD licenses, and virtual machine backup and recovery from a Recovery Services vault using Azure PowerShell and the Azure Portal.

●Working experience on the Azure Databricks cloud, organizing data into notebooks and making it easy to visualize data using dashboards.

●Worked on developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.

●Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics. Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.

●Responsible for estimating the cluster size, monitoring, and troubleshooting of the Spark Databricks cluster.

●Developed Spark scripts using Python on Azure HDInsight for data aggregation and validation, and verified their performance against MR jobs.

●Utilized Azure HDInsight to monitor and manage the Hadoop Cluster.

●Automated Sqoop incremental imports by using Sqoop jobs and automated jobs using Oozie.

●Worked on various compression and file formats like Avro, Parquet, and Text formats.

●Responsible for writing Hive Queries for analyzing data in Hive warehouse using HQL.

●Responsible for creating complex dynamic partition tables using Hive for best performance and faster querying.

●Involved in developing Hive User Defined Functions, compiling them into JARs, adding them to HDFS, and executing them with Hive queries.

●Worked with various file formats such as delimited text files, clickstream log files, Apache log files, Avro files, JSON files, and XML files. Mastered using columnar file formats like RC, ORC, and Parquet.

●Developed custom Unix/Bash shell scripts for pre- and post-validation of the master and slave nodes, before and after configuring the NameNode and DataNodes, respectively.

●Developed job workflows in Oozie for automating the tasks of loading the data into HDFS.

●Implemented compact and efficient file storage of big data using file formats like Avro, Parquet, and JSON, with compression methods like GZip and Snappy on top of the files.

●Explored Spark to improve performance and optimize existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and Pair RDDs.

●Worked on Spark using Python as well as Scala and Spark SQL for faster testing and processing of data.

●Extensively used Stash, Bitbucket, and GitHub for code control.

●Migrated MapReduce jobs to Spark jobs to achieve better performance.

●Wrote test cases, analyzed results, and reported test results to product teams.
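
A minimal PySpark sketch of the dynamic partitioning and Snappy-compressed Parquet storage described above; the database, table, and column names are hypothetical.

```python
# Hypothetical sketch of dynamic partitioning and columnar storage with Snappy compression.
# Database, table, and column names are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("partitioned-load")
    .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
    .config("spark.sql.parquet.compression.codec", "snappy")
    .enableHiveSupport()
    .getOrCreate()
)

# Hive settings for dynamic partition inserts.
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

staged = spark.table("staging.transactions")  # hypothetical staging table

# Write into a partitioned, Snappy-compressed Parquet table; the partition
# column values (txn_date) are resolved dynamically at write time.
(
    staged.write.mode("overwrite")
    .format("parquet")
    .partitionBy("txn_date")
    .saveAsTable("warehouse.transactions_partitioned")
)
```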

Environment: Hadoop, HDFS, Hive, Sqoop, Apache Spark, Spark-SQL, ETL, Maven, Oozie, Scala, Python3, Unix shell scripting.

Client: Innintech, India Jan 2018 – July 2019

Role: Hadoop Developer

Project Description:

A service-based company and an innovative offshore development firm in Hyderabad. It mainly works on developing and designing web-based and mobile-based applications, with a strong focus on selecting the right tools to maintain customer satisfaction and achieve significant results.

Responsibilities:

Involved in data extraction, including analyzing, reviewing, and modeling data based on requirements using higher-level tools such as Hive and Spark.

Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.

Experience in writing scripts in Unix for the automation of the sanitization process.

Experience using Sqoop to import and export data between HDFS and RDBMS in both directions.

Experience in developing workflows using Oozie for running MapReduce jobs and Hive queries.

Extensive practical experience in incremental import by creating Sqoop meta store jobs.

Experience in using Apache Flume for collecting, aggregating, and moving large amounts of data from application servers, handling a variety of streaming data at high velocity.

Experience in Extraction, Transformation and Loading (ETL) of data from multiple sources like XML files and Databases.

Involved in writing complex Hive queries to load and process data in Hadoop File System and performance tuning.

Implemented Spark applications using Scala to perform advanced procedures like text analytics and processing, utilizing DataFrames and the Spark SQL API with Spark's in-memory computing capabilities for faster data processing.

Designed the process to ingest data from Cloudera cluster Edge Node to Hive for further processing of the data.

Developed scripts to use the Secure File Transfer Protocol (SFTP) to send files securely from one server to another (see the sketch at the end of this list).

Worked with Control-M for scheduling and automating batch processing.

Validated the target tables loaded into the cluster and reported any discrepancies.

Raised defects for any issues found in the process.

Prepared use cases for big data work in technical workshops.

Organized daily Scrum meetings with the team, prioritized product backlog items, and was responsible for the timely delivery and deployment of product releases.
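
A minimal Python sketch of the SFTP transfer step described above (the original scripts were Unix-based); it assumes the paramiko package, and the host, credentials, and file paths are hypothetical.

```python
# Hypothetical Python equivalent of the SFTP transfer step; host name,
# credentials, and file paths are illustrative assumptions.
import paramiko

HOST = "sftp.example.internal"
PORT = 22
USER = "batch_user"
LOCAL_FILE = "/data/outbound/claims_20190101.csv"
REMOTE_FILE = "/inbound/claims_20190101.csv"


def sftp_put(host, port, user, password, local_path, remote_path):
    """Open an SFTP session and securely copy one file to the remote server."""
    transport = paramiko.Transport((host, port))
    try:
        transport.connect(username=user, password=password)
        sftp = paramiko.SFTPClient.from_transport(transport)
        sftp.put(local_path, remote_path)
        sftp.close()
    finally:
        transport.close()


if __name__ == "__main__":
    # A real password would come from a credential store, not source code.
    sftp_put(HOST, PORT, USER, "not-a-real-password", LOCAL_FILE, REMOTE_FILE)
```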

Environment: MySQL, Spark, Scala, Hadoop, Hive, HDFS, Control-M, Flume, Sqoop, SQL Server, Unix Scripts.

Client: HDFC Bank, HYD Apr 2016 - Dec 2017

Role: Python Developer

Project Description:

HDFC Bank is a leading private-sector bank in India that offers a wide range of banking products and financial services. It also provides a wide variety of channels across areas such as investment banking, life insurance, and asset management.

Responsibilities:

Followed the Software Development Life Cycle (SDLC) and used Agile methodology for developing the application.

Designed the front end of the application using Python 2.7, HTML, CSS, JSON, and jQuery; also worked on the backend of the application.

Involved in analysis and design of the application features.

Created UI using JavaScript and HTML/CSS.

Wrote backend programs in Python.

Used JavaScript and XML to update a portion of a webpage.

Worked on changes to OpenStack to accommodate large-scale data center deployments.

Worked in the MySQL database on simple queries and wrote stored procedures for normalization.

Responsible for handling the integration of database systems.

Developed and Deployed SOAP based Web Services on Tomcat Server.

Used an object-relational mapping (ORM) solution, a technique for mapping the data representation of the MVC model to an SQL-based schema (see the sketch at the end of this list).

Used IDE tool to develop the application and JIRA for bug and issue tracking.

Used GIT to coordinate team development.
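
An illustrative ORM mapping sketch for the pattern described above; the résumé does not name the ORM library used, so SQLAlchemy is assumed here purely as an example, and the table and column names are hypothetical.

```python
# Illustrative ORM mapping sketch; SQLAlchemy and the table/column names
# are assumptions for demonstration only.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Customer(Base):
    """Maps the MVC model's customer representation to an SQL table."""
    __tablename__ = "customers"

    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)
    email = Column(String(255), unique=True)


# The MySQL connection string is a placeholder.
engine = create_engine("mysql+pymysql://user:password@localhost/bankdb")
Base.metadata.create_all(engine)

Session = sessionmaker(bind=engine)
with Session() as session:
    session.add(Customer(name="Sample Customer", email="sample@example.com"))
    session.commit()
```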

Environment: Python 2.7, MySQL, HTML, CSS, JavaScript, jQuery, Sublime text, JIRA, GIT.

Client: Karvy Group, Hyderabad, India. July 2014 - Mar 2016

Role: Python Developer

Project Description:

An integrated financial services group that holds the financial and non-financial parts of the business, delivering quality and security for customers accessing the site. It also includes transaction processing services and research.

Responsibilities:

Designed user interfaces using HTML and CSS.

Designed and developed application using Python programming language.

Involved in designing and testing of the application.

Worked on many code changes and enhancements as per the requirements.

Implemented and optimized complex SQL queries for report generation (see the sketch at the end of this list).

Managed Code Repository via GIT.

Developed test cases to verify business logic.
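
A minimal sketch of a parameterized report query of the kind described above, assuming the mysql-connector-python package; the table, columns, and filter values are hypothetical.

```python
# Hypothetical sketch of a parameterized report query against MySQL;
# the table, columns, and filter values are illustrative assumptions.
import mysql.connector

REPORT_SQL = """
    SELECT client_id, COUNT(*) AS txn_count, SUM(amount) AS total_amount
    FROM transactions
    WHERE txn_date BETWEEN %s AND %s
    GROUP BY client_id
    ORDER BY total_amount DESC
"""

conn = mysql.connector.connect(
    host="localhost", user="report_user", password="not-a-real-password", database="reports"
)
try:
    cursor = conn.cursor()
    # Parameters are bound by the driver, keeping the query safe and reusable.
    cursor.execute(REPORT_SQL, ("2015-01-01", "2015-12-31"))
    for client_id, txn_count, total_amount in cursor.fetchall():
        print(client_id, txn_count, total_amount)
finally:
    conn.close()
```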

Environment: Python, HTML, CSS, GIT, MySQL.


