Naga Venkateshwarlu Yadav Dokku
818-***-**** ********@*****.***
Career Objective:
Seeking a Spark Developer role; passionate about performance and distributed computing on large datasets and analytics.
Align business-driven objectives with solutions and analytical standards, delivering actionable insights by developing reusable tools and platforms.
Contribute to a strong team culture and leadership practices that drive innovation.
Experience Summary:
Around 7 years of experience on various IT systems and applications built with open-source technologies, covering analysis, design, coding, testing, implementation, and training. Excellent skills in state-of-the-art client-server computing, with a good understanding of Big Data technologies and machine learning.
Strong knowledge of data processing on Spark Core using Spark SQL, MLlib, and Spark Streaming.
In-depth understanding of Hadoop architecture and its components, including HDFS, MapReduce, and YARN.
Conversant with SQL, PL/SQL, T-SQL, and RDBMS concepts. Contributed data definitions for new database files/tables and changes to existing ones as needed for analysis and mining.
Worked on Big Data analytics with the Hadoop ecosystem (Hadoop, Hive) and Spark, including integration with R.
Extensive knowledge of implementing machine learning programs in Python, R, and Scala.
Experience using and developing solutions on the Hadoop ecosystem, including Hadoop, Spark, MapReduce, Hive, Sqoop, Oozie, ZooKeeper, Flume, Kafka, and NoSQL databases such as HBase.
Hands-on experience with various Hadoop distributions: Cloudera, Hortonworks, and MapR.
Designed the Hive data model for migrating an ETL process into Hadoop and wrote Pig scripts to load data into the Hadoop environment.
Experience working with cloud infrastructure such as Amazon Web Services (AWS).
Experience launching EMR clusters, Redshift clusters, EC2 instances, S3 buckets, AWS Data Pipeline, and Simple Workflow Service instances.
A self-starter, team player, excellent communicator, prolific researcher, and organizer with experience managing and coordinating onshore and offshore teams.
Education:
DePaul University, Chicago, IL
Master of Science (Predictive Analytics) 12/2016
Indian School of Mines, Dhanbad, India
Integrated Master of Science (Mathematics & Computing) 05/2009
Technical and Analytical Skills:
Roles : Spark Developer, Hadoop Developer, Data Analyst, Project Engineer
Programming: Python, R, C, SQL, Java (Familiarity: Scala, SAS)
Tools: Spyder, IPython Notebook/Jupyter, Spark Notebook, Zeppelin notebook (Familiarity: Git, Docker)
Cloud: AWS/EMR/EC2/S3 (also direct-Hadoop-EC2)
Big Data: Spark, Hadoop, Hive, Pig, Sqoop, (Familiarity: Cloudera Search)
DB Languages: SQL, PL/SQL, Oracle, Hive, Spark SQL
Domain: Big Data, Data Mining, Data Analytics, Machine Learning, Natural Language Processing
Experience History:
Role: Senior Software Engineer Location: Chicago, IL
Client: Axiom Corporation Aug '15 – Present
At Axiom Corporation I work on the Contact Intelligence project, which pulls data from the company's CRM and ERP systems to build a 360-degree view of its customers and their purchasing behavior, helping the marketing department optimize its efforts.
Responsibilities:
Involved in building data pipelines that extract, classify, merge and deliver new insights on the data.
Made RESTful calls using Scala and loaded the responses into Spark DataFrames.
Developed Hive, MapReduce, and Spark Python modules for machine learning and predictive analytics in Hadoop/Hive/Hue on AWS.
Primary designer and developer of a pipeline that ingests, catalogs, stores, and analyzes new datasets, ending in analytics and visualization.
Wrote Pig scripts for ETL jobs that acquire data from multiple sources and convert it into a uniform format.
Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
Used various Spark transformations such as map, reduceByKey, and filter to clean the input data (see the sketch after this list).
Used Spark to populate Hive tables, pulling data from blob storage and applying various transformations along the way.
Integrated Maven builds and designed workflows to automate the build and deploy process.
Launched EMR and Redshift clusters.
Used Spark and Spark SQL through the Python API to read Parquet data and create Hive tables.
Worked on a POC performing sentiment analysis of Twitter data using Spark Streaming.
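For illustration, a minimal PySpark sketch of the cleaning and Hive-loading pattern referenced above; the paths, column names, and table name are hypothetical placeholders, not actual project assets.

    # sketch_clean_load.py -- illustrative only; paths, columns, and the
    # Hive table name are hypothetical placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("contact-intelligence-sketch")
             .enableHiveSupport()   # allows writing managed Hive tables
             .getOrCreate())

    # Read raw Parquet data, e.g. staged by an upstream ingest job.
    df = spark.read.parquet("s3://example-bucket/raw/contacts/")

    # RDD-style cleaning with map / filter / reduceByKey:
    # count purchases per customer, dropping records with no customer id.
    counts = (df.rdd
              .map(lambda row: (row["customer_id"], 1))
              .filter(lambda kv: kv[0] is not None)
              .reduceByKey(lambda a, b: a + b))

    # Back to a DataFrame and into Hive for downstream analytics.
    result = counts.toDF(["customer_id", "purchase_count"])
    result.write.mode("overwrite").saveAsTable("analytics.customer_purchases")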
Environment: Python, Machine learning, AWS, Apache Hadoop, HDFS, Hive, Pig, Apache Spark, Spark Streaming, Spark SQL, HBase, Kafka, Sqoop, Scala, Git.
Role: BIG DATA /Hadoop Developer Location: Bangalore, India
Client: Vodafone Mpesa Mar '13 – Dec '14
Description: The functional scope of this project is to provide a web-based application for operational management of Mpesa, where a new user can register and load money. It lets tellers see the status of customers across various zones and helps them assess their customers.
Responsibilities:
Involved in the architecture of the project.
Experienced in decommissioning the legacy systems.
Worked extensively with Hive, Sqoop, shell scripting, Pig, and Python.
Used Sqoop to move structured data from AS400 DB2, Oracle, and SQL sources.
Used Axway to FTP the Optim files into Hadoop and created tables on top of the data.
Scheduled the shell scripts with cron jobs.
Experienced in writing Hive join queries.
Used Pig's predefined functions to convert fixed-width files to delimited files.
Used Python to read Avro files.
Developed shell scripts to perform incremental loads (see the sketch after this list).
Used Hive join queries to join multiple tables of a source system and load the results into Elasticsearch tables.
Experienced in moving data from multi-member files to Hadoop.
Launched and set up a Hadoop cluster on AWS, including configuring its different components.
Involved in data migration from one cluster to another.
Analyzed Cassandra and compared it with other open-source NoSQL databases to determine which best suited the current requirements.
Used Oozie to schedule workflows performing shell and Hive actions.
Experienced in writing Oozie workflows and defining their job properties files.
Experienced in managing Hadoop jobs and the logs of all scripts.
Experienced in data validation and in gathering requirements from the business.
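As a sketch of the incremental-load pattern above, the snippet below wraps a Sqoop append-mode import in Python; the JDBC URL, table, and column names are hypothetical, not the client's actual schema.

    # incremental_import_sketch.py -- illustrative wrapper; the connection
    # string, table, and check column are hypothetical placeholders.
    import subprocess

    LAST_VALUE_FILE = "/var/lib/etl/last_value.txt"  # assumed state file

    def read_last_value():
        try:
            with open(LAST_VALUE_FILE) as f:
                return f.read().strip()
        except FileNotFoundError:
            return "0"  # first run: import everything

    def run_incremental_import(last_value):
        # Standard Sqoop flags for an append-mode incremental import.
        cmd = [
            "sqoop", "import",
            "--connect", "jdbc:db2://example-host:50000/SAMPLE",
            "--username", "etl_user",
            "--password-file", "/user/etl/.pw",
            "--table", "TRANSACTIONS",
            "--incremental", "append",
            "--check-column", "TXN_ID",
            "--last-value", last_value,
            "--target-dir", "/data/raw/transactions",
        ]
        subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        run_incremental_import(read_last_value())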
Environment: Apache Hadoop, HDFS, Hive, Pig, Spark, HBase, Kafka, Sqoop, Talend, Java, Scala, Git, Shell Scripting.
Role: Hadoop Developer Location: Bangalore, India
Client: Electrolux June '11 – Mar '13
Description: The main goal of this project is to develop a platform that handles big data. Upstream data is pushed into HDFS, transformations are applied using Hadoop tools, and the processed data is delivered downstream for analysis.
Responsibilities:
Responsible for designing and implementing an ETL process to load data from different sources, perform data mining, and analyze data using visualization/reporting tools to improve system performance.
Collected logs from the physical machines and ingested them into HDFS using Flume.
Developed custom MapReduce programs to extract the required data from the logs (see the sketch after this list).
Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
Responsible for creating Hive tables, loading the structured data produced by MapReduce jobs into them, and writing Hive queries to further analyze the logs for issues and behavioral patterns.
Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data.
Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
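As a sketch of the custom log-extraction MapReduce work above, a Hadoop Streaming mapper and reducer in Python; the log layout (space-separated fields with an HTTP status in field 9) is a hypothetical example, not the actual log format.

    # mapper.py -- emits (status, 1) for 5xx server-error log lines.
    import sys

    for line in sys.stdin:
        fields = line.split()
        if len(fields) > 8 and fields[8].startswith("5"):
            print(fields[8] + "\t1")

    # reducer.py -- sums the counts per status code; Hadoop Streaming
    # delivers the mapper output sorted by key.
    import sys

    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print(current_key + "\t" + str(count))
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        print(current_key + "\t" + str(count))

Both scripts run under Hadoop Streaming, e.g. hadoop jar hadoop-streaming.jar -input /logs -output /out -mapper mapper.py -reducer reducer.py.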
Environment: JDK 1.5, Hadoop, HDFS, Pig, Hive, MapReduce, HBase, Sqoop, Oozie.
Role: Project Engineer / SQL Developer Location: Bangalore, India
Client: Hewlett-Packard Oct '09 – May '11
Description: This project provides a solution that correlates event feeds from multiple systems to determine the root cause of failures across network domains, determine end-to-end service and customer impacts, and provide real-time interfaces to customer service management systems.
Responsibilities:
Generated database SQL scripts and deployed databases, including installation and configuration.
Developed SQL scripts to insert, update, and delete data in MS SQL database tables.
Experienced in writing PL/SQL and in developing and implementing stored procedures.
Developed complex SQL queries for efficient data retrieval, including stored procedures and triggers.
Built data connections to the database using MS SQL Server.
Used various joins, subqueries, and nested queries in SQL.
Worked with different sources such as Oracle, SQL, and flat files.
Worked on a project to extract data from XML files into SQL tables and generate data file reports using SQL Server 2008 (see the sketch after this list).
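A hedged Python/pyodbc sketch of the XML-extract-and-insert pattern described above; the original work was done directly in SQL Server 2008, and the server, database, table, and XML layout here are all hypothetical placeholders.

    # xml_to_sql_sketch.py -- illustrative only; server, database, table,
    # and XML element names are hypothetical placeholders.
    import pyodbc
    import xml.etree.ElementTree as ET

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=example-host;DATABASE=EventsDB;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()

    # Parse a hypothetical event feed and insert one row per <event>.
    tree = ET.parse("events.xml")
    for event in tree.getroot().iter("event"):
        cursor.execute(
            "INSERT INTO dbo.EventFeed (EventId, Source, Severity) "
            "VALUES (?, ?, ?)",
            event.get("id"),
            event.findtext("source"),
            event.findtext("severity"),
        )
    conn.commit()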
Technologies: MySQL, SQL Server 2008 (SSRS, SSIS), Visual Studio 2000/2005, MS Excel.