
Data Developer

Shawnee, KS
July 31, 2020



Pavankumar Yallakara


Professional Summary:

6+ years of professional IT experience across all phases of the Software Development Life Cycle, including hands-on experience in Big Data analytics.

Experience in analysis, design, development, and integration using Big Data technologies such as MapReduce, Hive, Pig, Sqoop, Oozie, Kafka, HBase, Impala, and Avro on Cloudera, Hortonworks, and AWS platforms, along with SQL for data processing.

Good knowledge of Hadoop architecture and its components, such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode, and DataNode.

Hands-on experience installing, configuring, and using Hadoop ecosystem components such as HDFS, Hive, Spark (with Scala and Spark SQL), MapReduce, Pig, Sqoop, Flume, HBase, ZooKeeper, and Oozie, as well as the Tidal scheduler.

Hands-on experience developing PySpark jobs for data cleaning and preprocessing.

Extensive knowledge of Hadoop for data storage, querying, processing, and analysis.

Experience defining the scope of automation, selecting tools, and designing, developing, and maintaining automation frameworks.

Experience extending Pig and Hive with custom UDFs for data analysis and file processing, using Pig Latin scripts and Hive Query Language (HiveQL).

Experience working with Amazon Web Services (AWS), including EC2, EMR, S3, RDS, EBS, Elastic Beanstalk, and CloudWatch.

Good knowledge of Oozie for job scheduling and ZooKeeper for cluster coordination.

Expertise in writing SQL queries, stored procedures, functions, and triggers in SQL and PL/SQL across various databases.

Experience with NoSQL column-oriented databases such as HBase and their integration with Hadoop clusters.

Experience installing, configuring, supporting, and managing Hadoop clusters using the Apache and Cloudera (CDH 5.x) distributions and on Amazon Web Services (AWS).

Experience developing Spark jobs in Scala in a test environment for faster data processing, and using Spark SQL for querying.

Good understanding of Spark Streaming with Kafka for real-time processing.

Technical Skills:

Programming Languages



Big Data Ecosystem

HDFS, MapReduce, HBase, Hive, Pig, Impala, Sqoop, Flume, Oozie, Spark, Spark SQL, ZooKeeper, PySpark, AWS, Cloudera, Hortonworks, Kafka, Avro.

Scripting Languages

Python 2.7 and 3.0, Scala, and shell scripting.

RDBMS Languages

Oracle SQL, Microsoft SQL Server, MySQL.




IDEs

PyCharm, Eclipse, and IntelliJ

Operating System

Linux, Windows, UNIX, CentOS.


Methodologies

Agile, Waterfall model.

Other Tools

Attunity Replicate & Compose, Tidal, SVN, Ab Initio, Apache Ant, TOAD, PL/SQL Developer, JIRA, Visual Studio.


Education:

Master's – Computer Science, University of Colorado Denver, Denver, CO.

Bachelor of Technology – Computer Science and Engineering, JNTU, India.

Professional Experience:

Waddell & Reed, Overland Park, KS. Jan’20 – Present

Sr. Hadoop Developer

Waddell & Reed is a financial institution partnered with Ivy Investments, one of America's largest investment companies. At Waddell & Reed, we deliver high-quality products to business clients based on their requirements and within agreed timelines.


Migrated Python scripts from Pivotal Hadoop to Hortonworks Hadoop.

Built functions and views in Hive to load data from various sources into the system.

Developed multiple PySpark jobs for data cleaning and preprocessing.

Built Sqoop jobs to load data from various RDBMS sources.

Wrote Python scripts that execute functions to load data from external tables into managed tables in HAWQ and Hive.

Troubleshot and fixed production issues.

Analysed the HQL scripts and designed solutions to implement them in Python.

Loaded and transformed large sets of structured, semi-structured, and unstructured data.

Managed and reviewed Hadoop log files to resolve configuration issues.

Implemented complex Hive UDFs to execute business logic within Hive queries.

Used TFS for version control.

Used Tidal (enterprise scheduler) to schedule daily, weekly, and monthly jobs.

Provided 24/7 on-call production support.

Environment: PySpark, Spark, HAWQ, Hive, Tidal, PL/SQL, shell scripting, and TFS.

Blue Cross Blue Shield, Durham, NC. Dec’17 – Dec’19.

Hadoop Developer

At BCBSNC, Healthy Blue is the health plan for Medicaid members. It works with thousands of doctors, specialists, and hospitals throughout North Carolina and partners with many local organizations to help members get the care and services they need to live their best.


Responsible for building scalable distributed data solutions using Hadoop.

Developed Spark jobs and Hive jobs to summarize and transform data.

Developed Spark scripts for data analysis in both Python and Scala.

Built on-premises data pipelines using Kafka and Spark for real-time data analysis.

Analysed the SQL scripts and designed solutions to implement them in Scala.

Implemented complex Hive UDFs to execute business logic within Hive queries.

Migrated MapReduce programs to Spark transformations in Scala; these transformations were initially implemented in Python (PySpark).

Evaluated the performance of Spark SQL vs. Impala vs. Apache Drill on offline data as part of a POC.

Imported data from different sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results back into HDFS.

Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.

Created DataFrames with a specified schema from raw data stored in Amazon S3, using PySpark and Lambda.

Loaded and transformed large sets of structured, semi-structured, and unstructured data.

Developed data pipelines by implementing Kafka producers and consumers.

Worked on ETL (Ab Initio) scripts and fixed issues that arose during data loads from various sources.

Worked in a highly parallelized Ab Initio environment processing 1+ TB of data daily.

Developed complex graphs using Ab Initio components such as Join, Rollup, Lookup, Partition by Key, Round Robin, Gather, Merge, Scan, and Validate.

Implemented Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.

Developed a program to extract named entities from ORC files.

Used Git for version control.

Environment: Cloudera, Hadoop, HDFS, AWS, Pig, Hive, Impala, Ab Initio, Spark SQL, MapReduce, Flume, Sqoop, Oozie, Kafka, Spark, Scala, PySpark, shell scripting, HBase, ZooKeeper.

Change Healthcare, Nashville, TN. May’15 – Dec’17

Hadoop/Spark Developer

Change Healthcare is a healthcare technology company that offers software, analytics, network solutions, and technology-enabled services to help create a stronger, more collaborative, value-based healthcare system. It helps deliver measurable value not only at the point of care, but also before, after, and between care episodes.


Worked with Hadoop Ecosystem components like Sqoop, Flume, Oozie, Hive and Pig.

Developed Pig and Hive UDFs in Java to extend Pig and Hive, and wrote Pig scripts for sorting, joining, filtering, and grouping data.

Developed Spark programs in Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.

Developed Oozie workflows with Sqoop actions to migrate data from relational databases such as Oracle and Teradata to HDFS.

Developed Hive queries to analyse data and generate end reports for business users.

Implemented Spark applications in Scala using DataFrames and the Spark SQL API for faster data processing.

Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.

Developed a data pipeline using Kafka, Cassandra, and Hive to ingest, transform, and analyse customer behavioural data.

Strong familiarity with Hive joins; used HQL to query databases and developed complex Hive UDFs.

Developed shell scripts to automate ETL execution in a Unix environment.

Migrated iterative MapReduce programs to Spark transformations using Scala.

Used Scala to write the code for all Spark and Spark SQL use cases.

Implemented Spark and Scala applications using higher-order functions for both batch and interactive analysis requirements, including Spark batch jobs.

Worked with the Spark Core, Spark Streaming, and Spark SQL modules.

Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.

Explored various Spark modules, working with DataFrames, RDDs, and SparkContext.

Developed a data pipeline using Spark and Hive to ingest, transform, and analyse data.

Developed Spark scripts using Scala shell commands as required.

Performance-tuned Spark applications by setting the right batch interval, level of parallelism, and memory configuration.

Analysed the SQL scripts and designed solutions to implement them in Scala.

Developed a data pipeline with Amazon AWS to extract data from weblogs and store it in HDFS.

Implemented schema extraction for Parquet and Avro file formats in Hive.

Implemented partitioning, dynamic partitions, and bucketing in Hive.

Gained extensive experience with AWS cloud services such as EC2, S3, EMR, and RDS.

Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.

Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, Scala, Kafka, Hive, Sqoop, Amazon AWS, Oozie, Cloudera, Oracle, Linux.

Hexistech, Hyderabad, Oct’13 – Jul’14

SQL Developer

Hexistech was founded in 2006 by professionals from Bell Labs with a vision of providing quality, cost-effective IT services. Since then, the company has worked relentlessly to build a mature and thriving IT services firm, helping clients develop and deploy meaningful IT solutions that advance their strategic goals.


Planned, designed, and implemented application database code objects such as stored procedures and views.

Built and maintained SQL scripts, indexes, and complex queries for data analysis and extraction.

Wrote PL/SQL database code to support business applications.

Performed quality assurance and testing of the SQL environment.

Developed new processes to facilitate data import and normalization, including data files for counterparties.

Worked with business stakeholders, application developers, production teams, and cross-functional units to identify business needs and discuss solution options.

Ensured that best practices were applied and data integrity was maintained through security, documentation, and change management.

Environment: PL/SQL, XML, CSS.
