
Data Developer

Location: Washington, DC
Posted: November 21, 2017


Ashok Vardhan Edala
Email: ac3fgd@r.postjobfree.com

Phone: +1-732-***-****

Summary:

6+ years of experience as a Hadoop/Spark Developer. Strong hands-on experience articulating requirements, designing solutions to those requirements, and delivering specification-based development for a variety of clients.

Experienced in taking an integrated approach to development aligned with organizational values and growth. Well versed in ETL and BI development, with hands-on experience creating and maintaining data warehouses using ETL tools and Hadoop.

6+ years of experience in big data storage, application design, processing, and the development and maintenance of applications using Java and Scala technologies.

Involved in all phases of the Software Development Life Cycle, including Analysis, Design, Development, Integration, and Implementation.

Hands-on experience with Scala, Python, and Java.

Professional experience with various big data technologies: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Kafka, and Flume.

Experience working with multiple Hadoop versions (0.x, 1.x, 2.x).

Created custom UDFs for Pig and Hive to bring Python/Java functionality into Pig Latin and HiveQL.
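Illustrative only: one common way to push Python logic into HiveQL is a streaming script invoked with TRANSFORM. A minimal sketch, in which the script, table, and column names are hypothetical:

    #!/usr/bin/env python
    # clean_phone.py - normalizes phone numbers; invoked from HiveQL as:
    #   ADD FILE clean_phone.py;
    #   SELECT TRANSFORM(call_id, phone) USING 'python clean_phone.py'
    #     AS (call_id, phone) FROM calls;
    # Hive streams rows in as tab-separated lines on stdin and reads the
    # transformed rows back from stdout.
    import re
    import sys

    for line in sys.stdin:
        call_id, phone = line.rstrip("\n").split("\t")
        digits = re.sub(r"\D", "", phone)  # keep digits only
        print("%s\t%s" % (call_id, digits))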

Experience building stream-processing systems using Spark Streaming.

Experience with Spark Streaming, Spark SQL, and Spark MLlib.
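A minimal DStream word-count sketch of the kind of Spark Streaming job referenced above; the socket source, host, and port are placeholders:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="StreamingSketch")
    ssc = StreamingContext(sc, batchDuration=10)  # 10-second micro-batches

    # Placeholder source; in practice this could be a Kafka or Flume stream.
    lines = ssc.socketTextStream("localhost", 9999)
    counts = (lines.flatMap(lambda l: l.split())
                   .map(lambda w: (w, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()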

Worked on different Hadoop distributions, including Cloudera and Hortonworks.

Worked on Oozie, writing end-to-end coordinated workflows.

Developed, deployed, and supported several MapReduce applications in Scala and Java to handle semi-structured and unstructured data.

Hands-on experience in data mining: implementing complex business logic, optimizing HiveQL queries, and controlling data distribution with partitioning and bucketing techniques to enhance performance.
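As an illustration of partitioning and bucketing, a Hive DDL sketch issued through PySpark; the table and column names are assumptions:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Partitioning by load date lets queries prune whole directories;
    # bucketing by customer_id controls distribution for joins and sampling.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS calls (
            call_id      STRING,
            customer_id  STRING,
            duration_sec INT
        )
        PARTITIONED BY (load_date STRING)
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
        STORED AS ORC
    """)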

Experience processing large volumes of data.

Familiarity with JIRA and Git.

Experience with databases such as MySQL and PostgreSQL.

Worked on machine learning and predictive modeling in Python and Scala.

Python libraries: pandas, NumPy, DataFrames, REST APIs, statsmodels, SQLAlchemy, NLTK.

Skill Set:

Programming Languages : Python, Scala, Core Java, SQL, Chef, Puppet

Big Data Technologies : Hadoop, Hive, Spark, Kafka, Flume, Pig, Sqoop, AWS (EC2, S3, Redshift), ZooKeeper, Oozie, Talend, Impala, YARN, Spark Streaming, Spark SQL, DataFrames

Operating Systems : Windows, RHEL 6.0, Ubuntu, Fedora

Scripting Languages : Shell Scripting (Unix), Python

Databases : PostgreSQL, MySQL, DB2

NoSQL Databases : HBase, Cassandra, MongoDB

Hadoop Distributions : Cloudera (CDH4/CDH5), Hortonworks

Web Service Technologies: SOAP, REST APIs

Web Technologies : HTML, CSS, XML, Bootstrap, AngularJS, jQuery

Education:

Master's in Information Technology (MIT), Virginia International University, Fairfax, VA, USA, 2017

Bachelor of Engineering in Computer Science Engineering (CSE), NMIT, Bangalore, India, 2011

Professional Experience:

INCOMM, Atlanta, Georgia January 2017 to Present

Big Data Developer

Project Name: Interactive Voice Response (IVR) Data Mart

Description: IVR is a data mart in which call center data records are stored. Files received from the source system are validated through a DQ framework and loaded into a staging area. Transformations are applied to the staging data, and the transformed data is loaded into an integration layer. Finally, business logic is applied on the integration layer to load the final data into a semantic layer, which is accessed by downstream systems for reporting and further processing.

Tools/Technologies: Apache Hadoop, AWS S3, ZooKeeper, Impala, Python, Java (JDK 1.8), Oozie, Tableau, Spark, HDFS.

Responsibilities:

Participated in design calls and requirements gathering.

Migrated data from sources such as Sybase to HDFS.

Developed Pig Latin scripts to extract data from an FTP server and load it into HDFS.

Ingested lookup tables and other dimension tables using Sqoop.

Performed ETL analytics using HiveQL by loading data into Hive tables.

Used Hive as an ETL tool for transformations, event joins, traffic filtering, and pre-aggregations before storing the data in HDFS.

Maintained S3 storage classes and lifecycle policies based on usage.
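Such a lifecycle policy can be applied with boto3; a sketch in which the bucket name, prefix, and day counts are assumptions:

    import boto3

    s3 = boto3.client("s3")
    # Move staging objects to infrequent access after 30 days and expire
    # them after a year (illustrative values).
    s3.put_bucket_lifecycle_configuration(
        Bucket="ivr-datamart",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "staging-archival",
                "Filter": {"Prefix": "staging/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
                "Expiration": {"Days": 365},
            }]
        },
    )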

Developed a DQ framework in Scala to validate the input files.
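The DQ framework itself was written in Scala; purely as an illustration of the kind of check such a framework applies before files reach staging, a minimal Python sketch with an assumed pipe-delimited layout:

    import csv

    EXPECTED_COLUMNS = ["call_id", "customer_id", "duration_sec"]  # assumed

    def validate(path):
        """Reject a feed file unless its header and field counts match."""
        with open(path, newline="") as f:
            reader = csv.reader(f, delimiter="|")
            header = next(reader)
            if header != EXPECTED_COLUMNS:
                return False
            return all(len(row) == len(EXPECTED_COLUMNS) for row in reader)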

Created all external and managed Hive tables per the design.

Converted stored procedures into Spark SQL.

Migrated Hive jobs to Spark SQL jobs.

Wrote reusable components in Spark SQL.

Used Spark SQL extensively for all types of transformations and to implement end-to-end business logic.
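A sketch of what converting stored-procedure-style logic into a reusable Spark SQL component can look like; the function and column names are hypothetical:

    from pyspark.sql import DataFrame, SparkSession

    def dedupe_latest(spark: SparkSession, df: DataFrame,
                      key: str, ts: str) -> DataFrame:
        """Keep only the newest row per key - a typical piece of stored
        procedure logic rewritten as a reusable Spark SQL step."""
        df.createOrReplaceTempView("src")
        return spark.sql("""
            SELECT * FROM (
                SELECT *, ROW_NUMBER() OVER (
                    PARTITION BY {key} ORDER BY {ts} DESC) AS rn
                FROM src
            ) t WHERE rn = 1
        """.format(key=key, ts=ts)).drop("rn")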

HSL Technologies, Pune, India August 2011 to March 2016

Hadoop Developer

Project Name: DataGrabber January 2015 to March 2016

Description: Data Grabber was developed for the digital marketing domain; its main aim is to target the right customers using big data and advanced prediction algorithms. It is used for email marketing, campaign management, and advertisement. Data is collected from different sources such as Salesforce CRM, social media, flat files, databases, SAP, and call centers; ETL is applied to this data to create an MDM and a 360-degree view of each customer. Machine learning algorithms are then applied to produce likelihood scores, cluster the data, and build recommendation engines, whose output is sent to a campaign management tool for campaigning.
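A sketch of the clustering step described above, using Spark MLlib; the feature columns and sample values are assumptions:

    from pyspark.ml.clustering import KMeans
    from pyspark.ml.feature import VectorAssembler
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("CustomerSegments").getOrCreate()

    # Assumed slice of the 360-degree customer view.
    customers = spark.createDataFrame(
        [("c1", 12.0, 3.0), ("c2", 2.0, 9.0), ("c3", 11.0, 2.5)],
        ["customer_id", "purchases", "support_calls"])

    features = VectorAssembler(
        inputCols=["purchases", "support_calls"], outputCol="features"
    ).transform(customers)
    model = KMeans(k=2, seed=42).fit(features)
    segments = model.transform(features)  # adds a 'prediction' cluster column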

Technologies: Spark, Hadoop, Kafka, Hive, Pig, Java, Sqoop, Python, Django, Cassandra, PostgreSQL, Splunk, Pentaho, S3, EC2

Responsibilities:

Designed and configured Spark and Hadoop clusters on AWS EC2.

Administered the cluster, including fine-tuning, memory optimization, and troubleshooting.

Developed MapReduce jobs.

Created and performance-tuned Cassandra data models.

Integrated Cassandra, Solr search, and Spark.

Read streaming data from different social media sources.

Integrated Salesforce CRM and SAP applications

Wrote scripts for web scraping, API calls, and SQL queries.

Standardized data using Pentaho Kettle.

Built reports using Pentaho BI.

Ingested near-real-time data from different sources using Kafka.
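A minimal consumer sketch for this kind of near-real-time ingestion, using the kafka-python client; the topic and broker address are placeholders:

    import json
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "social-media-events",               # placeholder topic
        bootstrap_servers=["broker1:9092"],  # placeholder broker
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for message in consumer:
        record = message.value
        # ... validate and buffer the record for the HDFS/Hive load step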

Loaded data into HDFS and Hive tables.

Developed Pig scripts for data processing.

Imported and exported data using Sqoop.

Created Oozie workflows.

Project Name: Google Analytics & Website Traffic Analysis August 2013 to January 2015

Hadoop Developer

Description: Used Google Analytics to track visitors, unique visitors, website traffic, and conversion ratios, to compare web-based ad hoc request changes, and to handle dynamic content in the web app. Calculated visitor counts, conversion ratios, flavor-based search, and regression analysis of the content using Cloudera Hadoop, Apache Hive, Apache Pig, and Pentaho Kettle to process the data, then applied prediction algorithms to analyze trends using BI tools such as Tableau, Pentaho Report Designer, and Pentaho BI Server.

Technologies: Hadoop, Talend Open Studio, Apache Hive, MySQL, HBase, Shell, Pig, PostgreSQL, Tableau, and Python

Responsibilities:

Evaluated business requirements and prepared detailed specifications, following project guidelines, for the programs to be developed.

Responsible for building scalable distributed data solutions using Hadoop. Analyzed large data sets to determine the optimal way to aggregate and report on them.

Developed Pig Latin scripts to extract data from the Pentaho BI Server and load it into HDFS.

Designed the data model for storing the data.

Created Impala tables, loaded them with data, and wrote Impala queries.

Developed Hive external tables and views to store the Google Analytics and website traffic analysis data.

Implemented Python code to retrieve Google Analytics data.

Worked on Hive table joins to integrate the data.

Worked on different file formats, such as text files, SequenceFiles, and ORC files.

Developed Spark jobs in Python for faster data processing.
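A sketch of such a PySpark job, parsing raw text logs and persisting them in the ORC format mentioned above; paths and column names are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("TrafficETL").getOrCreate()

    # Parse tab-separated page-view logs and aggregate visits per URL.
    raw = (spark.read.option("sep", "\t")
                .csv("/data/ga/pageviews")
                .toDF("visitor_id", "url", "ts"))
    by_url = raw.groupBy("url").count()

    # ORC is columnar, which speeds up the downstream Hive/Impala queries.
    by_url.write.mode("overwrite").orc("/data/ga/pageviews_by_url")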

Used the Oozie workflow engine to run multiple Python and Hive jobs.

Used Tableau to connect to the required data warehouse and perform visualizations.

Project Name: Field Force Automation August 2011 to July 2013

Domain: FMCG - Retail Sector

Python Developer

Description: This application is used to place new orders and to make changes to existing ones, such as changing orders, tariffs, prices, and delivery details. Different options are available for placing an order: direct retail store, indirect store, internet, telephone, etc. The application provides payment options such as EFT or credit card. It provides fundamental analysis of product movement and helps retailers come up with aggregated offers for the consumer.

Technologies: Python, Django, CDH4, Hive, MongoDB, PostgreSQL, MySQL, Talend, Eclipse

Responsibilities:

Developed a workflow management system and moved all workflows to it, leading to a 30% utilization improvement.

Built a web portal to manage the jobs launched by the workflow management system.

Developed a load-balancing system to execute keyword optimization jobs involving millions of keywords per customer, eliminating the need to acquire new hardware for every new customer.

Performed impact analysis and a feasibility study of CDH4 checksum-based out-of-sync detection for keywords, and improved the efficiency of these detections.
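A toy illustration (not the production design) of checksum-based out-of-sync detection: digest each side's keyword records and compare the digests instead of the full rows.

    import hashlib

    def digest(rows):
        """Order-insensitive digest of (keyword, bid) records."""
        h = hashlib.md5()
        for row in sorted(rows):
            h.update(("%s|%s" % row).encode("utf-8"))
        return h.hexdigest()

    local = [("shoes", 1.20), ("boots", 0.90)]
    remote = [("boots", 0.90), ("shoes", 1.25)]  # drifted bid
    if digest(local) != digest(remote):
        print("keywords out of sync - schedule reconciliation")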

Maintained a business-critical job dispatcher that handled more than 60,000 jobs a day. Evaluated dispatcher systems on the market and proposed a replacement for the in-house dispatcher.

Optimized long-running SQL queries.


