SANGEETHA MALLI RAMESH BAPU
972-***-****, Allen, TX, USA
*******@*****.***
Summary
* ***** ** ** ********** in software analysis, design, development, testing and implementation of Big Data, Hadoop, NoSQL and Java/J2EE technologies
3+ years of hands-on experience with Big Data ecosystems including Hadoop (1.0 and YARN), Tableau, MapReduce, Pig, Hive, Impala, Sqoop, Flume, Oozie, MongoDB, Zookeeper, Kafka, Maven, Spark, Scala, HBase, Cassandra (CQL)
Experience in installation, configuration and deployment of Big Data solutions
Excellent knowledge of Hadoop ecosystem architecture and components such as Hadoop Distributed File System (HDFS), MRv1, MRv2, Job Tracker, Task Tracker, Name Node, Data Node, Resource Manager, Node Manager, and MapReduce programming
Experience in analyzing data using Hive UDFs, Hive UDTFs, and custom MapReduce programs in Java
Strong command of Hive and Pig core functionality, writing Pig Latin UDFs in Java and using various UDFs from Piggybank and other sources
Hands-on experience with NoSQL databases like HBase and Cassandra, and relational databases like Oracle and MySQL
Worked in Agile/Scrum software development.
Responsible for deploying scripts to the GitHub version control repository hosting service and deploying the code using Jenkins.
Used the Jenkins AWS CodeDeploy plugin to deploy to AWS and migrated applications to the AWS cloud.
Good experience in AWS services, networking, storage, and cloud technology.
Proficient in configuring the Active Audit framework before ingesting files into HDFS by enabling filename, record count, file size, duplicate, missing file, and zero-byte checks. Enabled the Passive Audit check after ingesting data into external Hive tables by matching the source file count against the Hive table count.
Primarily responsible for designing, implementing, testing, and maintaining database solutions for Azure.
Primarily involved in the data migration process to Azure by integrating with a GitHub repository and Jenkins.
Hands-on experience with real-time streaming into HDFS using Kafka and Spark Streaming
Implemented pre-defined operators in Spark such as map, reduce, sample, filter, count, cogroup, groupBy, sort, reduceByKey, take, groupByKey, union, leftOuterJoin, rightOuterJoin, etc.
Developed analytical components using Spark SQL and Spark Streaming.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Spark SQL in Scala
Deeply involved in writing complex Spark/Scala scripts: wrote UDFs, used the Spark context and the Cassandra SQL context, worked with multiple APIs and methods that support DataFrames and RDDs, performed DataFrame joins and Cassandra table joins, and finally wrote/saved the DataFrames/RDDs to the Cassandra database (an illustrative sketch follows this summary).
Proficient in Java, Collections, J2EE, Servlets, JSP, Spring, Hibernate, JDBC/ODBC
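A minimal sketch (Scala) of the Hive-to-Spark SQL conversion and Cassandra write described above; the table, column, keyspace, and host names are hypothetical placeholders, not any project's actual configuration:

import org.apache.spark.sql.SparkSession

object HiveToCassandraSketch {
  def main(args: Array[String]): Unit = {
    // Spark session with Hive support; Cassandra host set for the DataStax connector
    val spark = SparkSession.builder()
      .appName("hive-to-cassandra-sketch")
      .config("spark.cassandra.connection.host", "127.0.0.1") // assumed host
      .enableHiveSupport()
      .getOrCreate()

    // A HiveQL aggregation rewritten as Spark SQL (table/column names are hypothetical)
    val totals = spark.sql(
      """SELECT member_id, SUM(paid_amount) AS total_paid
        |FROM claims_staging
        |GROUP BY member_id""".stripMargin)

    // Save the resulting DataFrame to Cassandra (keyspace/table are hypothetical)
    totals.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "analytics", "table" -> "claim_totals"))
      .mode("append")
      .save()

    spark.stop()
  }
}

The final write relies on the DataStax spark-cassandra-connector, which registers the "org.apache.spark.sql.cassandra" data source.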
Technical Skills:
Hadoop Ecosystem: Hadoop, HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Oozie, Zena/Zeke scheduling, Zookeeper, Flume, Kafka, Spark Core, Spark SQL, Spark Streaming, AWS, Azure Data Lake
NoSQL Databases: HBase, Cassandra, MongoDB
Build Management Tools: Maven, Apache Ant
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans
Languages: C, C++, JAVA, SQL, PL/SQL, PIG Latin, HiveQL, UNIX shell scripting
Frameworks: MVC, Spring, Hibernate, Struts 1/2, EJB, JMS, JUnit, MR-Unit
Version Control / CI: GitHub, Jenkins
Databases: Oracle 8i/9i/10g, Microsoft SQL Server, DB2 & MySQL 4.x/5.x
Education:
Bachelor of Technology in Electronics & Instrumentation Engineering from Jawaharlal Nehru Technological University, Hyderabad, Andhra Pradesh in 2006
Certifications - From Big Data University
•Hadoop Fundamentals
•Accessing Hadoop Data Using Hive
•Introduction to Pig
Professional Experience:
HCSC-BCBSTX / BcForward, Dallas, TX Feb 2017 to Dec 2017
Big Data / Hadoop Developer
Responsibilities
Created a Zeke event in the FTP process that triggers at the end of the mainframe JCL job for the Stoploss project; the Zeke event in turn triggers the data lake Zena ingestion process.
Involved in creating JavaScript to enable the process variable that triggers consumption, and enabled the date/timestamp partition.
Responsible for configuring the Active Audit framework before ingesting files into HDFS by enabling filename, record count, file size, duplicate, missing file, and zero-byte checks. The Passive Audit check is enabled after ingesting data into external Hive tables by matching the source file count against the Hive table count.
Developed custom aggregate functions using Spark SQL and performed interactive querying.
Ingested the contract, commission, and CVS claims historical files as a one-time load into the incoming raw layer of HDFS, and scheduled the incremental loads in the Zena scheduler by date/timestamp partition.
Involved in adding data to new partitions in the external Hive staging table to read data from those partitions, and loaded the external Hive ORC tables with Snappy compression using Pig HCatalog scripts.
Applied several business rules in the data transformations as per requirements and made the data available to downstream consumption teams.
Worked on the Walgreens member search project under tight timelines; configured the ingestion process by applying business requirements in the data transformations, eliminating header data from control files, and exporting the processed data from the HDFS outgoing layer to ADW.
Real-time data processing with Kafka, Spark Streaming, and Spark Structured Streaming; worked with Spark SQL, Structured Streaming, MLlib, and the core Spark API to explore Spark features and build data pipelines in Scala; implemented Spark streaming applications and fine-tuned them to reduce shuffling (a brief sketch follows this section).
Worked with Jira in Scrum software development for issue tracking and release management. Responsible for moving the ingestion scripts into the GitHub version control repository hosting service and deploying the scripts using Jenkins.
Primarily involved in the data migration process to Azure by integrating with the GitHub repository and Jenkins.
Environment: Hadoop, HDFS, Pig, Hive, Java, Sqoop, HBase, Zena Scheduler, Jira, GitHub, Jenkins, Azure, Kafka, Spark Streaming, Spark SQL
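A minimal sketch (Scala) of the Kafka-to-HDFS ingestion pattern referenced above, using Spark Structured Streaming; the broker, topic, and HDFS paths are hypothetical placeholders:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-hdfs-sketch").getOrCreate()

    // Read the raw feed from Kafka (broker and topic names are hypothetical)
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "stoploss-feed")
      .load()

    // Keep the message payload and add a date column for the raw-layer partition
    val records = raw.selectExpr("CAST(value AS STRING) AS record")
      .withColumn("load_date", current_date())

    // Append to HDFS as date-partitioned Parquet; paths are illustrative only
    val query = records.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/incoming/raw/stoploss")
      .option("checkpointLocation", "hdfs:///data/checkpoints/stoploss")
      .partitionBy("load_date")
      .start()

    query.awaitTermination()
  }
}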
VectorSoft, Dallas, TX Jan 2016 to July 2016
Big Data / Hadoop Developer
The project is based on collecting and analyzing data from various sources and creating a data lake through data cleansing and data enrichment. The transformed data lake serves data to the customer analytics data mart, which powers interactive analysis and reporting needs.
Responsibilities
Configured Flume and Kafka to capture data from various sources such as clickstream data and Twitter feeds
Involved in data ingestion from relational databases into HDFS using Sqoop
Data cleansing and data enrichment are done using Pig Latin and HiveQL
Built exception files for all non-compliant data using Pig
Responsible for managing data from various sources
Created Hive external tables for semantic data, loaded the data into the tables, and queried the data using HQL
Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting (a brief sketch follows this section)
Worked with different data sources like Avro data files, XML files, JSON files, SQL Server, and Oracle to load data into Hive tables
Worked with Hive to expose data for further analysis and to transform files from different analytical formats into text files
Business metrics are built as part of the target platform using HiveQL
Generated final reporting data in Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.
Used the Jenkins AWS CodeDeploy plugin to deploy to AWS and migrated applications to the AWS cloud.
Environment: Hadoop, HDFS, Pig, Hive, Java, Sqoop, Kafka, HBase, NoSQL, Oracle 10g, PL/SQL, SQL Server, Windows NT, Tableau, AWS
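A minimal sketch (Scala) of the partitioned Hive table and reporting-metric pattern referenced above, expressed as HiveQL run through Spark SQL; the database, table, column names, and path are hypothetical placeholders:

import org.apache.spark.sql.SparkSession

object HiveMetricsSketch {
  def main(args: Array[String]): Unit = {
    // Spark session with Hive support so HiveQL DDL/queries run against the metastore
    val spark = SparkSession.builder()
      .appName("hive-metrics-sketch")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS semantic")

    // External, date-partitioned table over the cleansed data (names and path are hypothetical)
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS semantic.click_events (
        |  user_id STRING, page STRING, duration_sec INT)
        |PARTITIONED BY (event_date STRING)
        |STORED AS ORC
        |LOCATION '/data/semantic/click_events'""".stripMargin)

    // Register a newly loaded partition, then compute a simple reporting metric
    spark.sql("ALTER TABLE semantic.click_events ADD IF NOT EXISTS PARTITION (event_date = '2016-05-01')")
    val metrics = spark.sql(
      """SELECT page,
        |       COUNT(DISTINCT user_id) AS unique_visitors,
        |       AVG(duration_sec)       AS avg_duration_sec
        |FROM semantic.click_events
        |WHERE event_date = '2016-05-01'
        |GROUP BY page""".stripMargin)

    metrics.show()
    spark.stop()
  }
}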
POCs
Spark Streaming:
Created a port for live streaming; the data is consumed by the streaming context
Used Maven as the build/deployment tool for spark-submit, generated the jar file, and processed the stream with a sliding window interval of 5 seconds
Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
The generated output is stored and used for creating Spark DataFrames for further analysis (a brief sketch follows)
Environment: Spark Streaming, Scala, Maven
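A minimal sketch (Scala) of this POC's pattern, assuming a local socket source on a hypothetical port; the 5-second sliding window matches the interval described above:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SocketStreamSketch {
  def main(args: Array[String]): Unit = {
    // Local streaming context with a 1-second batch interval (local run assumed)
    val conf = new SparkConf().setAppName("socket-stream-sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Listen on a hypothetical port for line-oriented data
    val lines = ssc.socketTextStream("localhost", 9999)

    // Word counts over a 5-second sliding window, evaluated every 5 seconds
    val counts = lines.flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(5), Seconds(5))

    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}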
Web log Streaming using Spark:
Used a Flume agent to simulate NASA log files as the source, with a Spark sink as the sink
Generated live streaming with a sliding window interval of 10 seconds
A custom Scala function is added to the source program for multiple operations
The generated output is transformed into Spark DataFrames/RDDs and written to the Cassandra database (a brief sketch follows)
Environment: Flume, Spark, Scala, Maven, Cassandra
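A minimal sketch (Scala) of this POC's Flume-to-Spark-to-Cassandra flow, using the poll-based Flume receiver from the spark-streaming-flume module and the DataStax connector; the host, port, keyspace, table, and log-field positions are hypothetical assumptions:

import java.nio.charset.StandardCharsets

import com.datastax.spark.connector._
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object WebLogToCassandraSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("weblog-to-cassandra-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed Cassandra host

    val ssc = new StreamingContext(conf, Seconds(2))

    // Poll the Flume agent's Spark sink (host/port are hypothetical)
    val flumeStream = FlumeUtils.createPollingStream(ssc, "localhost", 4545)

    // Decode each Flume event body into a log line and count hits per URL
    // over a 10-second sliding window
    val hits = flumeStream
      .map(e => new String(e.event.getBody.array(), StandardCharsets.UTF_8))
      .map(line => (line.split(" ")(6), 1)) // assumes NASA access-log layout with the URL in field 7
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(10), Seconds(10))

    // Persist each window's counts to Cassandra (keyspace/table are hypothetical)
    hits.foreachRDD { rdd =>
      rdd.saveToCassandra("weblogs", "url_hits", SomeColumns("url", "hits"))
    }

    ssc.start()
    ssc.awaitTermination()
  }
}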
Idea Info Solutions, Bangalore Sep 2013 to May 2015
Hadoop developer
Responsibilities
The prime objective is to obtain customer insights from sources across the globe and from heterogeneous applications, in order to perform analytics on structured and unstructured data in an efficient and reliable manner.
•Worked on analyzing the Hadoop stack and different big data tools including Pig, Hive, the HBase database, and Sqoop
•Worked on importing and exporting data from Oracle and DB2 into HDFS using Sqoop
•Developed Pig Latin scripts to extract the data from the web server output files and load it into HDFS
•Designed and developed user-defined functions to provide custom Hive and Pig capabilities across the application teams (a brief sketch follows this section)
•Created Hive external tables, loaded the data into the tables, and queried the data using HQL
•Collected log data from web servers and integrated it into HDFS using Flume
•Worked with Impala to expose data for further analysis and to transform files from different analytical formats into text files
•Implemented test scripts to support test driven development and continuous integration
•Worked on tuning the performance of Hive and Pig queries
•Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop
•Worked in Agile/Scrum software development.
Environment: HDFS, Java, MapReduce, Pig, Hive, Impala, HBase, Oozie, Sqoop, Flume, Linux.
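The custom Hive UDFs mentioned above were written in Java; below is a minimal sketch of an equivalent simple UDF using the classic Hive UDF API, written in Scala for consistency with the other sketches. The function name and masking rule are hypothetical:

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// A simple Hive UDF that masks all but the last four characters of an ID (hypothetical rule).
// Registered in Hive with, for example:
//   ADD JAR mask_udf.jar;
//   CREATE TEMPORARY FUNCTION mask_id AS 'MaskId';
class MaskId extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val s = input.toString
    val masked =
      if (s.length <= 4) s
      else "*" * (s.length - 4) + s.takeRight(4)
    new Text(masked)
  }
}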
PCS technology, Bangalore Mar 2009 to Oct 2011
Java Developer
Responsibilities
Worked on both WebLogic Portal 9.2 for portal development and WebLogic 8.1 for Data Services Programming. Worked on creating EJBs that implemented business logic
Developed the presentation layer using JSP, HTML, CSS and client validations using JavaScript
Involved in designing and development of the ecommerce site using JSP, Servlet, EJBs, JavaScript and JDBC
Used Eclipse 6.0 as IDE for application development
Validated all forms using Struts validation framework and implemented Tiles framework in the presentation layer
Configured Struts framework to implement MVC design patterns
Designed and developed GUI using JSP, HTML, DHTML and CSS
Worked with JMS for messaging interface
Environment: Java, J2EE, HTML, DHTML, JSP, Servlets, XML, EJB, Struts, Git, WebLogic 8.1, SQL Server 2008 R2, CentOS, UNIX, Linux, Windows 7/Vista/XP