MANOJ KRISHNA REDDY K
Email: ******@*******.*** | Mobile: +1-704-***-****
PROFESSIONAL SUMMARY:
Data warehouse professional with 10+ years of experience in the Information Technology industry, with extensive experience in Data Lake, Data Warehousing, Data Integration, Data Acquisition, Data Ingestion, Data Migration, Data Modeling, Data Profiling, Data Analysis, Data Cleansing, Data Quality, Data Processing, Data Mart and Data Governance projects, including implementation, maintenance, testing and production support of applications.
About 3 years of experience architecting, developing and implementing Big Data solutions in core and enterprise software development initiatives and applications that perform large-scale distributed data processing for Big Data analytics, using Big Data ecosystem tools: Hadoop, Hive, Pig, Sqoop, HBase, Spark, Spark SQL, Spark Streaming, Python, Kafka, Oozie, ZooKeeper, YARN and Tez.
Hands-on experience with various Hadoop distributions (Cloudera, Hortonworks, MapR).
In-depth understanding of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming model.
Experienced in working with structured data using HiveQL: join operations, Hive UDFs, partitioning, bucketing and internal/external tables.
Experience using Sqoop to move data between RDBMS and HDFS in both directions, and loading real-time data into HDFS using Kafka.
Expertise in cost-based optimization techniques and in identifying solutions to improve the performance of Hive SQL queries.
Experience implementing Spark SQL and converting business processes into Spark RDD transformations.
Strong skills in Informatica 7.1/8.6.1/9.5.1, IBM DataStage 9.1/8.5/7.5, SQL programming, Teradata, IBM DB2, PL/SQL, SQL Server, performance tuning and shell scripting.
Expertise in ELT processes and well versed in Teradata and DB2 concepts.
Created the High-level design documents and Source to Target Mapping for ETL/ELT process.
Performance-tuned different subject areas by implementing aggregates, aggregate join indexes, compression, statistics and SQL rewrites, along with foundation table modifications including index changes.
Working knowledge of Teradata load utilities including FastLoad, MultiLoad and BTEQ in a network-attached client environment.
Worked extensively in UNIX client/server environments with good exposure to shell scripting.
Used Git, Jenkins and SonarQube for continuous integration and development of the code.
Worked in both Waterfall and Agile methodologies.
Excellent analytical, problem-solving, communication and interpersonal skills.
Self-motivated, energetic team player with demonstrated proficiency for learning new tools and business environment.
TECHNICAL SKILLS:
Big Data: Hadoop Ecosystem, HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, HBase, ZooKeeper, YARN, Spark, Kafka, Python
ETL: Informatica 7.1, 8.1.1, 8.6.1, Informatica Metadata Manager 8.6.1, DataStage 9.1/8.5/7.5, SSIS
NoSQL Databases: HBase, Druid
IDE/Build Tools: Eclipse, Maven, IntelliJ, TFS
Continuous Integration: Jenkins, Git, SonarQube
Version Control: Git, Team Foundation Server
OLAP/Reporting: Cognos ReportNet 1.1, Tableau 10.x
Databases: Oracle 9i, 10g; Teradata 13.10, 14.00; SQL Server 2012
Languages: SQL, PL/SQL, Shell Scripting, Python
Scheduling Tools: Autosys, Control-M
Agile Tools: JIRA, VersionOne
CERTIFICATION:
Teradata 12 Certified Professional
EDUCATION:
Bachelor of Technology in Electronics and Communications Engineering from J.N.T. University.
PROFESSIONAL EXPERIENCE:
Excel Global Solutions Milwaukee, WI
June 2017 – Present
Hadoop Developer
NCB primarily aims to build its Enterprise Information Management & Business Intelligence/Analytics delivery platform by consolidating information from various internal transactional systems and data stores as well as external sources, making that information and the analytical capabilities derived from it accessible to various users and departments (retail banking, payments, merchants, credit card analysts, etc.).
•Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
•Developed Sqoop scripts to import/export data from relational sources and handled incremental loading of customer and transaction data by date.
•Worked with multiple sources to bring data into the Data Lake, building daily snapshots and loading the data into HDFS (see the sketch at the end of this section).
•Worked extensively with Sqoop for importing data from Oracle, Dynamic CRM Web Services and SQL Server into HDFS.
•Developed Spark code using Python and Spark SQL/Streaming for faster testing and processing of data.
•Developed Spark scripts using the Python shell (PySpark) as per requirements.
•Developed Python scripts and UDFs using both DataFrames/Spark SQL and RDDs in Spark 1.6 for data aggregation, queries and writing data back into the OLTP system through Sqoop.
•Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
•Experienced in performance tuning of Spark applications by setting the right batch interval, the correct level of parallelism, and tuning memory.
•Developed Hive (HQL) scripts, HDFS external/managed tables, and Oozie workflow and coordinator applications to load data into the HDFS landing and foundation layers.
•Experienced in implementing Spark RDD transformations and actions to implement business analysis.
•Migrated HiveQL queries on structured data to Spark SQL to improve performance.
•Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
•Worked on partitioning HIVE tables and running the scripts in parallel to reduce run-time of the scripts.
•Used Reporting tools like Tableau to connect to Drill and generate daily reports of data.
•Used OOZIE Operational Services for batch processing and scheduling workflows dynamically.
•Extensively worked on creating end-to-end data pipeline orchestration using Oozie.
Environment: Map Reduce, HDFS, Hive, Pig, SQL, Sqoop, Oozie, Yarn, Shell scripting, Spark, Python, Tableau.
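As a rough illustration of the batch pattern described in the bullets above (a Sqoop-landed Hive table aggregated with Spark SQL and written back to the foundation layer as partitioned ORC), the following PySpark sketch uses hypothetical database, table and column names and assumes a Spark 2.x SparkSession with Hive support:

    # Hypothetical PySpark batch job: read a Sqoop-landed Hive table,
    # aggregate with Spark SQL, and write a partitioned ORC foundation table.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("customer_txn_daily_snapshot")
             .config("spark.sql.shuffle.partitions", "200")  # tune parallelism for the cluster
             .enableHiveSupport()
             .getOrCreate())

    # Landing table is assumed to be an external Hive table populated by a
    # daily incremental Sqoop import (e.g. --incremental lastmodified).
    daily_txn = spark.sql("""
        SELECT customer_id, txn_amount, txn_date
        FROM landing.customer_transactions
        WHERE txn_date = current_date()
    """)

    # Aggregate per customer for the daily snapshot.
    snapshot = (daily_txn
                .groupBy("customer_id", "txn_date")
                .sum("txn_amount")
                .withColumnRenamed("sum(txn_amount)", "total_amount"))

    # Write to the foundation layer as ORC, partitioned by date so
    # downstream Hive queries can prune partitions.
    (snapshot.write
     .mode("overwrite")
     .format("orc")
     .partitionBy("txn_date")
     .saveAsTable("foundation.customer_txn_daily"))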
Target Corporation (RETAIL) Minneapolis, MN
Apr 2016 - June 2017
Lead BI Engineer
The Guest Data Foundation project is about building a Guest Data Lake that integrates MDM and operational systems data to support guest data needs like Cartwheel, RedPERKs, etc., and creates vehicles for teams to cross-share how guest data can be consumed.
Responsibilities:
•Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
•Developed Sqoop scripts to import/export data from relational sources and handled incremental loading of customer and transaction data by date.
•Imported data from different sources like HDFS and HBase into Spark RDDs.
•Developed Spark code using Python and Spark-SQL/Streaming for faster testing and processing of data.
•Experienced with batch processing of data sources using Apache Spark.
•Experienced in implementing Spark RDD transformations, actions to implement business analysis.
•Used OOZIE Operational Services for batch processing and scheduling workflows dynamically.
•Migrated HiveQL queries on structured data to Spark SQL to improve performance.
•Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
•Worked on partitioning HIVE tables and running the scripts in parallel to reduce run-time of the scripts.
•Created Pig scripts that read from and write to HBase tables.
•Created Oozie workflows that run the Hive, Pig and shell scripts and perform quality checks.
•Created external Hive tables on top of parsed data and saved the data using the ORC file format.
•Worked with different file formats like TEXTFILE, JSON, XML, PARQUET and ORC for Hive querying and processing.
•Performed advanced procedures like server log analytics using the in-memory computing capabilities of Spark with Python.
•Implemented real-time data streaming using Spark with Kafka and Spark SQL for faster processing.
•Provided design patterns for joins, updates and other features missing from Hive.
•Working knowledge of IDEs like Eclipse.
•Working knowledge of using Git and Maven for project dependency management, builds and deployment.
•Built consumers for real-time data streams using Kafka and Spark Streaming.
•Developed a POC for data integration and a data pipeline using Kafka and Spark to store data in HDFS (see the streaming sketch at the end of this section).
•Automated the jobs using Oozie workflows to run multiple MapReduce and Pig jobs, and supported running jobs on the cluster.
Environment: Map Reduce, HDFS, Hive, Pig, SQL, Sqoop, Oozie, HBase, Shell scripting, Apache Kafka, Spark, Python.
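A minimal sketch of the Kafka-to-HDFS streaming consumer referenced above, assuming the Spark 1.6-era pyspark.streaming.kafka direct-stream API; the broker address, topic name and HDFS path are placeholders, not values from the project:

    # Hypothetical PySpark Streaming consumer: read guest events from Kafka
    # and persist each micro-batch to HDFS (Spark 1.6-era direct-stream API).
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="guest_event_stream")
    ssc = StreamingContext(sc, batchDuration=30)  # 30-second batch interval

    # Broker list and topic name are placeholders.
    stream = KafkaUtils.createDirectStream(
        ssc,
        topics=["guest-events"],
        kafkaParams={"metadata.broker.list": "kafka-broker:9092"})

    # Each Kafka record arrives as a (key, value) pair; keep the value payload.
    events = stream.map(lambda kv: kv[1])

    # Write each micro-batch to a timestamped directory on HDFS.
    events.saveAsTextFiles("hdfs:///data/landing/guest_events/batch")

    ssc.start()
    ssc.awaitTermination()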
Target Corporation (RETAIL) Bangalore, India
Apr 2012 - Mar 2016
Senior BI Engineer
The Enterprise Guest Program (EGP) was launched as part of the broader Enterprise Information Management (EIM) initiative to focus on delivering a seamless guest experience across channels (e.g., Store, .com, Mobile, Call Center, Gift Registry). Currently, there is a gap in cross-channel guest identification and a need to create a more guest-centric and analytics-driven organization. The following analytical capabilities have been identified as priorities to enable in the EDW: Ad-hoc Analytics, Campaign and Web Experience.
Responsibilities:
•Analyzed the existing system's processes and functionality and designed the new Big Data system with appropriate techniques.
•Developed Sqoop scripts to import/export data from relational sources and handled incremental loading of guest and transaction data by date.
•Created the process to offload/migrate existing ELT/ETL processes to Hadoop.
•Worked extensively with Sqoop for importing data from Teradata to HDFS.
•Developed Hive (HQL) scripts, HDFS external/managed tables, and Oozie workflow and coordinator applications to load data into the HDFS landing and foundation layers.
•Worked on partitioning HIVE tables and running the scripts in parallel to reduce run-time of the scripts.
•Used OOZIE Operational Services for batch processing and scheduling workflows dynamically.
•Extensively worked on creating end-to-end data pipeline orchestration using Oozie.
•Developed Hive queries to create partitions and buckets to optimize job processing (sketched at the end of this section).
•Provided technical/functional assistance to offshore team members.
•Reviewed the Hive (HQL) scripts and Oozie workflows built by the developers and provided review comments.
•Evaluated the suitability of Hadoop and its ecosystem for the project and implemented/validated various proof-of-concept (POC) applications in order to adopt them as part of the Big Data Hadoop initiative.
•Handled all issues and post-production defects raised during the implementation phase.
•Provided support for the developed jobs in the PROD environment until sign-off.
Environment: DataStage 8.5, Map Reduce, HDFS, Hive, Pig, SQL, Sqoop, Oozie, Zookeeper.
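A minimal sketch of the partitioned and bucketed Hive load described above, with the HiveQL driven from a small Python wrapper around the Hive CLI; database, table and column names are hypothetical and the hive client is assumed to be on the PATH:

    # Hypothetical helper: create a partitioned, bucketed foundation table and
    # load it from a landing table using dynamic partitioning via the Hive CLI.
    import subprocess

    HQL = """
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;

    CREATE EXTERNAL TABLE IF NOT EXISTS foundation.guest_transactions (
        guest_id    BIGINT,
        store_id    INT,
        txn_amount  DECIMAL(12,2)
    )
    PARTITIONED BY (txn_date STRING)
    CLUSTERED BY (guest_id) INTO 32 BUCKETS
    STORED AS ORC
    LOCATION '/data/foundation/guest_transactions';

    -- Dynamic partition column (txn_date) goes last in the SELECT list.
    INSERT OVERWRITE TABLE foundation.guest_transactions PARTITION (txn_date)
    SELECT guest_id, store_id, txn_amount, txn_date
    FROM landing.guest_transactions;
    """

    def run_hql(script: str) -> None:
        """Run an HQL script through the Hive CLI and fail loudly on error."""
        subprocess.run(["hive", "-e", script], check=True)

    if __name__ == "__main__":
        run_hql(HQL)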
iGate Patni Global Solutions (HEALTHCARE) Bangalore, India
Nov 2010 - Apr 2012
System Analyst
The goal of this project is to upgrade the BIOPROD Oracle source system from R11 to R12 on the BI side. As part of this project, all the changes implemented in the source system (BIOPROD) for Oracle Release 12, such as column additions, data type changes and table-level additions/deletions, are brought into the BI system and integrated with the ODS & EDW systems. The work also covers retiring/rewiring EDW ETL mappings from ODS to EDW with Teradata scripts for the GL, AP and AR modules, and identifying the inventory of objects and performing gap analysis for objects affected by the source system upgrade to R12. The other part of the project includes the retirement of the DPA loads, which currently source from the Oracle system and will now be analyzed and re-routed to source from the GEHC-BI ODS system.
•Involved in system requirement design specifications for development and the traceability matrix.
•Set up and executed the iTest environment and handled data integrity and performance issues during migration.
•Good knowledge of preparing job chains in the Cronicle scheduling tool.
•Implemented the project using the Waterfall methodology; involved in development and unit/integration/regression testing of the application using QC against other dependent applications in ODS & EDW, following the GEHC IMPRD toll gate process.
•Created mappings using different transformations like Source Qualifier, Filter, Aggregator, Expression, Lookup, Sequence Generator, Router and Update Strategy.
•Responsible for loading staging tables using MultiLoad and FastLoad scripts.
•Wrote BTEQ scripts for moving data from staging to final tables (sketched at the end of this section).
•Defined reusable business logic in the form of mapplets and reusable transformations according to the mapping requirements.
•Analyzed existing mappings that were producing errors and modified them to produce correct results.
Environment: Informatica 8.6.1, Teradata 14.00, Unix, Control-M.
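A minimal sketch of the staging-to-final BTEQ load described above, driven from Python; the Teradata system name, credentials and table names are placeholders and the bteq client is assumed to be installed:

    # Hypothetical wrapper: move data from a staging table to a final table
    # with a BTEQ script fed to the bteq client over stdin.
    import subprocess

    BTEQ_SCRIPT = """
    .LOGON tdprod/etl_user,etl_password;

    INSERT INTO edw.gl_final
    SELECT *
    FROM   stg.gl_staging;

    .IF ERRORCODE <> 0 THEN .QUIT 8;

    .LOGOFF;
    .QUIT 0;
    """

    def run_bteq(script: str) -> None:
        """Run a BTEQ script and raise if the client exits non-zero."""
        subprocess.run(["bteq"], input=script, text=True, check=True)

    if __name__ == "__main__":
        run_bteq(BTEQ_SCRIPT)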
Verizon Data Services (TELECOM) Chennai, India
Dec 2009 - Nov 2010
Analyst
The scope of this project is to provide implementation services to Verizon to migrate the reporting functionality in the existing SAP BW reports (99 Release 4 reports) into OBIEE (Oracle Business Intelligence reporting tool). The content of the SAP BW reports considered in scope is from SAP R/3 version ECC 6.0 and SAP BW version 3.5. Informatica is used as the ETL tool for extracting the data from SAP BW and loading it into the Oracle warehouse. The OBIEE reports are developed on top of OBIA (Oracle Business Intelligence Applications), covering the Supply Chain and Order Management Analytics and the Procurement and Spend Analytics modules.
•Analyzed the business requirements and functional specifications.
•Extracted data from the Oracle database and spreadsheets, staged it in a single place, and applied business logic to load it into the central Oracle database.
•Extensively used Transformations like Router, Aggregator, Normalizer, Joiner, Expression and Lookup, Update strategy and Sequence generator and Stored Procedure.
•Implemented performance tuning logic on targets, sources, mappings, sessions to provide maximum efficiency and performance.
•Parameterized the mappings and increased the re-usability.
•Used Informatica Power Center Workflow manager to create sessions, workflows and batches to run with the logic embedded in the mappings.
•Created procedures to truncate data in the target before the session run.
•Used the PL/SQL procedures for Informatica mappings for truncating the data in target tables at run time.
•Extensively used the Informatica debugger to figure out problems in mappings; also involved in troubleshooting existing ETL bugs.
•Created a list of inconsistencies in the data load on the client side so they could review and correct the issues.
•Wrote documentation describing program development, logic, coding, testing, changes and corrections.
•Created Test cases for the mappings developed and then created integration Testing Document.
Environment: Informatica 8.6.1, Oracle 10g, OBIEE, SAP BW, Unix, Control M.
Cognizant Technology Solutions (HEALTHCARE) Chennai, India
Jan 2007 - Dec 2009
Programmer Analyst
The project involves the creation of a data mart consisting of customer information. The main processes performed are extraction, transformation and loading. The system is used to perform a full refresh of demographic and inpatient data and incremental updates of outpatient and physician data. This system takes the guesswork out of reporting. The reporting tools provide on-demand reports to business users, allow for reporting on completed claims/IBNR, and support national and regional health plan reporting.
•Developed mappings to extract, transform and load the data from Flat Files using Informatica.
•Created mappings and sessions to implement technical enhancements for data warehouse by extracting data from sources like Oracle and Delimited Flat files.
•Applied Slowly Changing Dimensions (Type 1 and Type 2) effectively to handle the delta loads.
•Prepared various mappings to load the data into different stages like Landing, Staging and Target tables.
•Used various transformations like Source Qualifier, Expression, Aggregator, Joiner, Filter, Lookup and Update Strategy while designing and optimizing the mappings.
•Developed workflows using the Task Developer, Worklet Designer and Workflow Designer in Workflow Manager and monitored the results using Workflow Monitor.
•Created various tasks like Session, Command, Timer and Event wait.
•Modified several of the existing mappings based on the user requirements and maintained existing mappings, sessions and workflows.
•Tuned the performance of mappings by following Informatica best practices and applied several methods to decrease the run time of workflows.
•Prepared SQL Queries to validate the data in both source and target databases.
•Worked on TOAD and Oracle SQL Developer to develop queries and create procedures and packages in Oracle.
•Created Test cases for the mappings developed and then created integration Testing Document.
•Prepared the error handling document to maintain the error handling process.
•Closely worked with the reporting team to ensure that correct data is presented in the reports.
•Scheduled Informatica mappings and UNIX scripts in production using the Autosys scheduler.
Environment: Informatica 7.1, Oracle 9i, Unix, Cognos Report Net 1.1, Autosys Scheduler.