GARLA KARTHIK
DATA ENGINEER
Plano, Texas | ************@*****.*** | 816-***-****

TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, YARN, Spark, Hive, Pig, Kafka, Oozie, ZooKeeper, Sqoop, Impala, Spark Streaming, Spark SQL, Informatica BDM (Big Data Management)
Programming & Scripting Languages: Snow-SQL, Scala, SQL, Pig Latin, Hive Query Language, Shell Scripting
GIS (Geographic Information System): ArcCatalog, ArcGIS Administrator, ArcMap, ArcGIS Desktop
Cloud Technologies: AWS (Amazon Web Services) EC2, S3, IAM, CloudWatch, SNS, SQS, EMR
Database Tools: OEM, RMAN, Oracle NetCA, Data Pump, DBCA, Data Guard
Databases: Snowflake, Oracle 11g, HBase, MySQL
Big Data Distributions: Hortonworks, Cloudera
Operating Systems: Linux, Windows, Kali Linux
IDE/Build Tools/CI: Eclipse, IntelliJ, Sublime, Maven, SBT, Jenkins
Version Control: Git, Microsoft VSTS
SDLC: SAFe Agile, Waterfall
PROFESSIONAL SUMMARY:
• Around * years of IT experience, including *+ years of experience as a Hadoop/Spark developer using Big Data technologies across the Hadoop and Spark ecosystems, and 1+ year of database administration.
• Experience as a Hadoop developer with good knowledge of Spark, Snowflake, Scala, YARN, Pig, Hive, Sqoop, Impala, HDFS and HBase.
• Good working knowledge of Amazon Web Services components like EC2, EMR, S3 and IAM.
• Experience in building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning and monitoring.
• Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
• Experience in developing and unit testing Informatica objects, dynamic mappings, sessions, and workflows based on prepared low-level design documents or SQL queries.
• Experience in finding appropriate GIS (Geographic Information System) data through multiple sources and data extraction.
• Experience with Hadoop cluster management and administrative operations using Oozie, YARN, Ambari and ZooKeeper.
• Hands-on experience in various Big Data application phases like data ingestion, data analytics and data visualization.
• Extending Hive and Pig core functionality by writing custom User Defined Functions.
• Knowledge in job/workflow scheduling and monitoring tools like Oozie & Zookeeper.
• Experience in analyzing data using HiveQL and Pig Latin.
• Hands-on experience in application development using Java, Scala, Spark and Linux shell scripting.
• Imported data into HDFS, HBase and Hive from Teradata using Sqoop.
• Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation.
• Familiar with Kafka, Spark with YARN Local and Standalone modes.
• Documented and explained implemented processes and configurations in upgrades.
• Expertise in creating Hive Internal/External tables, Hive's analytical functions, views and writing scripts in HQL.
• Experience in using accumulator variables, broadcast variables and RDD caching for Spark Streaming.
• Experience in using Sequence files, ORC, AVRO and Parquet file formats.
• Experienced in implementing schedulers using Control-M, Oozie, Crontab and shell scripts.
• Experience with SVN and GIT for code management and version control in collaborated projects.
• Installed and configured a Hortonworks Hadoop cluster on 10 nodes in a test environment using Amazon EC2 and EBS storage volumes.

EDUCATION:
Bachelor of Technology in Electrical & Electronics Engineering
S.A.S.T.R.A University – Tamil Nadu, India

Master’s in Computer and Information Systems Security/Information Assurance
University of Central Missouri – Warrensburg, MO
CERTIFICATIONS:
AWS Certified Developer – Associate
Validation Number: ZHWCJLF2CME1QVGR
Validate at: http://aws.amazon.com/verification
Hortonworks Certified Associate
Validate at: http://bcert.me/sfeitwlq
http://verify.skilljar.com/c/ricaq6s4pd7x
PROFESSIONAL EXPERIENCE:

Client: Verizon    September 2019 – Current
Big Data Engineer
Project Description: The Fiber Store is a single repository of VZ fiber-enabled locations, with existing and prospect customer information and the services available to generate intelligent leads, that is controlled, financially certified and federated, to increase penetration in the buildings and monetize the fiber assets.
Responsibilities:
• Design and develop custom-built scripts/programs to integrate various components, increase consistency, automate tasks and alerts, and assist in monitoring, diagnosing and processing of data in the Data Lake using Hadoop components in the Hortonworks HDP cluster.
• Automate data extraction processes by developing, testing and scheduling batch data processing jobs for incremental loads, and perform data validation to ensure there are no data format issues or impact to data structures, using Spark, Scala and design patterns (an illustrative sketch follows at the end of this section).
• Deploy application code/packages in Production and Non-Production environments.
• Work with Application Development, Middleware, offshore developers, System Administrators and other teams to coordinate all activities and work to debug issues as needed.
• Continue to support projects related to migration of applications from on-prem to Amazon’s cloud computing platforms and technologies, as well as ongoing server and operating system upgrades.
• Possess strong knowledge on Oracle Database Administration.
• Experience in installation and maintenance of Oracle RAC databases in production.
• Experience in tuning SQL queries to optimize performance.
• Automated routine DBA tasks to reduce the day-to-day burden of DBA monitoring activities and eliminate human error.
• Experience in JAVA programming with skills in analysis, design, testing and deploying with various technologies like Java, J2EE, JavaScript, JSP, JDBC, HTML, XML and JUnit.
• Used various project management services like BMC and JIRA for handling service requests and tracking issues.
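An illustrative sketch of the kind of Spark/Scala incremental batch load with a basic validation step described above; the application name, table names and watermark column are hypothetical and not taken from the project.

```scala
// Minimal sketch only: the app name, tables and watermark column below are hypothetical.
import org.apache.spark.sql.{SparkSession, functions => F}

object IncrementalLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("fiber-incremental-load")          // hypothetical application name
      .enableHiveSupport()
      .getOrCreate()

    // Last successfully processed watermark, passed in by the scheduler (e.g. "2019-09-30 00:00:00").
    val lastWatermark = args(0)

    // Pull only rows that changed since the previous run.
    val incremental = spark.table("stage.fiber_locations_raw")   // hypothetical source table
      .filter(F.col("updated_ts") > F.lit(lastWatermark))

    // Basic validation: reject rows with missing keys before they reach the curated zone.
    val validated = incremental.filter(F.col("location_id").isNotNull)

    // Append the validated increment, partitioned by load date, for downstream consumers.
    validated
      .withColumn("load_date", F.current_date())
      .write
      .mode("append")
      .format("orc")
      .partitionBy("load_date")
      .saveAsTable("curated.fiber_locations")      // hypothetical target table
  }
}
```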
Client: Texas Department of Transportation, Austin, TX    Nov 2018 – August 2019
Data Engineer
Project Description: TxDOTCONNECT is a custom-built system for managing data for the delivery of transportation programs, projects and right of way. The system is designed to help Texas Department of Transportation staff work at peak efficiency as they build a world-class transportation system for Texas. TxDOTCONNECT will allow users to make a leap forward in productivity and replace 40 legacy systems with an enterprise-wide system to plan, manage and measure transportation programs. It is designed to automate workflow between stakeholders and provide a single source for project data in the Data Lake, making information easier to access and share by standardizing data and reporting formats.
Responsibilities:
• Design and develop custom-built scripts/programs to integrate various components, increase consistency, automate tasks and alerts, and assist in monitoring, diagnosing and processing of data in the Data Lake using the Hortonworks HDP cluster.
• Using the Data Integration Framework for data ingestion from relational databases (Oracle, MySQL) and the ArcGIS geodatabase into HDFS and AWS S3 buckets based on upstream sources, comparing schemas, validating using Data Validator, and handling logging and alerting throughout the HDP cluster (see the ingestion sketch at the end of this section).
• Develop scripts for spatial type handling, such as geometry columns, using ArcPy functionality in Python, converting shapefiles into well-known text (WKT) format and then into comma-separated value (CSV) lists.
• Analyze, Design, Develop and Test Informatica BDM (Big Data Management) mappings and workflows for Geospatial Data Warehouse Phase-2 and TMS (Traffic Management Systems).
• Replicating large amounts of data from relational databases to target Hive tables or HDFS with the Mass Ingestion Service by creating mass ingestion specifications using Informatica BDM (Big Data Management).
• Manage the Hortonworks HDP cluster in non-production environments, upgrade components and apply security patches whenever required.
• Using Control-M for job scheduling, application deployment and workflow processes for Hadoop and Spark jobs.
• Monitor, maintain and support the new instances of Data Lake on AWS.
• Automate data extraction processes by developing, testing and scheduling batch data processing jobs for incremental loads, and perform data validation to ensure there are no data format issues or impact to data structures, using Spark, Scala and design patterns.
• Create a data mart in the data lake to enable Tableau access to ingested upstream data sources in order to build the in-scope metrics.
• Using custom web services built using ArcGIS to query the geospatial data containing geometry features.
• Using Geodata copier for ingesting datasets with geospatial data into their pre-created ArcGIS feature class tables in PostgreSQL database.
• Interact with business users and business analysts for gathering business requirements, analyzing BRD documents and provide optimized solution.
• Using VSTS (Visual Studio Team System) for tracking user stories, version control, document test reports and defect tracking.
Environment: Hortonworks Hadoop, Apache Spark, Scala, Informatica BDM (Big Data Management), ArcCatalog, ArcGIS Administrator, ArcMap, ArcGIS Desktop, Python, Microsoft VSTS, Sqoop, Oracle 11g, MySQL, HBase, PostgreSQL, Control-M, Oozie, Maven, JIRA
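An illustrative sketch of the relational-database-to-Data-Lake ingestion pattern described above, using Spark's JDBC reader plus a lightweight column check before landing the data; the connection string, table, expected columns and S3 path are hypothetical.

```scala
// Minimal sketch only: connection string, table, expected columns and S3 path are hypothetical.
import org.apache.spark.sql.SparkSession

object JdbcToLakeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("jdbc-to-lake-sketch")
      .getOrCreate()

    // Read a source table from Oracle over JDBC (MySQL works the same way with a different URL/driver).
    val source = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")    // hypothetical connection string
      .option("dbtable", "ROW_ASSETS")                          // hypothetical table
      .option("user", sys.env("DB_USER"))
      .option("password", sys.env("DB_PASSWORD"))
      .option("fetchsize", "10000")
      .load()

    // Lightweight schema check: fail fast if expected columns are missing in the upstream source.
    val expected = Seq("asset_id", "district", "geometry_wkt")  // hypothetical column list
    val actual   = source.columns.map(_.toLowerCase)
    val missing  = expected.filterNot(c => actual.contains(c))
    require(missing.isEmpty, s"Schema validation failed; missing columns: ${missing.mkString(", ")}")

    // Land the data as Parquet in S3 (an HDFS path works identically) for downstream Hive/Tableau use.
    source.write
      .mode("overwrite")
      .parquet("s3a://txdot-data-lake/raw/row_assets/")         // hypothetical bucket/path
  }
}
```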
Client: Capital One, Plano, TX    April 2018 – Nov 2018
Big Data Engineer
Project Description: Capital One Financial Corporation is a bank holding company specializing in credit cards, auto loans, banking and savings products. The main purpose of project Floga is to encrypt sensitive data related to Auto Finance applications from various source systems such as MySQL, Teradata and PostgreSQL, migrate it to AWS S3 buckets and load it into the Snowflake cloud EDW.
Responsibilities:
• Developed a database unloader framework using Spark and Scala that extracts data from a PostgreSQL database and saves output files to an S3 bucket (a minimal sketch follows at the end of this section).
• Developed extensive Snow-SQL queries to do transformations on the data to be used by downstream models.
• Prepared technical design documents which includes Metadata, BDQ, Dependency, ILDM required for DRM team.
• Used Nebula for Metadata registration and datasets registration.
• Involved in loading and transforming large datasets from relational databases into HDFS and vice versa using Sqoop imports and exports.
• Implemented the ADQ framework using Spark, Scala and the AWS API to validate the split data sets by comparing their metadata with the metadata registered in Nebula.
• Involved in complete end to end code deployment process in UAT & Production environments.
• Developed shell scripts for OTS jobs that load history data from dumps into Snowflake.
• Used Teradata Parallel Transporter to export huge volumes of history data (in terabytes), load it into staging S3 buckets and then into Snowflake tables.
• Scheduled various Control-M jobs and monitored YARN logs for debugging.
• Used GIT for version control.
• Hands-on experience with AWS services such as EC2, S3, EMR and Bastion Host for the computing and storage of data.
Environment: Snowflake cloud EDW, EMR, S3, Spark, Hive, Teradata, PostgreSQL, Scala, Snow-SQL, Linux, Shell scripting, Cloudera, Jenkins, Maven, Git, Control-M scheduler, Agile, Nebula
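A minimal sketch of a database unloader step of the kind described above: a partitioned Spark JDBC read from PostgreSQL staged to S3 as compressed CSV ahead of a Snowflake COPY INTO; the connection details, table, partition bounds and bucket names are hypothetical.

```scala
// Minimal sketch only: connection details, table, partition bounds and S3 path are hypothetical.
import org.apache.spark.sql.SparkSession

object PgUnloaderSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pg-unloader-sketch")
      .getOrCreate()

    // Read the source table in parallel over JDBC, split on a numeric key.
    val loans = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://pghost:5432/autofinance")  // hypothetical
      .option("dbtable", "public.loan_applications")               // hypothetical
      .option("user", sys.env("PG_USER"))
      .option("password", sys.env("PG_PASSWORD"))
      .option("numPartitions", "8")
      .option("partitionColumn", "application_id")                 // hypothetical numeric key
      .option("lowerBound", "1")
      .option("upperBound", "100000000")
      .load()

    // Log the extract row count so a downstream data-quality check can compare it
    // against the row count Snowflake reports after the COPY INTO.
    println(s"Unloaded ${loans.count()} rows from loan_applications")

    // Stage as gzipped CSV in S3; Snowflake then loads it with COPY INTO from an external stage.
    loans.write
      .mode("overwrite")
      .option("header", "true")
      .option("compression", "gzip")
      .csv("s3a://auto-finance-staging/loan_applications/")        // hypothetical staging bucket
  }
}
```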
Tata Consultancy Services, Bangalore, India
Hadoop Developer    Sep 2014 – Aug 2016
Client: NN – EUSELA
Project Description: Novo Nordisk is a Danish multinational pharmaceutical company headquartered in Bagsvaerd, Denmark, with production facilities in seven countries, and affiliates or offices in 75 countries. Novo Nordisk employs more than 40,000 people globally and markets its products in 180 countries. It is the largest publicly traded company in the Nordic countries by market capitalization. The scope of the project deals with data of their products and marketing representatives.
Responsibilities:
• Responsible for building scalable distributed data solutions using Hadoop.
• Created a data lake to absorb the existing history data from OLAP databases and support processing of transactional data, and coordinated with the data modelers to create Hive tables that replicate the current warehouse table structure.
• Migrated ETL processes from Oracle to Hive to test faster and easier data manipulation.
• Performed data transformations in Pig and Hive and used partitions and buckets for performance improvements (see the Hive partitioning sketch at the end of this section).
• Installed and configured a Hortonworks Hadoop cluster on 10 nodes in a test environment using Amazon EC2 and EBS storage volumes for testing a POC.
• Developed MapReduce programs on Healthcare domain data to generate faster reports which were running slow in OBIEE dashboards due to rapid data growth.
• Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
• Load and transform large sets of structured, semi-structured and unstructured data.
• Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
• Worked with support teams and resolved operational & performance issues.
• Solved performance issues in Hive and Pig with an understanding of joins, grouping and aggregation and how they translate to MapReduce.
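An illustrative sketch of the partitioned Hive table pattern mentioned above, expressed as HiveQL submitted through Spark's Hive support; the database, table and column names are hypothetical (the original work also used Pig and bucketed tables, which are omitted here).

```scala
// Minimal sketch only: database, table and column names are hypothetical.
import org.apache.spark.sql.SparkSession

object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioning by report month lets queries prune whole directories instead of scanning the table.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS warehouse.product_sales (
        product_id BIGINT,
        rep_id     STRING,
        units_sold INT,
        revenue    DOUBLE
      )
      PARTITIONED BY (report_month STRING)
      STORED AS ORC
    """)

    // Load one month of data from a staging table populated by the Sqoop import.
    spark.sql("""
      INSERT OVERWRITE TABLE warehouse.product_sales PARTITION (report_month = '2016-06')
      SELECT product_id, rep_id, units_sold, revenue
      FROM staging.product_sales_raw
      WHERE report_month = '2016-06'
    """)
  }
}
```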
Tata Consultancy Services, Bangalore, India
Database Administrator Aug 2013 – Sep 2014
Client: Novo Nordisk
Project Description: Novo Nordisk is a Danish multinational pharmaceutical company headquartered in Bagsvaerd, Denmark, with production facilities in seven countries, and affiliates or offices in 75 countries. Novo Nordisk employs more than 40,000 people globally and markets its products in 180 countries. It is the largest publicly traded company in the Nordic countries by market capitalization. The scope of the project deals with data of their products and marketing representatives.
Responsibilities:
• Responsible for maintaining RAC databases and single-instance databases in production and non-production environments.
• Responsible for Oracle Installations and Oracle Database Upgrades.
• Managed production, development and Test databases in 24*7 environments.
• Monitoring the alert log files on a timely basis.
• Generating AWR, ADDM & ASH reports for analyzing Database performance issues.
• Exporting and Importing database backups using Exp & Imp and EXPDP/IMPDP utilities.
• Created and monitored tablespaces, allocated tablespaces for users, and configured archive log mode for the database.
• Created and managed database objects like tables, views, and indexes, Managing the physical and logical objects of the database, monitor physical and logical backup strategies, managing redo logs, checking alert log and trace files for errors.
• Created users with restricted access and privileges, groups, roles, profiles and assigned users to groups and granted privileges and permissions to appropriate groups.
• Experience in Installation, Configuration & Maintenance of OEM Grid control.
• Experience in migrating databases from one platform to another using Oracle Data Migration Assistant.
• Experienced in maintaining Oracle Active Data Guard and in cloning the databases through Hot/Cold and RMAN utility.
• Responsible for taking Hot/Cold and RMAN Backups and in recovering the Database when required.
• Knowledge on Oracle Database performance-tuning services with EXPLAIN PLAN, TKPROF, AUTO TRACE, AWR REPORTS.
• Reorganization of tables, fragmented tablespaces and databases using export and import tools, and frequent rebuilding of indexes.
• Experience in Applying patches using OPatch utility and gathering the database statistics to improve performance.
• Interacted with clients on a daily basis for requirement gathering, clarifications, resolutions and status updates.
Environment: Oracle 11gR2, RAC, RHEL-5, PuTTY, SQL Developer, OEM, ETL, Informatica, DAC, OBIEE, Siebel CRM