Venkateswara Varma Srivatsavaya
Email: ***********@*****.***
Cell Phone: +1-856-***-****
Big Data professional with 10+ years of combined experience in Hadoop and database engineering and administration.
Professional Summary
Experience in designing, implementing and administering multitenant Cloudera Distribution of Hadoop (CDH) clusters for visualization and analytics on Big Data
Experience in engineering and administering StreamSets/Sqoop for data ingestion, HDFS/HBase/Kudu for distributed storage, Hive for data warehousing, Impala for interactive querying, Spark for data transformation and advanced analytics, Flume for streaming event data, Solr for content and metadata indexing on unstructured data, Cloudera Navigator for data governance and Cloudera Data Science Workbench for data science
Experience in working with business and application owners to understand high-level requirements for onboarding to the Big Data and Analytics platform; on a multitenant platform hosting 60+ applications and use cases, working with application development, SQA and DevOps teams to ensure successful production deployments
Experience in setting up high availability, replication, backup and disaster recovery for the different services on the Big Data and Analytics platform
Experience in Unix development and administration; currently maintaining 200 Oracle Enterprise Linux 6 and Red Hat Enterprise Linux 7 servers used for CDH clusters
Experience in setting up and maintaining MySQL/MariaDB relational databases used for metadata storage by Hive, Cloudera Navigator and other Cloudera Management Services
Experience in securing clusters by configuring MIT Kerberos for authentication and Sentry for role-based, fine-grained authorization
Experience in YARN and Impala resource management (CPU/memory) on a heavily multitenant platform
Experience in integrating Kerberized Impala with Business Intelligence tools like Tableau and Spotfire
Experience in installing, configuring, upgrading and maintaining the StreamSets data ingestion platform
Experience in installing, configuring, upgrading and maintaining Cloudera Data Science Workbench to provide data scientists with self-service access for Machine Learning and Artificial Intelligence use cases
Experience in developing and testing Java and Python code to automate redundant tasks, including but not limited to backups, monitoring and connectivity tests to different services on the cluster (a representative sketch follows this summary)
Hands-on knowledge of Amazon Web Services (EC2, EBS, S3, ELB, VPC, Route 53)
Hands-on knowledge of Kafka cluster setup, administration, monitoring and security, as well as Kafka Streams and KSQL
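For illustration, a minimal Python sketch of the kind of connectivity check automated above. All host names are hypothetical and the ports are common CDH defaults, not details of any specific cluster:

#!/usr/bin/env python3
"""Port-level connectivity check for core cluster services.
Host names are hypothetical; ports are common CDH defaults."""
import socket

SERVICES = {
    "HDFS NameNode RPC": ("namenode01.example.com", 8020),
    "YARN ResourceManager": ("rm01.example.com", 8032),
    "HiveServer2": ("hive01.example.com", 10000),
    "Impala (HS2 protocol)": ("impala01.example.com", 21050),
    "HBase Master": ("hbase01.example.com", 16000),
}

def reachable(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for name, (host, port) in sorted(SERVICES.items()):
        status = "OK" if reachable(host, port) else "UNREACHABLE"
        print("%-25s %s:%-5d %s" % (name, host, port, status))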
Technical Skills
Big Data Ecosystem: Hadoop, MapReduce, YARN, Spark, Hive, Impala, HBase, Kudu, StreamSets, Kafka and Cloudera Data Science Workbench
Cloud Computing: Amazon Web Services
Security: Kerberos and Sentry
Programming Languages: Java, VB
Scripting: Unix Shell Scripting and Python Scripting
Business Intelligence Tools: Tableau, Spotfire
Relational Databases: Microsoft SQL Server 2000, 2008 and 2012, Sybase, MySQL
Operating Systems: Oracle Linux 6, RHEL 7, Windows NT/98/2000/XP, Windows 8, Windows 10
Specialized Training and Certifications
Cloudera Certified Hadoop Administrator (CCA131)
Microsoft Certified Professional (MCP)
Microsoft Certified Technology Specialist (MCTS): Database Development, Implementation and Maintenance
Work Experience
Hadoop Admin
Client: State Street Bank and Trust Company 11/2017 – Present
Domain: Banking
Location: Quincy, MA
Responsibilities:
Installing, configuring, upgrading and maintaining the Cloudera distribution of Hadoop on Linux machines and corresponding services like HDFS, YARN, Hive, Impala, HBase, Kudu, Spark2, Spark on YARN, StreamSets and Cloudera Data Science Workbench
Administering Oracle Enterprise Linux 6 and Red Hat Enterprise Linux 7 on the CDH cluster nodes; developing Unix shell scripts for automation of redundant tasks
Setting up and maintaining MySQL/MariaDB relational databases used for metadata storage by Hive, Cloudera Navigator and other Cloudera Management Services; setting up MySQL/MariaDB replication for backup and disaster recovery
Configuring MIT Kerberos for authentication and Sentry for role-based, fine-grained authorization
Creating pipelines via StreamSets using multiple origins, destinations, processors and executors to transfer batch/real-time data into and out of Hadoop; providing samples and demo walkthroughs to new StreamSets users and working with StreamSets vendor support to troubleshoot issues
Creating sample models and running experiments on Cloudera Data Science Workbench to demonstrate to new users exploring Machine Learning and Artificial Intelligence use cases; supporting those use cases once promoted to higher environments and working with Cloudera vendor support to troubleshoot issues
Checking and tuning CPU and memory at both the system level and the YARN resource management level to ensure MapReduce, Spark and Impala workloads on the platform run at their optimum level
Integrating Kerberized Impala with Business Intelligence tools like Tableau and Spotfire; benchmarking and testing in collaboration with application teams to determine the optimum Impala settings for performance
Developing and testing Java and Python programs that use Spark to read from and write to Kerberized HDFS, Hive and HBase, and providing them as samples to application users on the platform (a PySpark sketch follows this list)
Working with business and application owners to understand high-level requirements for onboarding to the Big Data and Analytics platform; on a multitenant platform hosting 60+ applications and use cases, working with application development, SQA and DevOps teams to ensure successful production deployments
Reviewing Hive/Impala/Spark workloads of different teams, troubleshooting and providing suggestions for performance improvements while ensuring platform stability is not negatively impacted
Troubleshooting and resolving issues related to the operating system, databases and the Big Data & Analytics platform
Tracking production tickets in ServiceNow and providing necessary support
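For illustration, a minimal PySpark sketch of the kind of sample program provided to application teams. It assumes the submitting user already holds a valid Kerberos ticket (via kinit, or --keytab/--principal on spark-submit), since authentication on a Kerberized cluster happens outside the code; database, table and path names are hypothetical:

from pyspark.sql import SparkSession

# Kerberos note: run `kinit` (or pass --keytab/--principal to spark-submit)
# before launching; the code itself needs no extra authentication logic.
spark = (SparkSession.builder
         .appName("sample-hdfs-hive-io")
         .enableHiveSupport()          # talk to the Hive metastore
         .getOrCreate())

# Read a delimited file landed on HDFS (path is hypothetical).
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("hdfs:///data/landing/trades/2018-01-01/"))

# Simple cleanup, then persist the result as a Hive table.
cleaned = raw.dropDuplicates().filter("notional_usd > 0")
cleaned.write.mode("overwrite").saveAsTable("analytics_db.trades_clean")

# Read the table back to confirm the round trip.
spark.table("analytics_db.trades_clean").show(5)
spark.stop()

HBase access works the same way from Spark but additionally needs the HBase-Spark connector and HBase delegation tokens; that part is omitted here.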
Hadoop Engineer
Client: The Nielsen Company 12/2015 – 11/2017
Domain: Market Research
Location: Lancaster, PA
Responsibilities:
Analyzed the feasibility for migrating data from legacy databases to Hadoop distributed file system
Created Sqoop jobs to import data from Oracle to HDFS; tuned the number of mappers based on dataset sizes
Created raw/transformation data tables and consumption views in Hive using Hive Query Language
Created and tested jobs to perform transformation on raw data using Hive Query Language
Coordinated with end clients to understand their querying needs and designed the Hive partitioning strategy accordingly (see the sketch after this list)
Analyzed Hive Query profiles to identify and resolve performance issues
Applied Sentry role-based authorization to tables and views
Used WebHDFS/HttpFS to copy files from client nodes directly to HDFS
Created Oozie workflows to schedule jobs with Sqoop and Hive actions
Refreshed Impala metadata for Hive tables and views; tuned Impala performance by following best practices such as computing table and column statistics, join optimization and partition pruning
Integrated Impala with Tableau and performed load tests to benchmark Impala memory requirements
Performed High Availability and Disaster Recovery tests in collaboration with Admin team
Explored Apache Spark Streaming for data transformation on Flume event data
Gained hands-on knowledge of installation, configuration and administration of Cloudera Distribution of Hadoop clusters
Gained hands-on knowledge of configuring NameNode High Availability and ResourceManager High Availability
Gained hands-on knowledge of key administration concepts like Trash/snapshots for backups, balancing data across all DataNodes and analyzing log files of different services
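For illustration, a minimal PySpark sketch of the partitioning pattern described above, with Hive QL embedded as SQL strings; all database, table and column names are hypothetical:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-sample")
         .enableHiveSupport()
         .getOrCreate())

# Consumption table partitioned by load date, so queries filtering on
# load_dt scan only the matching partitions (partition pruning).
spark.sql("""
    CREATE TABLE IF NOT EXISTS mart.sales_daily (
        store_id BIGINT,
        upc      STRING,
        units    INT,
        revenue  DECIMAL(18,2))
    PARTITIONED BY (load_dt STRING)
    STORED AS PARQUET
""")

# Dynamic-partition load from the raw layer; the partition column
# comes last in the SELECT list.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE mart.sales_daily PARTITION (load_dt)
    SELECT store_id, upc, units, revenue, load_dt
    FROM raw.sales_staging
""")

# Prunes to a single partition instead of scanning the whole table.
spark.sql("""
    SELECT store_id, SUM(revenue) AS revenue
    FROM mart.sales_daily
    WHERE load_dt = '2017-06-01'
    GROUP BY store_id
""").show()
spark.stop()

On the Impala side, running COMPUTE STATS mart.sales_daily after each load gives the planner the table and column statistics mentioned above.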
SQL/Java/Unix Developer
Client: The Nielsen Company 02/2014 – 12/2015
Domain: Market Research
Location: Cherry Hill, NJ
Responsibilities:
Understood customer requirements and high-level business needs to implement solutions meeting those requirements
Responsible for detailed design: preparation of high-level and low-level design documents
Developed Stored Procedures, Triggers, Cursors, Tables, Views, SQL Joins required by various applications on the database
Developed Unix shell scripts to automate redundant tasks and scheduled them via AutoSys
Involved in the design and development of a code rewrite from C++ to Java for an application that uses customers' latitude and longitude to suggest nearby stores as part of market research (a distance-logic sketch follows this list)
Responsible for enhancement and maintenance of a HelpDesk application used by the Operations team to communicate with customers, run daily, weekly, monthly and annual maintenance jobs, process incentives, and handle shipping and labeling of gifts
Responsible for Unit Test case preparation and Unit testing
Packaged the developed components to be deployed in production environment
Worked on production tickets on a daily basis and provided impact analysis, resolutions, DBCRFs and SCRFs
Communicated with other teams to work on the tickets related to other interfaces
Documented critical modules for future uses
Involved in a POC for migrating data from an RDBMS to the Hadoop Distributed File System
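For illustration, the distance logic at the heart of that store-suggestion rewrite looks roughly like the following, sketched in Python rather than the application's Java, with hypothetical store data:

import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_stores(cust_lat, cust_lon, stores, n=3):
    """Return the n stores closest to the customer's coordinates."""
    return sorted(
        stores,
        key=lambda s: haversine_km(cust_lat, cust_lon, s["lat"], s["lon"]),
    )[:n]

# Hypothetical data for illustration.
stores = [
    {"name": "Store A", "lat": 39.95, "lon": -75.16},
    {"name": "Store B", "lat": 39.93, "lon": -75.02},
]
print(nearest_stores(39.94, -75.10, stores, n=1))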
SQL/Java/Unix Developer
Client: AIG/Chartis – Commercial Insurance 12/2009 - 02/2014
Domain: Insurance
Location: Chennai, India
Responsibilities:
Prepared design documents by analyzing the impact on different interfaces and downstream systems dependent on AIWCS (workers' compensation policy underwriting system)
Developed Stored Procedures, Triggers, Cursors, Tables, Views, SQL Joins and other statements for various applications, maintained referential integrity and implemented complex business logic
Developed UNIX Shell scripts to automate repetitive database processes
Involved in debugging Java code of existing applications in order to resolve bugs
Made code enhancements in Java as part of application upgrade
Developed complex queries based on business team’s needs to derive insights out of huge datasets
Scheduled daily, weekly, monthly and yearly reports based on requirements from business
Used PVCS for labeling and versioning the code components
Prepared Unit test plan and System Test plan and took part in the testing
Provided UAT support and Production support
Participated in weekly status meetings with the onsite team
Education
SRM University (Chennai, India)
Bachelor of Technology, Electronics and Communications Engineering