Big Data Analysis
Location: Charlotte, NC
Posted: January 30, 2024


Sukumar Balla

Hadoop, AZURE & Spark Developer

ad2784@r.postjobfree.com

949-***-****

Professional Summary:

•Over 11 years of professional IT experience, including 8 years with the Big Data Hadoop ecosystem in ingestion, storage, querying, processing, and analysis of big data.

•Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.

•Experience with leveraging Hadoop ecosystem components including Pig and Hive for data analysis, Sqoop for data migration, Oozie for scheduling and HBase as a NoSQL data store.

•Experience in installing, configuring, supporting, and managing Hadoop components using Hortonworks.

•Experience in developing a data pipeline through Kafka-Spark API.

•Hands-on experience in AWS EC2, S3, Redshift, EMR, RDS, Glue, and DynamoDB.

•Experience in Hadoop shell commands, writing MapReduce programs, and verifying, managing, and reviewing Hadoop log files.

•Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.

•Experience in big data analysis using Scala, Python, Pig, and Hive, with a working understanding of Sqoop.

•Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory.

•Optimized existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.

•Used Spark for interactive queries, streaming data processing, and integration with popular NoSQL databases for large volumes of data.

•Tuned Spark job performance by adjusting configuration properties and using broadcast variables (see the sketch at the end of this summary).

•Performed transformations and actions on RDDs and Spark Streaming data.

•Extensive knowledge of tuning SQL queries and improving database performance.

•Experience in managing Hadoop clusters using Cloudera Manager Tool.

•Manage all CM tools (JIRA, Confluence, Maven, Jenkins, Git, GitHub, Visual Studio) and their usage / process ensuring traceability, repeatability, quality, and support.

•Hands on experience in application development using Java, RDBMS, and Linux shell scripting.

•Ability to adapt to evolving technology, with a strong sense of responsibility and accomplishment.
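
The following is a minimal, illustrative PySpark sketch of the broadcast-variable join tuning mentioned above; the table paths, column names, and shuffle-partition setting are assumptions for illustration, not values from any specific engagement.

    # Minimal sketch: tuning a join by broadcasting the small side (assumed paths/columns).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = (SparkSession.builder
             .appName("broadcast-join-sketch")
             .config("spark.sql.shuffle.partitions", "200")   # explicit level of parallelism
             .getOrCreate())

    orders = spark.read.parquet("/data/orders")         # large fact table (hypothetical path)
    countries = spark.read.parquet("/data/countries")   # small dimension table (hypothetical path)

    # Broadcasting the small table avoids shuffling the large one across the cluster.
    enriched = orders.join(broadcast(countries), on="country_code", how="left")
    enriched.write.mode("overwrite").parquet("/data/orders_enriched")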

EDUCATION

Master's in Computer Science, California University of Management and Sciences, Anaheim, California – 2015

Bachelor of Engineering in Computer Science, Jawaharlal Nehru Technological University, Hyderabad, India – 2008

TECHNICAL SKILLS

Hadoop/Big Data

HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Oozie, Spark, Spark SQL, Spark Streaming

Languages

Java, SQL, XML, C++, C, WSDL, XHTML, HTML, CSS, JavaScript, AJAX, PL/SQL

Java Technologies

Java, J2EE, Hibernate, JDBC, Servlets, JSP, JSTL, JavaBeans, jQuery, and EJB

ETL/ELT Tools

Informatica, Pentaho

Design and Modeling

UML and Rational Rose.

Web Services

SOAP, WSDL, UDDI, SDLC

Scripting languages

JavaScript, Shell Script

Version Control and integration

CVS, ClearCase, SVN, Git, Jenkins

Databases

Oracle 10g/9i/8i, SQL Server, DB2, MS-Access

Environments

UNIX, Red Hat Linux, Windows 2000/Server 2008/2007, Windows XP

PROFESSIONAL EXPERIENCE

Role: Sr Big Data Engineer Duration: Oct 2022-Present

Company: MetLife, Cary, NC

•Implemented advanced procedures such as text analytics and processing using in-memory computing with Apache Spark, written in Scala.

•Experience in using Apache Sqoop to import and export data to and from HDFS and external RDBMS databases.

•Used the Spark application master to monitor Spark jobs and capture their logs.

•Implemented Spark using PySpark and Spark SQL for faster testing and processing of data on Databricks.

•Implemented Spark using Scala and Spark SQL for faster testing and processing of data.

•Hands-on experience with major Hadoop ecosystem components, including MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Oozie, and Flume.

•Improved the performance of existing Hadoop algorithms with Spark, using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.

•Migrated required data from Oracle and MySQL into HDFS using Sqoop, and imported flat files of various formats into HDFS.

•Proposed an automated system using shell scripts to run the Sqoop jobs.

•Developed a data-processing pipeline using the Kafka-Spark API (see the sketch after this role's environment line).

•Developed a full-load and incremental-load strategy using Sqoop.

•Implemented a POC to migrate MapReduce jobs to Spark RDD transformations.

•Good exposure to the Agile software development process.

•Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.

Environment: Sqoop, Spark Core, Spark SQL, MySQL, ADF, Git, Agile, Apache Hadoop, HDFS, Pig, Hive, Hortonworks, Oracle, Tableau, PySpark, Spark, IBM Studio.
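
A hedged sketch of the Kafka-Spark pipeline described above, using PySpark Structured Streaming; the broker address, topic, schema, and output paths are hypothetical placeholders, and the job assumes the spark-sql-kafka connector is available.

    # Illustrative Kafka-to-HDFS streaming pipeline (assumed broker, topic, schema, paths).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("kafka-spark-pipeline").getOrCreate()

    schema = StructType([
        StructField("event_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    raw = (spark.readStream
           .format("kafka")                                    # needs the spark-sql-kafka package
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "events")
           .load())

    # Kafka delivers bytes; cast the value to string and parse the JSON payload.
    parsed = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
                 .select("e.*"))

    (parsed.writeStream
           .format("parquet")
           .option("path", "/data/events")
           .option("checkpointLocation", "/checkpoints/events")
           .start()
           .awaitTermination())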

Role: Azure Big Data Engineer Duration: March 2020-Sep 2022

Company: Lowes, Charlotte, NC

•Exported all analyzed data to relational databases.

•Loaded and transformed sets of unstructured, semi-structured, and structured data.

•Used the Spark application master to monitor Spark jobs and capture their logs.

•Implemented Spark using PySpark and Spark SQL for faster testing and processing of data on Databricks.

•Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and Avro data files, and sequence files for logs.

•Developed Sqoop jobs to import and store massive volumes of data in HDFS and Hive.

•Designed and developed Pig data-transformation scripts to work against unstructured data from various data points and created a baseline.

•Implemented Spark RDD transformations and actions to support the business analysis.

•Designed and developed data pipelines, data warehouses and data marts to integrate new data sets from different sources into a data platform.

•Implemented enterprise-level Azure solutions such as Azure Databricks, Data Factory, Logic Apps, Azure Storage Accounts, and Azure SQL DB.

•Optimized pipeline implementation and maintenance work using Databricks workspace configuration, cluster and notebook optimization.

•Developed a data-processing pipeline using the Kafka-Spark API, and a batch-processing framework to ingest data into HDFS, Hive, and HBase.

•Developed and maintained data pipelines using Azure Data Factory and Azure Databricks (see the sketch after this role's environment line).

•Created and managed data-processing jobs using Azure HDInsight and Azure Stream Analytics.

•Performed data modeling and schema design for efficient data storage and retrieval.

•Optimized data processing and storage for performance and cost efficiency.

•Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.

Environment: Azure, ADF, Databricks, Java, Sqoop, Spark Core, Spark SQL, MySQL, Git, Agile, Apache Hadoop, HDFS, Pig, Hive, Hortonworks, Oracle, Tableau, PySpark, Spark.
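
A minimal sketch, assuming an Azure Data Factory pipeline triggering a Databricks notebook: PySpark reads semi-structured JSON from ADLS Gen2, flattens it, and writes a partitioned table. The storage account, container, columns, and table name are hypothetical.

    # Illustrative Databricks-style ingest cell (assumed storage account, columns, table name).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_date

    spark = SparkSession.builder.appName("adf-databricks-ingest").getOrCreate()

    source = "abfss://raw@examplestorage.dfs.core.windows.net/sales/*.json"  # hypothetical ADLS path
    events = spark.read.json(source)

    daily = (events
             .withColumn("sale_date", to_date(col("sale_ts")))
             .select("store_id", "sku", "quantity", "sale_date"))

    # Partitioning by date keeps downstream reads selective.
    (daily.write
          .mode("overwrite")
          .partitionBy("sale_date")
          .saveAsTable("daily_sales"))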

Role: Big Data Engineer Duration: July 2017- Feb 2020

Company: Blue Cross Blue Shield (BCBSFL), Jacksonville, FL

•Worked with Spark and Scala mainly in framework exploration for transition from Hadoop/MapReduce to Spark.

•Utilized AWS services, including EC2 and S3, to enhance system scalability and availability. Implemented partitioning, dynamic partitions, and buckets in Hive.

•Exported result sets from Hive to MySQL using shell scripts.

•Deploying and configuring AWS services and resources, including EC2 instances, RDS databases, S3 buckets, Lambda functions, and more, based on project needs.

•Monitoring and responding to security incidents and vulnerabilities in the AWS environment.

•Completed resource provisioning using Infrastructure as Code (IaC) tools such as AWS CloudFormation and Terraform.

•Very good experience in customer specification study, requirements gathering, system architecture design, and turning requirements into the final product.

•Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them (see the sketch after this role's environment line).

•Installed HDFS on AWS EC2 instances and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.

•Experience in interacting with customers and working at client locations for real time field testing of products and services.

•Ability to work effectively with associates at all levels within the organization.

Environment: YARN, Ambari, Hive, Java, Sqoop, Spark Core, Spark SQL, MySQL, ADF, Git, Agile, Apache Hadoop, HDFS, Pig, Hortonworks, Oracle, Tableau, PySpark, SparkR, AWS, EC2, S3.
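
A minimal sketch of the S3-to-Spark flow above: load CSV data from S3, apply transformations, and persist a partitioned Hive table as described in the Hive bullet. The bucket, column names, and table name are assumptions for illustration.

    # Illustrative S3 ingest with a partitioned Hive table (assumed bucket, columns, table).
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("s3-claims-ingest")
             .enableHiveSupport()
             .getOrCreate())

    claims = spark.read.option("header", "true").csv("s3a://example-bucket/claims/")

    # DataFrame equivalent of RDD filter/map transformations.
    open_claims = (claims.filter(claims.status == "OPEN")
                         .withColumnRenamed("amt", "claim_amount"))

    (open_claims.write
                .mode("overwrite")
                .partitionBy("state")
                .saveAsTable("open_claims"))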

Role: Big Data Engineer Duration: Feb 2017- June 2017

Company: Microsoft, Seattle, WA

Responsibilities:

Developed Hive queries for the analysts.

Executed parameterized Pig, Hive, Impala, and UNIX batches in production.

Experience with cloud platforms including Microsoft Azure, Databricks, and Microsoft Enterprise Cloud.

Experience in using Apache Sqoop to import and export data to and from HDFS and external RDBMS databases.

Hands on experience in setting up workflow using Apache Oozie workflow engine for managing and scheduling Hadoop jobs.

Worked with the Spark ecosystem, using Spark SQL, Scala, and PySpark queries on different file formats such as text and CSV (see the sketch after this role's environment line).

Environment: Microsoft Azure, HDInsight, Microsoft Enterprise Cloud, YARN, Ambari, Hive, Java, Sqoop, Spark Core, Spark SQL, MySQL, ADF, Git, Agile, Apache Hadoop, HDFS, Pig, Hortonworks, Oracle, Tableau.
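
A short sketch of running Spark SQL and DataFrame queries over CSV and plain-text files, as referenced in the bullet above; file paths and column names are illustrative assumptions.

    # Illustrative Spark SQL over CSV and text files (assumed paths and columns).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("spark-sql-formats").getOrCreate()

    # CSV with a header row, queried through a temporary view.
    usage = (spark.read.option("header", "true")
                       .option("inferSchema", "true")
                       .csv("/data/usage.csv"))
    usage.createOrReplaceTempView("usage")

    spark.sql("""
        SELECT user_id, SUM(bytes_used) AS total_bytes
        FROM usage
        GROUP BY user_id
        ORDER BY total_bytes DESC
        LIMIT 10
    """).show()

    # Plain text: one record per line, filtered for errors.
    lines = spark.read.text("/data/server.log")
    print(lines.filter(lines.value.contains("ERROR")).count())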

Role: Big Data Engineer Duration: May 2015 – Jan 2017

Company: T-Mobile, Seattle, WA

Responsibilities:

•Understanding and analyzing business requirements, High Level Design and Detailed Design

•Extensive scripting in Perl and Python.

•Designed and developed parsers for different file formats (CSV, XML, binary, ASCII, text, etc.); see the sketch after this role's environment line.

•Extensive usage of Cloudera Hadoop distribution.

•Executed parameterized Pig, Hive, Impala, and UNIX batches in production.

•Big Data management in Hive and Impala (Table, Partitioning, ETL/ELT, etc.).

•Designed and developed file-based data collections in Perl.

•Extensive Usage of Hue and other Cloudera tools.

•Used JUnit for unit testing of MapReduce jobs.

•Extensive usage of the NoSQL database HBase.

•Maintained System integrity of all sub-components (primarily HDFS, MR, HBase, Cassandra and Hive).

•Designed and developed dashboards in ZoomData and wrote complex queries.

•Worked on shell programming and crontab automation.

•Monitored system health and logs and responded accordingly to any warning or failure conditions.

•Extensively worked in UNIX and Red Hat environments.

•Performed testing and bug fixing.

Environment: Apache Hadoop, HDFS, Perl, Python, Pig, Hive, Java, Sqoop, Cloudera CDH5, Oracle, MySQL, Tableau, AWS, Talend, Elasticsearch, ZoomData, Storm, data governance, Agile.
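
The parser bullet above in a minimal Python form: read a pipe-delimited feed, normalize timestamps, and emit tab-delimited rows ready for a Hive/Impala load. The field layout, timestamp format, and file names are hypothetical (the production collectors were written in Perl).

    # Illustrative file-based parser (assumed field layout, timestamp format, file names).
    import csv
    from datetime import datetime

    def parse_feed(src_path, dst_path):
        """Convert a pipe-delimited usage feed into tab-delimited rows for a Hive load."""
        with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
            reader = csv.reader(src, delimiter="|")
            writer = csv.writer(dst, delimiter="\t")
            for row in reader:
                if len(row) < 3:
                    continue  # skip malformed records
                msisdn, event_ts, usage_kb = (field.strip() for field in row[:3])
                # Normalize the timestamp to Hive's default TIMESTAMP format.
                ts = datetime.strptime(event_ts, "%Y%m%d%H%M%S").strftime("%Y-%m-%d %H:%M:%S")
                writer.writerow([msisdn, ts, usage_kb])

    if __name__ == "__main__":
        parse_feed("usage_feed.dat", "usage_feed.tsv")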

Role: Sr. Systems Engineer (ATG/Java Developer) Duration: Feb 2011 to Dec 2013

Company: Mastermind Information Systems, Hyderabad, India

Responsibilities:

•Understanding and analyzing business requirements, High Level Design and Detailed Design

•Involved in three releases: eShop 2.0.1, eShop 2.1, and eShop 2.2.

•Provided high-level systems design, including class diagrams, sequence diagrams, and activity diagrams.

•Utilized Java/J2EE design patterns (MVC) at various levels of the application, along with ATG frameworks.

•Worked extensively on DCS (ATG Commerce Suite) using the commerce API to accomplish the Store Checkout.

•Expertise in developing JSPs and Servlets, and experienced with web services (REST, SOAP).

•Served as DB Administrator, creating and maintaining all schemas

Environment: ATG, Java, JSP, Oracle 9i/10g, WebLogic 10.3.5, SOAP, RESTful, SVN, SQL Developer, UNIX, Eclipse, XML, HTML, CSS, JavaScript, AJAX, jQuery, SCA.


