Naga Satya Alaikya Kalidindi
Email: adzuat@r.postjobfree.com
Phone: 518-***-****
Big Data Engineer/Hadoop Developer
Summary of Experience:
Over 5 years of diversified experience in software design and development, including experience as a Hadoop developer solving business use cases for several clients, with expertise in backend applications.
Experienced in using Agile methodologies, including Extreme Programming, Scrum, and Test-Driven Development (TDD).
Experienced working with various Hadoop distributions (Cloudera, Hortonworks, MapR, Amazon EMR) to fully implement and leverage new Hadoop features.
Experience developing Kafka producers and consumers that stream millions of events per second.
Strong experience using HDFS, MapReduce, Hive, Spark, Sqoop, Oozie, and HBase.
Established and executed a Data Quality Governance Framework, including an end-to-end process and data quality framework for assessing whether data is suitable for its intended purpose.
Experience setting up the AWS data platform: AWS CloudFormation, development endpoints, AWS Glue, EMR, Jupyter/SageMaker notebooks, Redshift, S3, and EC2 instances.
Expertise in working with Hive data warehouse infrastructure: creating tables, distributing data through partitioning and bucketing, and developing and tuning HQL queries (illustrated in the sketch following this list).
Replaced existing MapReduce jobs and Hive scripts with Spark SQL and Spark data transformations for more efficient data processing.
Experience in developing Spark applications using Spark RDD, Spark SQL, and DataFrame APIs.
Worked with real-time data processing and streaming techniques using Spark Streaming and Kafka.
Experience in moving data into and out of HDFS and relational database systems (RDBMS) using Apache Sqoop.
Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance.
Database design, modeling, migration, and development experience using stored procedures, triggers, cursors, constraints, and functions with MySQL, MS SQL Server, DB2, and Oracle.
Strong understanding of Java Virtual Machines and multi-threading process.
Experience working with NoSQL database technologies, including MongoDB, Cassandra and HBase.
Experience in developing custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HQL (HiveQL), and used UDFs from the Piggybank UDF repository.
Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Azure Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
Experience with Snowflake multi-cluster warehouses.
Expert in generating on-demand and scheduled reports for business analysis and management decisions using SQL Server Reporting Services (SSRS) and Tableau, with periodic reporting on a daily, weekly, monthly, and quarterly basis.
Good understanding of data modeling (dimensional and relational) concepts such as star-schema modeling, snowflake-schema modeling, and fact and dimension tables.
Experience in manipulating and analyzing large datasets and finding patterns and insights within structured and unstructured data.
Strong experience in core Java, Scala, SQL, PL/SQL, and RESTful web services.
Strong experience with ETL and/or orchestration tools (e.g. Talend, Oozie, Airflow)
Used Informatica PowerCenter for extraction, transformation, and loading (ETL) of data from heterogeneous source systems into target databases.
Experience with software development tools such as JIRA, Play, and Git.
Strong written and oral communication skills for giving presentations to non-technical stakeholders.
Experience in writing complex SQL queries, creating reports and dashboards.
Proficient in using Unix-based command-line interfaces.
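As a minimal sketch of the Hive partitioning and bucketing approach referenced above (all database, table, and column names are hypothetical, and a Spark session with Hive support is assumed):

```python
# Minimal sketch: a partitioned, bucketed Hive table managed through
# Spark SQL. Database/table/column names are hypothetical examples.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Partitioning by event date lets queries that filter on dt skip whole
# partitions; bucketing by customer_id helps joins on that key.
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.events (
        customer_id BIGINT,
        event_type  STRING,
        payload     STRING
    )
    PARTITIONED BY (dt STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS PARQUET
""")

# A tuned HQL query: the dt predicate prunes to a single partition.
spark.sql("""
    SELECT event_type, COUNT(*) AS cnt
    FROM analytics.events
    WHERE dt = '2021-01-15'
    GROUP BY event_type
""").show()
```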
IT Skills:
Big Data Tools: Kafka, Cassandra, Apache Spark, Spark Streaming, HBase, Impala, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper
Hadoop Distribution: Cloudera CDH, Apache, AWS, Hortonworks HDP
Programming Languages: SQL, PL/SQL, Python, UNIX, PySpark, Pig, HiveQL, Scala, Shell Scripting
Spark Components: RDD, Spark SQL, Spark Streaming
Data Modeling Tools: Erwin Data Modeler, ER Studio v17
Methodologies: RAD, JAD, System Development Life Cycle (SDLC), Agile
Cloud Management: MS Azure, Amazon Web Services (AWS) - EC2, EMR, S3, Redshift, Lambda, Athena
Databases: Oracle 12c/11g/10g, MySQL, MS SQL, DB2, Snowflake
NoSQL Databases: MongoDB, HBase, Cassandra
OLAP Tools: Tableau, SSAS, Business Objects, and Crystal Reports 9
ETL/Data warehouse Tools: Informatica and Tableau
Version Control: CVS, SVN, Clear Case, Git
Operating System: Windows, Unix, Sun Solaris
PROFESSIONAL EXPERIENCE:
Client: Vistex Inc., Hoffman Estates, IL Apr 2020 to date
Role: Bigdata Engineer/Hadoop Developer
Responsibilities:
Developed a NiFi workflow to pick up data from an SFTP server and send it to a Kafka broker.
Developed Oozie workflows to run multiple Hive, Pig, Tealeaf, MongoDB, Git, Sqoop, and Spark jobs.
Good experience using relational databases: Oracle, MySQL, SQL Server, and PostgreSQL.
Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and Scala for data cleaning and preprocessing.
Installed Kafka producers on different servers and scheduled them to produce data every 10 seconds (see the producer sketch at the end of this section).
Developed a Spark job in Java which indexes data into Elasticsearch from external Hive tables in HDFS.
Expert in Tableau reports and dashboards, publishing to end users for executive-level business decisions.
Responsible for developing a data pipeline with AWS to extract data from weblogs and store it in HDFS; worked extensively with Sqoop for importing metadata from Oracle.
Good exposure to MapReduce programming using Java, Pig Latin scripting, distributed applications, and HDFS.
Responsible for implementation and ongoing administration of Hadoop infrastructure.
Designed MapReduce and YARN flows, wrote MapReduce scripts, and performed performance tuning and debugging.
Worked with Spark to improve performance and optimize existing algorithms in Hadoop using Spark Context, Spark SQL, PySpark, Impala, Tealeaf, pair RDDs, NiFi, DevOps, and Spark on YARN.
Worked on setting up and configuring AWS EMR clusters and used Amazon IAM to grant fine-grained access to AWS resources to users.
Good understanding of NoSQL databases and hands-on experience writing applications on HBase, Cassandra, and MongoDB.
Used the AWS CLI to suspend an AWS Lambda function and to automate backups of ephemeral data stores to S3 buckets and EBS.
Designed, developed and tested Tableau visualizations for dashboard and ad-hoc reporting solutions by connecting from different data sources and databases.
Used the Spark DataFrame API in Scala for analyzing data.
Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.
Worked with various HDFS file formats like Parquet and JSON for serializing and deserializing.
Implemented a cluster for the NoSQL tool HBase as part of a POC to address HBase limitations.
Used IAM to detect and stop risky identity behaviors using rules, machine learning, and other statistical algorithms.
Developed Java MapReduce programs for the analysis of sample log files stored in the cluster.
Implemented data quality in the ETL tool Talend; good knowledge of data warehousing.
Installed applications on AWS EC2 instances and configured storage on S3 buckets.
Evaluated client needs and translated business requirements into functional specifications, onboarding clients onto the Hadoop ecosystem.
Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
Used Kafka and Kafka brokers to initiate the Spark context and process live streaming data.
Developed custom Kafka producers and consumers for publishing to and subscribing to different Kafka topics.
Implemented many Kafka ingestion jobs to consume data for real-time and batch processing.
Migrated MapReduce jobs to Spark jobs to achieve better performance.
Strong knowledge of the architecture and components of Tealeaf; efficient in working with Spark Core and Spark SQL. Designed and developed RDD seeds using Scala and Cascading; streamed data to Spark Streaming using Kafka.
Responsible for creating SQL data sets for Tableau recurring and ad-hoc reports.
Responsible for managing data coming from different sources through Kafka.
Responsible for developing a data pipeline using Spark, Scala, and Apache Kafka to ingest data from the CSL source and store it in a protected HDFS folder (see the streaming sketch at the end of this section).
Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket.
Stored data in AWS S3, used like HDFS, and ran EMR programs on the stored data.
Exposure to Spark, Spark Streaming, Spark MLlib, Snowflake, and Scala, and created DataFrames handled in Spark with Scala.
Very good implementation experience with object-oriented concepts, multithreading, and Java/Scala.
Experienced with Scala and Spark, improving performance and optimizing existing algorithms in Hadoop using Spark Context, Spark SQL, pair RDDs, and Spark on YARN.
Created functions and assigned roles in AWS Lambda to run Python scripts, and AWS Lambda functions in Java to perform event-driven processing; created Lambda jobs and configured roles using the AWS CLI.
Wrote MapReduce programs and Hive UDFs in Java.
Extracted and updated the data into HDFS using Sqoop import and export.
Developed Hive UDFs to incorporate external business logic into Hive scripts and developed join data set scripts.
Developed end-to-end data processing pipelines that receive data through the distributed messaging system Kafka and persist it into Cassandra.
Monitored Hadoop cluster connectivity, security, and file system management.
Worked on AWS Lambda functions in Python that invoke scripts to perform various transformations and analytics on large data sets in EMR clusters.
Developed Apache Spark applications for data processing from various streaming sources.
Environment: Hadoop (HDFS, MapReduce), Kafka, Scala, AWS Services (Lambda, EMR, Auto Scaling), YARN, IAM, PostgreSQL, Spark, Impala, MongoDB, Java, Tableau, Pig, DevOps, HBase, Oozie, Hue, Sqoop, Flume, Oracle, NiFi, Git.
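As an illustration of the scheduled Kafka producer described above, a minimal sketch using the kafka-python client (broker address, topic name, and payload fields are hypothetical):

```python
# Minimal sketch: a Kafka producer that emits one JSON record every
# 10 seconds. Broker, topic, and payload fields are hypothetical.
import json
import time

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers=["broker1:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    event = {"ts": time.time(), "source": "sftp-feed", "status": "ok"}
    producer.send("ingest-events", value=event)  # send() is asynchronous
    producer.flush()  # force delivery before sleeping
    time.sleep(10)
```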
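And a minimal sketch of the Kafka-to-HDFS ingestion pipeline, here expressed with Spark Structured Streaming (broker, topic, and HDFS paths are hypothetical; assumes the spark-sql-kafka package is on the classpath):

```python
# Minimal sketch: consume a Kafka topic and append it to a protected
# HDFS folder as Parquet. Broker, topic, and paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hdfs-sketch").getOrCreate()

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "ingest-events")
       .load())

# Kafka delivers key/value as bytes; cast the value to a string column.
events = raw.selectExpr("CAST(value AS STRING) AS json", "timestamp")

# The checkpoint directory makes the query restartable after failures.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///protected/ingest/events")
         .option("checkpointLocation", "hdfs:///protected/ingest/_chk")
         .outputMode("append")
         .start())

query.awaitTermination()
```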
Client: Anthem, Atlanta, GA Mar 2020 to Oct 2020
Role: Data Engineer/Hadoop Developer/ETL Developer
Responsibilities:
Collected data from Flume agents deployed on various servers using multi-hop flows.
Ingested real-time and near-real-time (NRT) streaming data into HDFS using Flume.
Experienced in handling administration activities using Cloudera Manager.
Created an architectural solution that leverages the best Azure analytics tools to address the specific needs of the Chevron use case.
Created and maintained optimal data pipeline architecture in the Microsoft Azure cloud using Data Factory and Azure Databricks.
Built the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and big data technologies such as Hadoop Hive and Azure Data Lake Storage.
Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
Managed and reviewed Hadoop log files.
Wrote PySpark and Spark SQL transformations in Azure Databricks to perform complex transformations for business rule implementation.
Implemented Spark Kafka streaming to pick up data from Kafka and send it to the Spark pipeline.
Worked with different join patterns and implemented both map-side and reduce-side joins.
Wrote Flume configuration files for importing streaming log data into HBase with Flume.
Imported several transactional logs from web servers with Flume to ingest the data into HDFS.
Installed and configured Pig and wrote Pig Latin scripts to convert data from text files to Avro format.
Created partitioned Hive tables and worked on them using HiveQL.
Involved in data ingestion into HDFS using Sqoop for full loads and Flume for incremental loads from a variety of sources, including web servers, RDBMS, and data APIs.
Involved in developing Impala scripts for extraction, transformation, and loading of data into the data warehouse.
Ingested data into HBase using the HBase shell as well as the HBase client API.
Wrote UDFs in Python for data cleansing (see the UDF sketch at the end of this section).
Created ad-hoc reports for users in Tableau Desktop by connecting various data sources, multiple views, and associated reports.
Documented project designs and test plans for various projects landing on the Hadoop platform.
Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
Implemented workflows using the Apache Oozie framework to automate tasks.
Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generating visualizations using Tableau.
Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data.
Troubleshot day-to-day issues on multiple Hadoop clusters.
Designed and implemented incremental imports into Hive tables and wrote Hive queries to run on Tez.
Created ETL Mapping with Talend Integration Suite to pull data from Source, apply transformations, and load data into target database.
Involved in transforming data from mainframe tables to HDFS and HBase tables using Sqoop.
Developed NiFi workflows to pick up data from a REST API server, the data lake, and an SFTP server and send it to a Kafka broker.
Used Kafka functionality such as distribution, partitioning, and the replicated commit log service for messaging by maintaining feeds, and created applications that monitor consumer lag within Apache Kafka clusters.
Created various reports using Tableau based on the client’s needs.
Built pipelines to move hashed and un-hashed data from XML files to the data lake.
Developed Spark scripts using Python on Azure HDInsight for data aggregation and validation, and verified their performance against MR jobs.
Loaded data into HBase using both bulk and non-bulk loads.
Extensively worked with the Spark SQL context to create DataFrames and datasets to pre-process the model data.
Environment: Hadoop, Cloudera, Flume, HBase, HDFS, Tableau, MapReduce, Kafka, YARN, Hive, Pig, Sqoop, Oozie, Java, Azure, Data Factory, Databricks, HDInsight, PL/SQL, MySQL, Oracle, Tez
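As a minimal sketch of the Python data-cleansing UDF work mentioned above (column names and the cleansing rule are hypothetical):

```python
# Minimal sketch: a Python UDF that normalizes phone numbers during
# cleansing. Column names and the validation rule are hypothetical.
import re

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("cleansing-udf-sketch").getOrCreate()

def clean_phone(raw):
    """Keep digits only; return None for unusable values."""
    if raw is None:
        return None
    digits = re.sub(r"\D", "", raw)
    return digits if len(digits) == 10 else None

clean_phone_udf = udf(clean_phone, StringType())

df = spark.createDataFrame([("(518) 555-0100",), ("bad-value",)], ["phone"])
df.withColumn("phone_clean", clean_phone_udf("phone")).show()
```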
Client: Prithvi Information Solution Ltd, Hyderabad, India Mar 2017 to Nov 2019
Role: Hadoop Developer
Responsibilities:
Imported and exported data between Oracle Database and HDFS using Sqoop.
Involved in review of functional and non-functional requirements.
Wrote MapReduce jobs using Pig Latin; involved in ETL, data integration, and migration.
Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs (see the mapper/reducer sketch at the end of this section).
Installed and configured Pig and wrote Pig Latin scripts.
Created SSIS packages to pull data from SQL Server and export it to Excel spreadsheets and vice versa.
Loaded data from various sources, such as OLE DB and flat files, into the SQL Server database using SSIS packages, and created data mappings to load data from source to destination.
Set up and benchmarked Hadoop/HBase clusters for internal use.
Created Hive tables and worked on them using HiveQL; experienced in defining job flows.
Created batch jobs and configuration files to build automated processes using SSIS.
Designed and implemented data transfer from and to Hadoop and AWS.
Designed and implemented a MapReduce-based, large-scale, parallel relation-learning system.
Extensive use of expressions, variables, and row counts in SSIS packages.
Deployed and scheduled reports using SSRS to generate daily, weekly, monthly, and quarterly reports.
Involved in creating Hive tables, loading data, and writing Hive queries that run internally as MapReduce jobs. Developed a custom file system plugin for Hadoop so it can access files on the Data Platform.
The custom file system plugin allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly.
Environment: Hadoop, MapReduce, AWS, Amazon S3, Pig, SQL Server, Hive, HBase, SSIS, SSRS, Report Builder, MS Office, Excel, Flat Files, T-SQL.
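As a minimal sketch of the kind of MapReduce job described above, expressed as a Hadoop Streaming mapper and reducer in Python (the input layout and the counted field are hypothetical):

```python
#!/usr/bin/env python
# mapper.py -- minimal sketch: emit "<key>\t1" per input record.
# Assumes tab-separated input with the grouping key in column 3.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) > 2:
        print("%s\t1" % fields[2])
```

```python
#!/usr/bin/env python
# reducer.py -- sums the 1s for each key, which arrive already sorted.
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key == current_key:
        count += int(value)
    else:
        if current_key is not None:
            print("%s\t%d" % (current_key, count))
        current_key, count = key, 1

if current_key is not None:
    print("%s\t%d" % (current_key, count))
```

A job like this would be submitted with the standard hadoop-streaming JAR, passing mapper.py and reducer.py via the -files, -mapper, and -reducer options (input and output paths here are hypothetical).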
Education Details:
Bachelor of Technology, JNTU Kakinada, 2009.
Master of Technology, JNTU Kakinada, 2012.