+ PROFESSIONAL PROFILE
·Big Data, Cloud, Hadoop experience covers 9+ years.
·Overall IT/Systems Admin experience spans 10+ years.
·Work with large, complex data sets, real-time/near real-time analytics, and distributed Big Data platforms.
·Knowledgeable in Hadoop architecture and Hadoop components (HDFS, MapReduce, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, NodeManager).
·Implemented, set up, and worked on various Hadoop distributions (Cloudera, Hortonworks, AWS).
·Work with various file formats (delimited text files, click stream log files, Apache log files, Avro files, JSON files, XML Files).
·Install and configure Hive, Pig, Sqoop, Flume and Oozie on Hadoop clusters.
·Provide end-to-end data analytics solutions and support using Hadoop systems and tools on cloud services as well as on premise nodes.
·Profile and tune SQL execution plans.
·Hands-on expertise in Hadoop components - HDFS, MapReduce, Hive, Impala, Pig, Flume, Sqoop and HBase.
·Experienced in Big Data ecosystem using Hadoop, Spark, Kafka with column-oriented Big Data systems on AWS cloud platform.
·Knowledgeable in deploying application JAR files to AWS instances.
·Apply real-time log data collection from multiple sources, including social media (Facebook, Twitter, Google, LinkedIn), webserver logs, and databases using Flume.
·Deploy large multi-node Hadoop and Spark clusters.
·Develop custom large-scale enterprise applications using Spark for data processing.
·Schedule workflows and ETL processes with Apache Oozie.
·Proven effective at problem-solving with the demonstrated ability to combine logic with creativity to derive innovative solutions that match customers’ needs.
·Experience developing and fine-tuning SQL, Python, Scala, Pig, Hive, RDDs, DataFrames, etc.
·Proficient in major vendor Hadoop distributions such as Cloudera, Hortonworks, and MapR.
·In-depth knowledge of incremental imports, partitioning, and bucketing concepts in Hive and Spark SQL needed for optimization.
·Gather and aggregate various sources and integrate into HDFS.
·Hands-on with Apache Flume; staging data in HDFS for further analysis.
·Well-versed in installing, configuring, administering, and tuning Hadoop clusters on major Hadoop distributions (Cloudera CDH 3/4/5, Hortonworks HDP 2.3/2.4, and Amazon Web Services (AWS)).
·Research and recommend machine learning algorithms and data sets for advancing state-of-the-art techniques for a variety of analytics, including entity resolution, entity reconciliation, named entity recognition, co-reference, anaphora, etc.
·Design Big Data solutions for traditional enterprise businesses.
·Define job flows in Hadoop environment using tools such as Oozie for data scrubbing and processing.
·Configure Zookeeper to provide Cluster coordination services.
·Load logs from multiple sources directly into HDFS using tools such as Flume.
·Commission and decommission nodes on Hadoop Cluster.
·Configure NameNode High Availability and perform Hadoop Cluster Disaster Management.
·Skilled at converting business requirements into concrete deliverables that optimize business operations.
·Develop add-ins using AngularJS, REST/JSON, and jQuery to create customized web forms.
·Skilled in SQL Server development (query optimization, stored procedures, etc.) and administration (failover using high availability, mirroring techniques, jobs, database backups, etc.).
·Provide guidelines and help in setting up development, staging, and production environments for SharePoint.
+ TECHNICAL SKILLS
BIG DATA
·Apache Spark, Apache Kafka, Apache Hadoop, HDFS, Apache Hive, Oozie, Cloudera, Hortonworks, Zookeeper, Solr
AMAZON WEB SERVICES
·Kinesis, EMR, EC2, Redshift, IAM, S3
CLOUD
·AWS, Azure, Google Cloud Platform (GCP)
DATABASES
·HBase, MongoDB, DynamoDB, MySQL
PROGRAMMING LANGUAGES
·Scala, Python, SQL
DATA VISUALIZATION
·Tableau, Microsoft Power BI
FILES
·HDFS, Avro, Parquet, Snappy, Gzip, SQL, JSON, GSON, ORC
PLATFORMS
·Linux, Microsoft, UNIX
OTHER SKILLS
·SDLC, Agile, Waterfall, Test-Driven Development, Continuous Integration
+ PROFESSIONAL EXPERIENCE
01/2020 – Present / Cloud Engineer / Old National Bank, Evansville, IN
Old National Bank is an American regional bank with nearly 200 retail branches operated by Old National Bancorp.
·Developed Kafka queue system to collect log data without data loss and publish to various sources.
·Built AWS CloudFormation templates used for Terraform with existing plugins.
·Developed AWS CloudFormation templates to create custom infrastructure for our pipeline.
·Developed multiple Spark Streaming and batch Spark jobs using Scala and Python on AWS EMR.
·Designed, developed, and tested Spark SQL jobs with Scala and PySpark.
·Created user-defined functions (UDFs) using Python in Spark.
·Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2), with Amazon Simple Storage Service (S3) and AWS Redshift.
·Implemented AWS Lambda functions to run scripts in response to events in Amazon DynamoDB tables or S3.
·Populated database tables via AWS Kinesis Firehose and AWS Redshift.
·Created and managed cloud VMs with AWS EC2 and AWS administration tools.
·Set up workflows in Apache Airflow to run ETL pipelines using tools in AWS.
·Used Spark DataFrame API over Cloudera platform to perform analytics on Hive data.
·Added support for AWS S3 and RDS to host static/media files and the database in AWS.
·Used Ansible with Python scripts to generate inventory and push deployments to AWS instances.
·Executed Hadoop/Spark jobs on AWS EMR using programs and data stored in S3 buckets.
·Automated the installation of the ELK agent (Filebeat) with an Ansible playbook.
·Implemented AWS IAM user roles and policies to authenticate and control access.
·Specified nodes and performed data analysis queries on Amazon Redshift clusters on AWS.
·Worked on AWS Kinesis for processing huge amounts of real-time data.
·Worked with AWS Lambda functions for event-driven processing across various AWS resources.
·Processed multiple terabytes of data stored in AWS using Elastic MapReduce (EMR) and loaded the results into AWS Redshift.
·Used AWS EMR to process big data across Hadoop clusters of virtual servers on Amazon Simple Storage Service (S3).
·Implemented security measures AWS provides, employing key concepts of AWS Identity and Access Management (IAM).
·Ingested data through AWS Kinesis Data Stream and Firehose from various sources to S3.
06/2017 – 01/2020 / Big Data Engineer / Dollar Tree, Chesapeake, VA
Dollar Tree is a multi-price-point chain of discount variety stores operating 15,115 stores throughout the 48 contiguous U.S. states and Canada.
·Collaborated with Corporate IT function around integrating Hadoop ecosystems with critical enterprise systems.
·Developed and managed cluster-related testing activities.
·Set up and stored data in HBase for analysis.
·Set up, installed, and monitored 3-node enterprise Hadoop cluster on Ubuntu Linux.
·Created roadmaps for ongoing cluster deployment and growth.
·Created MapReduce jobs in Java for log analysis and data cleaning.
·Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
·Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
·Loaded unstructured data (log files, XML data) into HDFS using Flume.
·Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
·Implemented Hadoop data pipeline to identify customer behavioral patterns.
·Used Sqoop to import data from RDBMS into HDFS.
·Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive amounts of structured and unstructured data.
·Used Spark SQL to load JSON data, create SchemaRDDs, load them into Hive tables, and handle structured data.
·Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
·Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
·Utilized Amazon Web Services (AWS) cloud services EC2, S3, EBS, RDS, and VPC.
·Analyzed and interpreted transactional behaviors and clickstream data with Hadoop and HDP to predict what customers might buy in the future.
·Utilized Microsoft Power BI for reporting.
·Established data consumption from information stored in Google Cloud, consisting of network clicks and network impressions.
·Processed Big Data using Hadoop, MapReduce, Sqoop, Oozie, and Impala.
·Imported data from MySQL into HDFS using Sqoop.
·Analyzed data on Hadoop clusters using Big Data analytic tools such as Hive and MapReduce.
·Conducted in-depth research on Hive to analyze partitioned and bucketed data.
10/2014 – 06/2017 / Big Data Engineer / Anheuser-Busch, St. Louis, MO
Anheuser-Busch Companies, LLC is an American brewing company.
·Created and deployed data pipelines using Apache Kafka and Spark Streaming in Scala and Python.
·Deployed and analyzed large volumes of data using Spark in Scala.
·Performed ETL on the data stored in HDFS using Spark and loaded the data into Hive.
·Utilized MapReduce, HDFS, Hive, Pig, and MongoDB.
·Used Hive where convenient for some ETL tasks, then queried the data in Impala. Additionally, provided a SQL frontend for data managed by HBase.
·Automated, configured, and deployed instances on AWS, Azure environments and Data Centers.
·Worked with data files produced by MapReduce and other Hadoop components, using data formats ranging from simple (text) to compact and efficient (Avro, RCFile, SequenceFile) to those optimized for data warehouse queries (Parquet).
·Created workflows and scheduled different big data applications using Oozie.
·Worked in Cloudera Distribution and managed different clusters and services using Cloudera Manager.
·Used Sqoop to import data from different RDBMSs.
·Maintained quality of the data so that it can be used by the Data Scientists in the team.
·Used Spark Structured Streaming to process and ingest data into NoSQL databases.
·Used Spark SQL to process and analyze the data.
·Worked in an agile environment managed by JIRA.
·Created different dataflows using NiFi and created test cases to test the NiFi dataflows.
·Used Hue in Cloudera.
·Used Solr to create search engines for the data.
·Created Python scripts to download data from the APIs and perform pre-cleaning steps.
·Installed and configured Tableau Desktop to connect, through the Hortonworks JDBC connector, to the Hortonworks Hive database containing the bandwidth data for further analytics.
·Wrote simple SQL scripts on the final database to prepare data for visualization with Tableau.
06/2013 – 10/2014 / Hadoop Administrator / Gensler, San Francisco, CA
Gensler is a global design and architecture firm.
·Worked on Hortonworks Hadoop distributions.
·Worked with users to ensure efficient resource usage in the Hortonworks Hadoop clusters and alleviate multi-tenancy concerns.
·Set up Kerberos for more advanced security features for users and groups.
·Set up Hortonworks infrastructure, from configuring clusters to node security using Kerberos.
·Configured, installed, and managed Hortonworks (HDP) Distributions.
·Developed Oozie workflows for scheduling and orchestrating the ETL process.
·Used Spark to build and process real-time data streams from a Kafka producer.
·Used Spark DataFrame API over Cloudera platform to perform analytics on data.
·Implemented enterprise security measures on big data products including HDFS encryption/Apache Ranger. Managed and scheduled batch jobs on a Hadoop Cluster using Oozie.
·Managed Hadoop clusters via the command line and the Hortonworks Ambari agent.
·Worked on Kafka cluster environment and Zookeeper.
·Monitored multiple Hadoop cluster environments using Ambari.
·Implemented security on HDP Hadoop clusters with Kerberos for authentication, Ranger for authorization, and LDAP integration for Ambari and Ranger.
·Secured the Kafka cluster with Kerberos.
·Worked on tickets related to various Hadoop/Big Data services, including HDFS, YARN, Hive, Sqoop, Spark, Kafka, HBase, Kerberos, Ranger, and Knox.
·Performed cluster maintenance and upgrades to ensure stable performance.
·Defined data security standards and procedures in Hadoop using Apache Ranger and Kerberos.
04/2012 – 06/2013 / System Administrator / Synnex Corporation, Fremont, CA
Synnex Corporation is an American multinational corporation that provides B2B IT services.
·Performed various system administration tasks on Linux-based systems.
·Configured DNS and DHCP on clients’ networks.
·Provided technical support via telephone/email to over 3,000 users.
·Created database tables with various constraints for clients accessing FTP.
·Served as Red Hat Enterprise Linux Administrator.
·Built, installed, and configured servers from scratch running Red Hat Linux.
·Performed Red Hat Linux Kickstart installations on Red Hat 4.x/5.x, kernel tuning, and memory upgrades.
·Installed, configured, and performed troubleshooting of Solaris, Linux RHEL, HP-UX, AIX operating systems.
·Applied OS patches and upgrades on a regular basis, upgraded administrative tools and utilities, and configured or added new services.
·Installed and configured Apache, Tomcat, WebLogic, and WebSphere applications.
·Performed remote system administration using tools like SSH, Telnet, and rlogin.
+ EDUCATION
Universidad Nacional Autonoma de Mexico - Bachelor of Mathematics