
Big Data Warehouse

Location:
Decatur, IL
Posted:
July 07, 2024

Resume:

Santhosh G

Email: *****************@*****.***

Contact: +1-217-***-****

PROFESSIONAL SUMMARY:

Around 7 years of professional experience in all phases of the SDLC, including requirements analysis, application design, development, integration, maintenance, installation, implementation, and testing of various client-server and web applications on the Big Data ecosystem.

Extensively worked in PL/SQL, creating stored procedures, clusters, packages, database triggers, exception handlers, cursors, and cursor variables.

Expertise in designing data-intensive applications using the Hadoop ecosystem, Big Data analytics, cloud data engineering, data warehouse, data visualization, reporting, and data quality solutions.

Hands-on experience in installing, configuring, monitoring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Hortonworks, and Flume.

Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Zeppelin, and Airflow.

Experience in handling, configuring, and administering relational databases like MySQL and NoSQL databases like MongoDB and Cassandra.

Experience in designing, developing, and deploying projects in the GCP suite, including BigQuery, Dataflow, Dataproc, Google Cloud Storage, Composer, Looker, etc.

Involved in designing the data model in Hive for migrating the ETL process into Hadoop, and wrote Pig scripts to load data into the Hadoop environment.

Experience in setting up and building AWS infrastructure resources like VPC, EC2, S3, IAM, EBS, Lambda, Security Groups, IaaS, RDS, DynamoDB, CloudFront, Elasticsearch, SNS, and CloudFormation.

Data modelling and database design and development for OLTP and OLAP systems (Star Schema, Snowflake Schema, Data Warehouse, Data Marts, Multi-Dimensional Modelling and Cube design), Business Intelligence, and data mining.

Extensively used SQL, NumPy, Pandas, Scikit-learn, Spark, and Hive for data analysis and model building.

Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached & Redis).

Working experience in creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka, and built Spark DataFrames using Python.
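
For illustration, a minimal PySpark sketch of this kind of Kafka-to-Spark streaming pipeline; the broker address, topic, schema, and output paths are hypothetical placeholders, not project specifics.

# Minimal sketch: Kafka -> Spark Structured Streaming -> Parquet on S3.
# Broker, topic, schema, and paths below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-stream-demo").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "events")
       .load())

# Kafka values arrive as bytes; parse the JSON payload into typed columns.
events = raw.select(from_json(col("value").cast("string"), schema).alias("e")).select("e.*")

query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/events/")
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
         .outputMode("append")
         .start())
query.awaitTermination()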

Used AWS Lambda to develop APIs to manage servers and run code in AWS.

Proficient with container systems like Docker and container orchestration platforms like EC2 Container Service (ECS) and Kubernetes; worked with Terraform.

Experience with complex data processing pipelines, including ETL and data ingestion dealing with unstructured and semi-structured data.

TECHNICAL SKILLS:

Cloud Technologies

Amazon Web Services (IAM, S3, EC2, VPC, ELB, Route53, RDS, Auto Scaling, CloudFront), Jenkins, Git, Chef, Consul, Docker, Rackspace, GCP.

DevOps Tools

UrbanCode Deploy, Jenkins (CI), Puppet, Chef, Ansible, AWS.

Languages

C, SQL, Shell, and Python scripting.

Databases

MySQL, MongoDB, Cassandra, SQL Server.

Web/App Server

Apache, IIS, IHS, Tomcat, WebSphere Application Server, JBoss.

CI Tools

Hudson, Jenkins, Bamboo, Cruise Control.

DevOps or other

Jenkins, Perforce, Docker, AWS, Chef, Puppet, Ant, Atlassian Jira, Ansible, OpenStack, SaltStack, Splunk.

WORK EXPERIENCE:

Client: Ceva Logistics, Jacksonville, Florida (Remote) Jun 2023 – Present

Role: Sr Data Engineer

Responsibilities:

Involved in all phases of SDLC including Requirement Gathering, Design, Analysis and Testing of customer specifications, Development, and Deployment of the Application, and designed reliable and scalable data pipelines.

Worked with various complex queries, subqueries, and joins to check the validity of loaded and imported data.

Worked with PowerShell and Unix scripts for file transfer, emailing and other file related tasks.

Designed and implemented ETL pipelines from various relational databases to the data warehouse using Apache Airflow.
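
A minimal Apache Airflow sketch of such an extract-and-load DAG; the DAG id, schedule, and load logic are hypothetical placeholders.

# Minimal Airflow sketch of a daily extract-and-load DAG.
# DAG id, schedule, and the load function are hypothetical placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_and_load(**context):
    # Placeholder: read from the source database and write to the warehouse,
    # e.g. via Airflow hooks or a bulk COPY/INSERT step.
    pass

with DAG(
    dag_id="orders_to_warehouse",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load_orders = PythonOperator(
        task_id="extract_and_load_orders",
        python_callable=extract_and_load,
    )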

Worked on Data Extraction, aggregations, and consolidation of Adobe data within AWS Glue using PySpark.
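
A sketch of this kind of AWS Glue PySpark aggregation; the catalog database, table, columns, and output path are hypothetical placeholders.

# Sketch of an AWS Glue PySpark job that aggregates clickstream-style data.
# Database, table, column, and path names are hypothetical placeholders.
from awsglue.context import GlueContext
from pyspark.context import SparkContext
from pyspark.sql import functions as F

glue_context = GlueContext(SparkContext.getOrCreate())

# Read from the Glue Data Catalog, then switch to a Spark DataFrame for SQL-style work.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="analytics_db", table_name="adobe_clicks")
df = dyf.toDF()

daily = (df.groupBy("visit_date", "page")
           .agg(F.count("*").alias("hits"),
                F.countDistinct("visitor_id").alias("unique_visitors")))

daily.write.mode("overwrite").partitionBy("visit_date").parquet(
    "s3://example-bucket/curated/adobe_daily/")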

Pulled data from the data lake (HDFS) and massaged the data with various RDD transformations.

Worked on data transformation and retrieval from mainframes to Oracle, using SQL*Loader and control files.

Created Tableau visualizations by connecting to AWS Elastic MapReduce (EMR).

Developed custom ETL solutions, batch processing, and a real-time data ingestion pipeline to move data in and out of Hadoop using Python and shell scripts.

Developed PySpark and Spark SQL code to process the data in Apache Spark on Amazon EMR and perform the necessary transformations based on the source-to-target mappings (STMs) developed.

Developed data integration strategies for data flow between disparate source systems and Big Data enabled Enterprise Data Lake.

Built a serverless ETL process in AWS Lambda so that new files arriving in the S3 bucket are catalogued immediately.
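
A sketch of what such an S3-triggered Lambda can look like, assuming the cataloguing step is a Glue crawler; the crawler name and bucket layout are hypothetical placeholders.

# Sketch of an S3-triggered Lambda handler that catalogues newly landed files.
# The crawler name is a hypothetical placeholder.
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Each record corresponds to one object created in the watched S3 bucket.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"New object landed: s3://{bucket}/{key}")

    # Kick off a Glue crawler so the new files show up in the Data Catalog.
    glue.start_crawler(Name="landing-zone-crawler")
    return {"status": "ok"}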

Worked on AWS SQS to consume the data from S3 buckets.
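
A sketch of a worker that polls an SQS queue fed by S3 event notifications; the queue URL and processing step are hypothetical placeholders.

# Sketch of polling an SQS queue that receives S3 event notifications.
# The queue URL is a hypothetical placeholder.
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/s3-events"

def poll_once():
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10,
                               WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])
        for record in body.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            print(f"Process s3://{bucket}/{key}")
        # Delete only after the object has been handled.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])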

Worked with relational SQL and NoSQL databases, including PostgreSQL and Hadoop.

The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.

Worked and learned a great deal from Amazon Web Services (AWS) Cloud services like EC2, S3, and EMR.

Worked on data cleaning and reshaping, generated segmented subsets using NumPy and Pandas in Python.
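
A small pandas/NumPy sketch of this kind of cleaning, reshaping, and segmentation; the input file and column names are hypothetical placeholders.

# Sketch of cleaning, reshaping, and segmenting a dataset with pandas/NumPy.
# File and column names are hypothetical placeholders.
import numpy as np
import pandas as pd

df = pd.read_csv("orders.csv")

# Basic cleaning: drop duplicates, fill missing amounts, normalise text casing.
df = df.drop_duplicates(subset="order_id")
df["amount"] = df["amount"].fillna(0.0)
df["region"] = df["region"].str.strip().str.upper()

# Reshape: one row per (region, month) with aggregated metrics.
df["month"] = pd.to_datetime(df["order_date"]).dt.to_period("M")
summary = df.groupby(["region", "month"]).agg(
    total_amount=("amount", "sum"),
    orders=("order_id", "count"),
).reset_index()

# Segmented subset: high-value orders above the 90th percentile.
threshold = np.percentile(df["amount"], 90)
high_value = df[df["amount"] > threshold]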

Developed and deployed to production multiple projects in the CI/CD pipeline for real-time data distribution, storage, and analytics, with persistence to S3, HDFS, and Postgres.

Real-time data from the source was ingested as file streams into the Spark Streaming platform and saved in HDFS and Hive.

Configured CloudWatch, Lambda, SQS, and SNS to send alert notifications.
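
A sketch of one common wiring for such alerts: a Lambda handler publishing to an SNS topic; the topic ARN and event fields are hypothetical placeholders.

# Sketch of publishing an alert to SNS from a Lambda handler,
# e.g. when a CloudWatch- or SQS-triggered check reports a failure.
import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:data-pipeline-alerts"  # placeholder

def lambda_handler(event, context):
    # A real check would inspect the incoming CloudWatch/SQS event payload.
    failed_job = event.get("jobName", "unknown-job")
    sns.publish(
        TopicArn=TOPIC_ARN,
        Subject=f"Pipeline alert: {failed_job}",
        Message=f"Job {failed_job} reported a failure; see CloudWatch logs for details.",
    )
    return {"status": "alert-sent"}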

Created S3 buckets, managed policies for S3 buckets, and utilized S3 and Glacier for storage and backup on AWS.

Designed a data flow to pull data using a REST API from a third-party vendor using OAuth authentication.
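
A sketch of an OAuth 2.0 client-credentials pull from a vendor REST API using the requests library; the URLs, endpoints, and credentials are hypothetical placeholders.

# Sketch of pulling vendor data over REST with OAuth 2.0 client credentials.
# Token URL, API URL, and credentials are hypothetical placeholders.
import requests

TOKEN_URL = "https://vendor.example.com/oauth/token"
API_URL = "https://vendor.example.com/api/v1/shipments"

def fetch_shipments(client_id: str, client_secret: str):
    # Exchange client credentials for a bearer token.
    token_resp = requests.post(TOKEN_URL, data={
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }, timeout=30)
    token_resp.raise_for_status()
    access_token = token_resp.json()["access_token"]

    # Call the API with the bearer token.
    headers = {"Authorization": f"Bearer {access_token}"}
    resp = requests.get(API_URL, headers=headers, params={"page": 1}, timeout=30)
    resp.raise_for_status()
    return resp.json()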

Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.

Designed and implemented a fully operational, production-grade, large-scale data solution on Snowflake Data Warehouse.

Worked on Amazon EMR to process data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2).

Experience in server infrastructure development on Gateway, ELB, Auto Scaling, DynamoDB, Elasticsearch, and Virtual Private Cloud (VPC).

Created architecture stack blueprint for data access with NoSQL Database Cassandra.

Deployed the Big Data Hadoop application using Talend on the AWS (Amazon Web Services) cloud.

Worked on Snowflake environment to remove redundancy and load real time data from various data sources into HDFS using Kafka.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
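
Illustrated below in PySpark rather than Scala: the same Hive-style aggregation expressed once through Spark SQL and once as DataFrame transformations; table and column names are hypothetical placeholders.

# Illustration of moving a Hive-style query onto Spark (shown in PySpark).
# Table and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hive-to-spark").enableHiveSupport().getOrCreate()

# Original Hive-style query executed through Spark SQL.
sql_result = spark.sql("""
    SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM sales.orders
    GROUP BY region
""")

# Equivalent expressed as DataFrame transformations.
df_result = (spark.table("sales.orders")
             .groupBy("region")
             .agg(F.count("*").alias("orders"), F.sum("amount").alias("revenue")))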

Designed and Developed ETL jobs to extract data from Salesforce replica and load it in data mart in Redshift.

Environment: AWS, Hadoop, Hive, HBase, Spark, Oozie, Kafka, MySQL, Jenkins, API, Snowflake, PowerShell, GitHub, Oracle Database 12c/11g, DataStage, SQL Server 2017/2016/2012/2008.

Client: VMax E Solutions India Pvt Ltd, India Feb 2021 – Dec 2022

Role: Data Engineer

Responsibilities:

Involved in all phases of SDLC including Requirement Gathering, Design, Analysis and Testing of customer specifications, Development, and Deployment of the Application.

Involved in designing and deploying a large application utilizing almost the entire AWS stack (including EC2, Route53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling in AWS CloudFormation.

Worked on a migration project to move current applications from a traditional datacenter to AWS using AWS services.

Launched Amazon EC2 cloud instances using Amazon Web Services (Linux/Ubuntu/RHEL) and configured the launched instances with respect to specific applications.

Installed applications on AWS EC2 instances and configured storage on S3 buckets. Assisted the team in deploying AWS and cloud platform services.

Managed IAM policies, provided access to different AWS resources, and designed and refined the workflows used to grant access.

Implemented and maintained the monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.

Designed AWS CloudFormation templates to create custom-sized VPCs, subnets, and NAT to ensure successful deployment of web applications and database templates.

Launched Compute (EC2) and DB (Aurora, Cassandra) instances from Amazon Management Console and CLI.

Installed and configured Splunk Universal Forwarders on both UNIX (Linux, Solaris, and AIX) and Windows Servers.

Hands-on experience in customizing Splunk dashboards, visualizations, and configurations using customized Splunk queries.

Implemented Docker for wrapping up the final code and setting up development and testing environments using Docker Hub, Docker Swarm, and Docker Container Network.

Elasticsearch experience including capacity planning and cluster maintenance. Continuously looked for ways to improve and set a very high bar in terms of quality.

Implemented a real-time log analytics pipeline using Elasticsearch.

Set up and configured Elasticsearch in a POC test environment to ingest over a million records from an Oracle DB.
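
A sketch of bulk-loading rows from Oracle into Elasticsearch with the Python clients; connection details, query, and index name are hypothetical placeholders.

# Sketch of bulk-indexing Oracle rows into Elasticsearch.
# Connection strings, query, and index name are hypothetical placeholders.
import cx_Oracle
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch("http://localhost:9200")
conn = cx_Oracle.connect("etl_user/password@oracle-host:1521/ORCLPDB1")

def row_actions():
    cursor = conn.cursor()
    cursor.execute("SELECT order_id, customer_id, amount FROM orders")
    for order_id, customer_id, amount in cursor:
        yield {
            "_index": "orders",
            "_id": order_id,
            "_source": {"customer_id": customer_id, "amount": float(amount)},
        }

# helpers.bulk streams the generator in batches to the _bulk API.
success, errors = bulk(es, row_actions())
print(f"Indexed {success} documents")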

Designed the data models to be used in data-intensive AWS Lambda applications aimed at complex analysis, creating analytical reports for end-to-end traceability, lineage, and definitions of key business elements from Aurora.

Deployed applications on AWS by using Elastic Beanstalk. Implemented continuous integration and delivery (CI/CD) using Jenkins and Puppet.

Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries over more complex, longer-running ad hoc queries. This allowed for a more reliable and faster reporting interface, giving sub-second query response for basic queries.

Responsible for designing logical and physical data models for various data sources on Confidential Redshift.

Wrote scripts and an indexing strategy for a migration to Confidential Redshift from SQL Server and MySQL databases.

Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
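
The S3-to-Redshift load behind such a pipeline is typically a Redshift COPY command; below is a sketch issuing one directly via psycopg2 rather than through the Data Pipeline service itself. Cluster endpoint, credentials, bucket, and IAM role are hypothetical placeholders.

# Sketch of an S3-to-Redshift COPY executed directly with psycopg2.
# Endpoint, credentials, bucket, and IAM role are hypothetical placeholders.
import os
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="etl_user",
    password=os.environ["REDSHIFT_PASSWORD"],
)

copy_sql = """
    COPY staging.orders
    FROM 's3://example-bucket/exports/orders/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS PARQUET;
"""

# psycopg2 commits on successful exit from the connection context manager.
with conn, conn.cursor() as cur:
    cur.execute(copy_sql)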

Responsible for moving data between different AWS compute and storage services by using AWS Data Pipeline.

Designed and developed ETL processes in AWS Glue to move campaign information from outside sources like S3 (ORC/Parquet/Text files) into AWS Redshift.

Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.

Experience in setting up databases in AWS using RDS, storage using S3 buckets, and configuring instance backups to an S3 bucket.

Configured and deployed OpenStack Enterprise master hosts and OpenStack node hosts.

Experienced in deployment of applications on the Apache web server, Nginx, and application servers like Tomcat and JBoss.

Extensively used Splunk Search Processing Language (SPL) queries, Reports, Alerts and Dashboards.

Installed and implemented the Splunk App for Enterprise Security, documented best practices for the installation, and performed knowledge transfer on the process.

Used Splunk DB Connect for real-time data integration between Splunk Enterprise and databases.

Virtualized the servers using Docker for test and dev environment needs, and performed configuration automation using Docker containers.

Environment: Amazon Web Services, IAM, S3, RDS, EC2, VPC, CloudWatch, Bitbucket, Chef, Puppet, Ansible, Docker, Apache HTTPD, Apache Tomcat, JBoss, JUnit, Cucumber, Python.

Client: 3G HR Services, Hyderabad, India June 2017 – Feb 2021

Role: Java Developer

Responsibilities:

The Live Accounts project is a web application that has complete business finance management tools like invoicing, inventory, support for all types of taxes, detailed financial reports, and business information and analysis features.

Used Agile Methodology (Continuous Integration, Pair Programming, TDD) to develop applications and participated in regular stand-up meetings to provide various updates.

Developed static and dynamic single-page application UIs using JSP and Angular.js.

Developed cross-browser/platform HTML5, CSS, and JavaScript to match design specs for complex page layouts while adhering to code standards.

Developed business logic at the server side using Spring MVC, Spring Boot, and core Java concepts like exception handling, multithreading, and collections.

Designed the base architectural components as per the MVC architecture using Spring (MVC, DAO, Boot) and Hibernate frameworks.

Implemented logic to integrate with NOSQL database (Couchbase) using Hibernate.

Worked on DAO and MVC design patterns in MVC Architecture using Spring Framework.

Used RESTful web service to modify resources on the server without performing any server-side operations and tested with different REST UI tools like Postman, Swagger.

Expertise in understanding and analyzing test requirements, tracking changes, and maintaining test requirements. Strong experience in JUnit & JMockit.

Monitored error logs using Log4J in every phase and fixed the issues.

Used Git as the version control tool and Maven to compile and build the client applications and manage transitive dependencies.


