Senior Data Engineer

Location:
Wichita, KS
Posted:
January 02, 2024


Resume:

Akhil Kumar B

Senior Data Engineer

Email: ad2dwt@r.postjobfree.com

Contact: +1-646-***-****

Over 9 years of experience as a Data Engineer specializing in designing, analyzing, and developing software applications. Proficient in Big Data and Hadoop ecosystem components, Spark technologies, and managing large data pipelines on platforms including Spark, AWS, Azure, Node.js, Talend, Hadoop, HDFS, Hive, Impala, Oozie, Sqoop, HBase, Scala, Python, Kafka, Kudu, and NoSQL databases such as Cassandra and MongoDB. Expertise in constructing highly reliable and scalable Big Data solutions across on-premises and cloud-based Hadoop distributions like Cloudera and AWS EMR.

Data Engineer / Data Analyst, AWS, Azure, Hadoop, Big Data Analytics, Python, ETL, Snowflake, Python Developer, ETL Developer, Cloud Data Engineer, Informatica

PROFESSIONAL SUMMARY

Accomplished Data Engineer with 9+ years of hands-on experience in designing, developing, and managing robust data solutions.

Expert in big data technologies, including Hadoop, Spark, Hive, HBase, and Impala, with a strong track record of optimizing data processing pipelines.

Proficiency in AWS Secrets Manager for managing and securing cloud credentials.

Familiarity with Azure Key Vault.

Extensive proficiency in cloud platforms, having worked on Google Cloud Platform (GCP), Microsoft Azure, and Amazon Web Services (AWS) to deliver scalable and reliable data solutions.

In-depth knowledge of JavaScript and TypeScript.

Adept at architecting and implementing data lakes and data warehouses on cloud platforms, ensuring seamless data integration and analysis.

Developed complex DAX expressions for in-depth analysis.

Strong background in REST and GraphQL API design, OpenAPI Specification, and HTTP Protocol.

Highly skilled in data transformation and ETL processes using tools like Apache Nifi, Apache Camel, and Talend.

Proficient in Python, Scala, and Java for coding complex data processing applications and building efficient data pipelines.

Solid understanding of SQL and NoSQL databases, with hands-on experience in data modeling, schema design, and performance optimization.

Expertise in real-time data processing and streaming technologies, including Apache Kafka, Spark Streaming, and Flink.

Proven ability to develop and maintain data pipelines using containerization and orchestration technologies such as Docker and Kubernetes.

Good understanding of Amazon Web Services (AWS), including EC2 for compute, S3 for storage, and EMR, Step Functions, Lambda, Elastic Beanstalk, Batch, ECS with Fargate, ECR, Redshift, DynamoDB, Neptune, CloudTrail, CloudWatch, and the AWS CLI.

Strong knowledge of data governance, data security, and compliance, ensuring data quality and adherence to industry standards.

Hands-on experience with AWS services such as EC2, RDS, Glue, Athena, Redshift, Lambda, Step Functions, Kinesis, Lake Formation, EMR, SageMaker, and DynamoDB.

Proficient in advanced SQL, Python, PySpark, SparkSQL, Apache Hudi, AWS Glue, and AWS Data Lake Structures.

Proficiency in SQL databases such as Aurora and SQL Server, as well as data lakes in S3 and enterprise data in Redshift/Snowflake.

Skilled in automating data pipeline deployment and monitoring using CI/CD tools like Jenkins and GitLab.

Proficient in version control systems (Git) and collaborative development methodologies, working seamlessly in cross-functional teams.

Experienced in troubleshooting and optimizing data pipeline performance, ensuring low-latency data availability.

Demonstrated ability to lead data migration projects, seamlessly transitioning data from on-premises systems to cloud environments.

Skilled in data profiling, cleansing, and quality checks to maintain data integrity and security.

Proficiency in workflow management tools like Apache Airflow for orchestrating data tasks and workflows.

Strong knowledge of distributed data storage systems, including Amazon S3, Google BigQuery, Google Cloud Storage, Azure Data Lake Storage, and Kusto (Azure Data Explorer).

In-depth understanding of data warehousing concepts and best practices for efficient data storage and retrieval.

Expert in developing and maintaining real-time dashboards and data visualizations using tools like Tableau, Power BI, and Looker.

Knowledge of machine learning and data science concepts, facilitating collaboration with data science teams to derive insights from data.

Adept at performance tuning, capacity planning, and system monitoring to ensure high availability and scalability.

Strong communication and leadership skills, capable of mentoring junior team members and collaborating effectively with stakeholders.

Committed to staying current with industry trends and continuously improving skills to drive innovation in data engineering.

Adherence to best practices in agile development methodologies, ensuring the delivery of high-quality data solutions.

Proven experience in designing and maintaining data architecture blueprints for long-term scalability and growth.

Expertise in building fault-tolerant and highly available data systems for mission-critical applications.

Strong problem-solving skills, with a track record of resolving complex technical challenges in data engineering projects.

Collaborative and adaptable team player with a passion for leveraging data to drive business decisions.

Demonstrated experience in managing cross-functional projects and driving them to successful completion.

Deep domain knowledge in various industries, allowing for tailored data solutions to meet specific business needs.

Skilled in designing and implementing data archival and data lifecycle management strategies, optimizing storage costs and access times.

Proficiency in data cataloging and metadata management, ensuring data lineage and discoverability.

Extensive experience in data replication and synchronization, enabling real-time data consistency across multiple data stores.

Expertise in data partitioning and sharding strategies to handle large datasets efficiently.

Strong background in DevOps practices, automating deployment, monitoring, and scaling of data solutions.

Proficient in managing data security and access controls, including encryption, authentication, and authorization mechanisms.

Hands-on experience with data governance tools and frameworks to enforce data policies and compliance requirements.

Capable of conducting code reviews and ensuring code quality through best practices and coding standards.

Proven record of successful data migration projects, ensuring minimal downtime and data consistency.

Expert in implementing data transformation algorithms and data quality checks to enhance data reliability.

Skilled in performance optimization of data processing tasks, achieving low-latency data availability.

Demonstrated ability to work with data scientists to deploy machine learning models into production data pipelines.

Strong knowledge of containerization technologies like Docker and orchestration platforms like Kubernetes for efficient data pipeline deployment.

In-depth understanding of data streaming and messaging systems, enabling real-time data processing and analysis.

Proficient in serverless computing and event-driven architecture, enhancing data processing efficiency.

Adept at designing and implementing data recovery and backup strategies to ensure data resilience.

Expertise in troubleshooting and resolving data pipeline issues promptly, minimizing disruptions.

Proficiency in designing and implementing data access APIs and ensuring data availability to downstream applications.

Strong collaboration skills to work with cross-functional teams, including data analysts, data scientists, and business stakeholders.

Committed to professional development, staying updated with emerging data technologies and industry trends.

Track record of delivering data solutions on time and within budget, meeting business objectives.

Capable of developing disaster recovery and business continuity plans for data systems.

Proficient in designing and optimizing data storage architectures for cost-efficiency and performance.

Experienced in data profiling and data quality analysis, ensuring data accuracy and reliability.

Strong knowledge of best practices in data modeling for both relational and NoSQL databases.

Adept at integrating data from various sources, including structured, semi-structured, and unstructured data.

Expertise in handling data versioning and change management for evolving data requirements.

Proficiency in automating data pipeline monitoring, alerting, and logging for proactive issue detection.

Skilled in performance testing and benchmarking of data solutions to ensure scalability and efficiency.

Strong verbal and written communication skills for effective collaboration with stakeholders at all levels.

CORE SKILLS

Hadoop Components / Big Data

HDFS, Hue, MapReduce, Pig, Hive, HCatalog, HBase, Sqoop, Impala, Zookeeper, Flume, Kafka, YARN, Cloudera Manager, Kerberos, PySpark, Airflow, SageMaker, Snowflake

Languages

Scala, Python, SQL, HiveQL, KSQL, Node.js, TypeScript, .NET, JavaScript, Java (Spring Boot), C#

IDE Tools

Eclipse, IntelliJ, PyCharm

Cloud platform

AWS, Azure (IaaS, PaaS, SaaS), GCP

Reporting and ETL Tools

Tableau, Power BI, Sisense, Looker, Talend, AWS Glue, Dremio, Presto, Trino

Databases

Oracle, SQL Server, Aurora, MySQL, T-SQL, PostgreSQL, MS Access, NoSQL databases (HBase, Cassandra, MongoDB, Couchbase)

Big Data Technologies

Hadoop, HDFS, Hive, Pig, Oozie, Sqoop, Spark, Machine Learning, Pandas, NumPy, Seaborn, Impala, Zookeeper, Flume, Airflow, Informatica, Snowflake, DataBricks, Kafka, Cloudera

Containerization

Docker, Kubernetes, Object Storage (Minio, Ceph, Apache Ozone)

CI/CD Tools

Jenkins, Bamboo, GitLab, Sonar Cloud, Postman, Lucid Charts, Jira/Confluence

Operating Systems

UNIX, Linux, Ubuntu, CentOS

Software Methodologies

Agile, Scrum, Waterfall

Education

Bachelor of Technology in Computer Science – JNTU Kakinada, India

CAREER SUMMARY

Client: PNC Bank, Pittsburgh, USA Jun 2022 – Present

Role: Senior Data Engineer / Hadoop & Spark Developer

Project Description:

Data Integration and Analytics project to acquire data from multiple source systems such as file feeds, Oracle DB, and SAP ERP. Data pipelines extract, transform, and load the data into Hadoop, which in turn serves as a source for multiple downstream systems and processes. Data wrangling, massaging, and enrichment are performed in the data lake staging area.

Responsibilities:

Developed and added features to existing data analytics applications built with Spark and Hadoop on a Scala, Java, and Python development platform on top of AWS services.

Programmed in Python and Scala with the Hadoop framework, utilizing Cloudera Hadoop ecosystem projects (HDFS, Spark, Sqoop, Hive, HBase, Oozie, Impala, Zookeeper, etc.).

Involved in developing Spark applications in Scala and Python for data transformation, cleansing, and validation using the Spark API.

Utilized Palantir Foundry for efficient data integration and mining, resolving issues in large datasets.

Developed Python scripting to support the management of Cloud credentials using AWS Secrets Manager.
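
A minimal sketch of the kind of credential-retrieval helper this bullet describes, assuming boto3 is available; the secret name and region are placeholders, not the project's actual values.

```python
import json

import boto3
from botocore.exceptions import ClientError


def get_secret(secret_name: str, region_name: str = "us-east-1") -> dict:
    """Fetch a secret from AWS Secrets Manager and return it as a dict."""
    client = boto3.client("secretsmanager", region_name=region_name)
    try:
        response = client.get_secret_value(SecretId=secret_name)
    except ClientError as err:
        # Surface lookup/permission problems to the caller instead of failing silently.
        raise RuntimeError(f"Could not read secret {secret_name}") from err
    return json.loads(response["SecretString"])


# Hypothetical usage:
# db_creds = get_secret("prod/oracle/etl-user")
```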

Maintained a strong focus on secure and compliant cloud credential management.

Collaborated with stakeholders to understand Control-M job dependencies and schedules.

Maintained and improved Airflow Directed Acyclic Graphs (DAGs) for seamless data movement within the analytics pipeline.

Utilized key skills including MongoDB, Kubernetes (EKS), TDD, ATDD, Java/Python Spark, and AWS services (S3, Lambda, Step Functions, AMQ, SNS, SQS, CloudWatch Events).

Familiarity with Dremio, Presto, and Trino for efficient query execution and data retrieval.

Migrated existing Airflow pipelines to SageMaker pipelines using a proprietary ETL library.

Configured and set up Astronomer Airflow environments for optimal performance.

Designed and implemented Airflow DAGs for efficient workflow scheduling.
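
A short Airflow 2.x sketch of the DAG pattern described in the bullets above; the DAG id, schedule, and task commands are illustrative placeholders rather than the actual production workflows.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {"owner": "data-eng", "retries": 1, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="daily_ingest_to_lake",      # hypothetical DAG name
    schedule_interval="0 2 * * *",      # run daily at 02:00
    start_date=datetime(2023, 1, 1),
    catchup=False,
    default_args=default_args,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="python extract.py")
    transform = BashOperator(task_id="transform", bash_command="spark-submit transform.py")
    load = BashOperator(task_id="load", bash_command="python load_to_s3.py")

    # Linear dependency chain: extract, then transform, then load.
    extract >> transform >> load
```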

Analyzed and translated logic from existing Control-M jobs into Airflow workflows.

Worked with the Spark APIs, including RDD, DataFrame, and Dataset, to transform data.

Worked on both batch and streaming data sources, using Spark Streaming and Kafka for stream processing.

Containerized applications using ECS with Fargate, ensuring streamlined deployment and management.

Developed a Spark Streaming script that consumes Kafka topics and periodically pushes batches of data to Spark for real-time processing.
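
A minimal PySpark sketch of a Kafka-to-lake streaming consumer of the kind described above, written with Structured Streaming and assuming the spark-sql-kafka connector is on the classpath; broker addresses, topic names, and paths are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-stream-demo").getOrCreate()

# Read a Kafka topic as a streaming DataFrame (broker and topic names are placeholders).
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "weblogs")
    .option("startingOffsets", "latest")
    .load()
    .selectExpr("CAST(value AS STRING) AS raw_event")
)

# Write micro-batches to the data lake; checkpointing makes the stream restartable.
query = (
    events.writeStream.format("parquet")
    .option("path", "s3a://example-bucket/streaming/weblogs/")
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/weblogs/")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```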

Built data pipelines for reporting, alerting, and data mining. Experienced with table design and data management using HDFS, Hive, Impala, Sqoop, MySQL, and Kafka.

Worked on Apache Nifi to automate the data movement between RDBMS and HDFS.

Created shell scripts to handle various jobs such as MapReduce, Hive, Pig, and Spark.

Used Hive techniques such as bucketing and partitioning to create tables.
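
A small sketch of a partitioned and bucketed Hive table definition issued through PySpark, illustrating the technique named above; the database, table, and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-ddl-demo").enableHiveSupport().getOrCreate()

# Hypothetical transactions table, partitioned by load date and bucketed by customer id
# so that joins and point lookups on customer_id touch fewer files.
spark.sql("""
    CREATE TABLE IF NOT EXISTS finance.transactions (
        txn_id      STRING,
        customer_id STRING,
        amount      DECIMAL(18, 2)
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")
```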

Experience with Spark SQL for processing large amounts of structured data.

Experienced working with source formats including CSV, JSON, Avro, and Parquet.

Worked on AWS to aggregate clean files in Amazon S3, and on Amazon EC2 clusters to deploy files into S3 buckets.

Involved in Data Modeling using Star Schema, Snowflake Schema.

Performed advanced data transformation activities using Informatica, PL/SQL, Oracle Database tuning, SQL tuning, and Informatica Server administration.

Used AWS EMR to create Hadoop and Spark clusters, which are used for submitting and executing Scala and Python applications in production.

Responsible for developing data pipelines on AWS to extract data from weblogs and store it in HDFS.

Designed, implemented, and maintained batch and streaming data pipelines between SQL sources (Aurora, SQL Server) and a data lake in S3.

Utilized AWS cloud services, including Kinesis, S3, Lake Formation, Glue, and Step Functions, to create scalable and reliable data pipelines.

Implemented both batch and real-time streaming data integration solutions using AWS Kinesis and relevant technologies.

Migrated the data from AWS S3 to HDFS using Kafka.

Designed and architected solutions to load multipart files which can't rely on a scheduled run and must be event driven, leveraging AWS SNS, SQS, Fargate.
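
A simplified sketch of the event-driven pattern described above: an SQS consumer receiving S3 notifications fanned out through SNS, which would then trigger the load (for example by starting an ECS/Fargate task). The queue URL and message shapes are assumptions for illustration only.

```python
import json

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/multipart-file-events"  # placeholder


def poll_for_file_events() -> None:
    """Consume S3 event notifications delivered via SNS -> SQS and kick off the load."""
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            body = json.loads(msg["Body"])
            s3_event = json.loads(body["Message"])  # SNS wraps the original S3 event
            for record in s3_event.get("Records", []):
                bucket = record["s3"]["bucket"]["name"]
                key = record["s3"]["object"]["key"]
                # In the real pipeline this is where an ECS/Fargate load task would be started.
                print(f"Would trigger load for s3://{bucket}/{key}")
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```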

Integrated Kubernetes with network, storage, and security components to provide comprehensive infrastructure, orchestrating Kubernetes containers across multiple hosts.

Implemented Jenkins and built pipelines to drive all microservice builds out to the Docker registry and deploy to Kubernetes.

Experienced in loading and transforming large sets of structured and semi-structured data using the ingestion tool Talend.

Worked with NoSQL databases like HBase, Cassandra to retrieve and load the data for real time processing using Rest API.

Worked on creating data models for Cassandra from the existing Oracle data model.

Responsible for transforming and loading the large sets of structured, semi structured and unstructured data.

Environment: Hadoop 2.7.7, HDFS 2.7.7, Apache Hive 2.3, Apache Kafka 0.8.2.x, Apache Spark 2.3, Spark-SQL, Spark Streaming, Zookeeper, Pig, Oozie, Java 8, Python 3, S3, EMR, EC2, Redshift, Cassandra, NiFi, Talend, HBase, Cloudera (CDH 5.x).

Client: Logging-in, Novi, Michigan Sep 2021 – Jun 2022

Role: Azure Data Engineer

Project Description:

The project involves planning, design, creation, and development of workflows to initiate, manage, and remediate production incidents; deploying the automated workflows; testing and validating applications and workflows; ensuring and maintaining compliance when making changes and implementing code; and updating documentation to cover department-level processes. It also includes developing a dataset process for data mining and data modeling, and recommending ways to improve data quality, efficiency, and reliability.

Responsibilities:

Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, Analysis Services, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
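
A small PySpark sketch of the ingest-and-curate step described above as it might run in Databricks; the ADLS Gen2 account, container names, and column names are hypothetical, and authentication is assumed to be configured via cluster settings or a secret scope.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession.builder.appName("adls-ingest-demo").getOrCreate()

# Hypothetical ADLS Gen2 path for raw files landed by Azure Data Factory.
raw_path = "abfss://raw@examplestorageacct.dfs.core.windows.net/sales/2023/"

sales = (
    spark.read.option("header", "true").csv(raw_path)
    .withColumn("order_date", to_date(col("order_date"), "yyyy-MM-dd"))
    .filter(col("amount").isNotNull())
)

# Land the curated output back in the lake as Parquet, partitioned by date.
sales.write.mode("overwrite").partitionBy("order_date").parquet(
    "abfss://curated@examplestorageacct.dfs.core.windows.net/sales/"
)
```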

Responsible for writing Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL). Involved in developing Hive DDLs to create, drop, and alter tables.

Extracted data and loaded it into HDFS using Sqoop imports from various sources such as Oracle, Teradata, and SQL Server.

Adapted to changes and evolving cloud platforms, including Azure Key Vault.

Designed, implemented, and expanded data pipelines, performing ETL activities to extract, transform, and load data.

Created Hive staging tables and external tables and also joined the tables as required.

Implemented dynamic partitioning, static partitioning, and bucketing in Hive.

Installed and configured Hadoop MapReduce, Hive, HDFS, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.

Worked on Microsoft Azure services such as HDInsight clusters, Blob Storage, ADLS, Data Factory, and Logic Apps, and completed a POC on Azure Databricks.

Monitored database performance and coordinated changes to the database management system.

Implemented Sqoop jobs for data ingestion from the Oracle to Hive.

Applied data warehousing concepts, including OLTP, OLAP, Dimensions, Facts, and Data modeling.

Worked with various file formats such as delimited text files, clickstream log files, Apache log files, Avro files, JSON files, and XML files. Skilled in using columnar file formats such as RC, ORC, and Parquet.

Developed custom Unix/Bash shell scripts for pre- and post-validation of the master and slave nodes, before and after configuration of the NameNode and DataNodes.

Developed job workflows in Oozie to automate the tasks of loading data into HDFS.

Implemented compact and efficient storage of big data using file formats such as Avro, Parquet, and JSON, with compression methods like GZip and Snappy.
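
A brief PySpark sketch of writing out compressed columnar and row-oriented copies of a dataset, as described above; the HDFS paths are placeholders, and the Avro write assumes the spark-avro package is available.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compression-demo").getOrCreate()

# Hypothetical raw JSON input; the point is the format/codec combination, not the dataset.
df = spark.read.json("hdfs:///data/raw/events/")

# Columnar Parquet with Snappy: a good balance of file size and scan speed.
df.write.mode("overwrite").option("compression", "snappy").parquet(
    "hdfs:///data/curated/events_parquet/"
)

# Row-oriented Avro with deflate (gzip-style) compression; requires the spark-avro package.
df.write.mode("overwrite").format("avro").option("compression", "deflate").save(
    "hdfs:///data/curated/events_avro/"
)
```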

Explored Spark to improve performance and optimize existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.

Worked on Spark using Python as well as Scala and Spark SQL for faster testing and processing of data.

Worked on various data modelling concepts like star schema, snowflake schema in the project.

Extensively used Stash, Bitbucket, and GitHub for source control.

Environment: Hadoop 2.7, HDFS, Microsoft Azure services (HDInsight, Blob Storage, ADLS, Kusto/Azure Data Explorer, Logic Apps), Hive 2.2, Sqoop 1.4.6, Snowflake, Apache Spark 2.3, Spark-SQL, ETL, Maven, Oozie, Java 8, Python 3, Unix shell scripting.

Client: SAMHSA, Rockville, MD Dec 2019 - Aug 2021

Role: Data Engineer / BI Developer (Tableau) / Alteryx Developer

Project Description:

Developed the BI analytics strategy for the Tableau suite of products for SAMHSA (Substance Abuse and Mental Health Services Administration).

The project involved developing dashboards in Tableau environments for the Metrics and Compliance department.

Key Contributions:

Involved in data analysis, data profiling, data integration, migration, data governance, metadata management, master data management, and configuration management in support of the Tableau dashboards built for the Metrics and Compliance department.

Responsibilities:

Worked with various versions of Tableau Desktop, Tableau Server, Tableau Public, Tableau Reader, and Tableau Online.

Extensive experience building dashboards in Tableau.

Applied strong knowledge of Tableau server architecture to implement business rules, data validations, and access controls.

Utilized Tableau Prep, extracts created with the Hyper API, and storyboards for comprehensive data analysis and reporting.

Utilized advanced DAX expressions for complex calculations to provide in-depth analysis.

Introduced Tableau to clients to produce different views of data visualizations, presenting dashboards on web and desktop platforms to end users and helping them make effective business decisions.

Experience creating users, groups, projects, workbooks, and the appropriate permission sets for Tableau Server logins and security checks.

Created incremental refreshes for data sources on Tableau server.

Involved in requirements gathering/analysis, design, development, testing, and production rollout of reporting and analysis projects.

Able to stretch the capabilities of Tableau to customize and develop visually stunning and dynamic visualizations.

Collaborated with business and technology stakeholders to gather requirements and ensure effective communication.

Contributed to technical discussions as a consultant for feasibility studies, including technical alternatives, best practices, and risk assessments.

Created Tableau Dashboards with interactive views, trends and drill downs along with user level security.

Provided production support to Tableau users and wrote custom SQL to support business requirements.

Deep, technical knowledge of both Tableau Desktop and Tableau Server from an administrative and a consultative standpoint.

Strong Oracle SQL skills with good knowledge of RDBMS concepts.

Worked directly with Quality Assurance associates to ensure all deliverables were accurate and of the highest quality.

Designed and developed ETL workflows and datasets in Alteryx.

Conducted data preparation, analysis, and reporting, ensuring data accuracy and quality.

Troubleshot technical issues and performed root-cause analysis for ETL jobs, collaborating with cross-functional teams.

Processed data in Alteryx for Tableau reporting.

Experience in creating different visualizations using Bars, Lines and Pies, Maps, Scatter plots, Gantts, Bubbles, Histograms, Bullets, Heat maps and Highlight tables.

Involved in troubleshooting and performance tuning of reports and resolving issues within Tableau Server and reports.

Defined, advocated, and implemented best practices and coding standards for the team.

Designed and developed ETL solutions using SSIS and Azure Data Factory, and created CAS jobs using Microsoft internal tools such as Geneva Analytics Data Studio.

Environment: Hadoop, Tableau, Amazon Redshift, DAX, DataStage 7.5, Python, Alteryx, Teradata, Oracle 12c, SQL Server 2008, TOAD, PL/SQL, JavaScript.

Client: Cybage Software Private Limited, India Jan 2016 – Aug 2019

Role: Hadoop Developer

Responsibilities:

Used PySpark for extracting, cleaning, transforming, and loading data into a Hive data warehouse.

Worked with Spark, creating RDDs to optimize data processing from local files, HDFS, and RDBMS sources.

Developed Spark jobs with window functions for data analysis and transformation to support downstream reporting and analytics.
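
A compact PySpark sketch of the window-function pattern mentioned above: ranking rows per customer and computing a running total; the Hive tables and column names are hypothetical.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import col, row_number, sum as sum_

spark = SparkSession.builder.appName("window-demo").enableHiveSupport().getOrCreate()

# Hypothetical orders table in the Hive warehouse.
orders = spark.table("sales.orders")

w_latest = Window.partitionBy("customer_id").orderBy(col("order_ts").desc())
w_running = Window.partitionBy("customer_id").orderBy("order_ts")

report = (
    orders
    .withColumn("rn", row_number().over(w_latest))          # 1 = most recent order per customer
    .withColumn("running_spend", sum_("amount").over(w_running))
)

# Keep only each customer's latest order alongside their cumulative spend.
report.filter(col("rn") == 1).write.mode("overwrite").saveAsTable("sales.customer_latest_orders")
```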

Assisted in database management, maintenance, and enhancement, including designing data dictionaries, physical and logical database models, and performance tuning.

Utilized SQL and DDL to define database objects and control access, ensuring data integrity.

Experience in Hive, including partitions, bucketing, and managed/external tables, for efficient data storage and retrieval.

Managed structured data from Oracle, MySQL, and Teradata within HDFS and designed Hive tables for efficient querying.

Implemented Sqoop jobs for data movement between RDBMS and HDFS.

Utilized Oozie workflow engine for scheduling and automation of time-based data processing tasks.

Optimized Hive queries using file formats such as Parquet, JSON, and Avro for improved performance.

Imported/exported data between MySQL, CSV, and text files and HDFS on a regular basis.

Created detailed API specifications using OAS to define endpoints, request/response structures, and authentication mechanisms.

Developed SQOOP scripts for data migration from Oracle to Big Data environment.

Designed and developed enhancements using AWS APIs for CSG.

Migrated Hive and MapReduce jobs to EMR, automating workflows with Airflow.

Implemented Data Lake using Hadoop stack technologies like SQOOP and HIVE/HQL to consolidate data from multiple source databases.

Managed AWS Database services, including RDS and DynamoDB, to ensure data availability and performance.

Implemented centralized security management for Hadoop cluster, ensuring data security and access control.

Utilized data modeling tools (ERwin, PowerDesigner, Visio) to create conceptual, logical, and physical data models.

Enhanced system automation by working with scheduling tools like Control-M, improving job scheduling and execution.

Assisted in Unix system administration, optimizing system performance and troubleshooting issues.

Strong business analysis skills, well-versed in SDLC phases, including requirements, analysis/design, development, and testing.

Proficient in Data Lake, Business Intelligence, and Data Warehousing concepts, focusing on ETL and SDLC.

Experienced with Oracle Business Intelligence Enterprise Edition (OBIEE) for analyzing and reporting enterprise data.

Connected to various data sources (databases, spreadsheets, online services) for data extraction and transformation in Power BI visualization.

Environment: Oracle, Hadoop, Hive, Spark, Sqoop, Python, Pyspark, MySQL, Teradata, Oozie.

Client: Cygnus Software Pvt. Ltd., India Nov 2013 – Dec 2015

Role: Data Modeler / Engineer

Cygnus Software Pvt. Ltd. provides a comprehensive healthcare management system based on its BlueGenet platform. It offers an integrated solution for building a network of hospitals, general and specialty clinics, laboratories, blood banks, pharmacies, and other service providers, which can be linked even when geographically dispersed across the globe and operating in multiple languages. Patients get access to their information via web, phone, and email, along with facilities such as near-home delivery through coordinated service providers, online registration, and appointment scheduling.

Responsibilities:

When working with the open-source Apache distribution, Hadoop admins have to manually set up all the configuration files: core-site, hdfs-site, yarn-site, and mapred-site. However, when working with a popular Hadoop distribution like Hortonworks, Cloudera, or MapR, the configuration files are set up on startup and the Hadoop admin need not configure them manually.

Used Sqoop to import data from Relational Databases like MySQL, Oracle.

Involved in importing structured and unstructured data into HDFS.

Responsible for fetching real-time data using Kafka and processing using Spark and Scala.

Worked on Kafka to import real-time weblogs and ingested the data to Spark Streaming.

Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.

Worked on building and implementing real-time streaming ETL pipelines using the Kafka Streams API.

Implemented physical database parameters and executed data modeling and prototyping.

Worked on Hive to implement Web Interfacing and stored the data in Hive tables.

Contributed to the development of ETLs and executed security recovery procedures.

Migrated MapReduce programs into Spark transformations using Spark and Scala.

Experienced with Spark Context, Spark-SQL, Spark YARN.

Implemented Hive Partitioning and Bucketing on the collected data in HDFS.

Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters.

Extensively used Zookeeper as a backup server and job scheduler for Spark jobs.

Developed Spark scripts using Scala shell commands as per the business requirement.

Worked on Cloudera distribution and deployed on AWS EC2 Instances.

Experienced in loading the real-time data to a NoSQL database like Cassandra.

Experience in retrieving the data present in Cassandra cluster by running queries in CQL (Cassandra Query Language).
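
A minimal sketch of running CQL against a Cassandra cluster from Python, assuming the DataStax cassandra-driver package; contact points, keyspace, table, and keys are placeholders.

```python
from cassandra.cluster import Cluster

# Hypothetical contact points and keyspace.
cluster = Cluster(["10.0.0.11", "10.0.0.12"])
session = cluster.connect("patient_portal")

# Prepared statements avoid re-parsing the CQL and guard against injection.
lookup = session.prepare(
    "SELECT event_id, event_ts, payload FROM events WHERE patient_id = ?"
)

for row in session.execute(lookup, ["p-12345"]):
    print(row.event_id, row.event_ts)

cluster.shutdown()
```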

Developed and maintained RESTful APIs using Flask and FastAPI, enabling efficient communication between front-end and back-end systems.
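
A minimal FastAPI sketch of the kind of REST endpoint described above; the resource name, fields, and in-memory store are hypothetical stand-ins for the real service and database.

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="appointments-api")  # hypothetical service name

# In-memory stand-in for a real backing store.
_appointments: dict[int, dict] = {}


class Appointment(BaseModel):
    patient_id: str
    slot: str  # e.g. "2024-01-15T09:30"


@app.post("/appointments/{appointment_id}")
def create_appointment(appointment_id: int, appt: Appointment) -> dict:
    _appointments[appointment_id] = appt.dict()
    return {"id": appointment_id, **appt.dict()}


@app.get("/appointments/{appointment_id}")
def get_appointment(appointment_id: int) -> dict:
    if appointment_id not in _appointments:
        raise HTTPException(status_code=404, detail="appointment not found")
    return _appointments[appointment_id]

# Run locally with: uvicorn app:app --reload   (assuming this file is app.py)
```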

Utilized Django to design and build robust and scalable web applications, optimizing user experiences.

Worked on connecting the Cassandra database to the Amazon EMR File System (EMRFS) for storing data in S3.

Implemented usage of Amazon EMR for processing Big Data across a Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).

Deployed the project on Amazon EMR with S3 connectivity for setting a backup storage.

Well versed in using Elastic Load Balancing with Auto Scaling for EC2 servers.

Coordinated with the SCRUM team in delivering agreed user stories on time for every sprint.

Environment: Hadoop, MapReduce, Hive, Spark, Oracle, Flask, Django, GitHub, Tableau, UNIX, Cloudera, Kafka, Sqoop, Scala, NiFi, HBase, Amazon EC2, S3, Python.


