PROFESSIONAL SUMMARY:
Motivated Data Engineer with around 8 years of experience designing, building, and deploying high-performance data solutions across Hadoop technologies, data warehouses, data lakes, databases, and data visualization, along with AWS and Azure cloud technologies.
Extensive knowledge and experience in AWS data management and analytics services, including Amazon Aurora, DynamoDB, S3, EC2, VPC, Boto3, Lambda, Redshift, Athena, EMR, Glue Crawler, CloudWatch, and SNS.
Expertise in Azure data management and analytics services such as Azure SQL Database, Cosmos DB, Database Migration Service, Data Factory, Blob Storage, Databricks, Synapse Analytics, and Log Analytics.
Strong proficiency in Azure DevOps tools such as the Azure DevOps Tool Kit and Azure Monitor.
Skilled in using data visualization tools, including Tableau, Power BI, and QuickSight, to create interactive and informative dashboards, reports, and visualizations.
Experienced in using and managing relational database management systems (RDBMS), including Oracle SQL, MySQL, PostgreSQL, and Amazon Aurora.
Proficient in NoSQL databases, including MongoDB, Cassandra, HBase, and Amazon DynamoDB, to handle large, complex, and dynamic data sets.
Experienced in big data technologies and frameworks, including Hadoop, HDFS, Hive, Zookeeper, HBase, Oozie, Pig, Kafka, and Spark.
Expertise in data ingestion and integration tools such as Sqoop, Flume, Alteryx, and Alteryx Server, along with resource management using YARN.
Proficient in using Scala and PySpark for big data processing and analytics, leveraging the power of Apache Spark to handle large volumes of data and deliver insights.
Expertise in Python programming and data analysis packages, including NumPy, Pandas, and Matplotlib.
Experienced in using Jenkins and Azure DevOps for continuous integration and continuous delivery (CI/CD) of software projects, automating the build, test, and deployment process.
Skilled in deploying and managing cloud-based infrastructure, using Kubernetes for container orchestration and Apache Airflow to orchestrate and manage workflows.
Experienced in designing and implementing data warehousing and analytics solutions on Snowflake to store and manage data in a scalable and performant manner.
Proficient in using Git, a distributed version control system, to manage software development projects and track changes to source code.
Experienced in using Agile methodologies and Jira to manage workflows and prioritize tasks.
TECHNICAL SKILLS:
AWS Services: Amazon Aurora, Amazon DynamoDB, S3, EC2, Lambda, Glue, Redshift, Athena, EMR, CloudWatch, RDS, SNS
Azure Services: Azure SQL Database, Azure Cosmos DB, Data Factory, Blob Storage, Databricks, Synapse Analytics, Log Analytics, Azure DevOps Tool Kit, Azure Monitor
Databases: Oracle SQL, MySQL, PostgreSQL, MongoDB, Cassandra, HBase
Hadoop Technologies: Hadoop, MapReduce, Kafka, PySpark, Pig, Hive, YARN, Flume, Sqoop, Oozie, Snowflake, Zookeeper, Cloudera, Databricks
Visualization Tools: Tableau, QuickSight, Power BI
Programming Languages: Scala, Python, SQL
Other: Jenkins, GitHub, Windows, Linux
PROFESSIONAL EXPERIENCE:
Client: HarbourVest Partners - Boston, MA February 2022 – Present
Role: AWS Data Engineer
Project description:
HarbourVest Partners is a global private equity investment firm that manages a diversified portfolio of investments. The goal of the project is to help data analysts identify patterns, trends, and correlations in primary investment data to inform investment strategies.
Responsibilities:
Designed and implemented data migration strategies for large-scale data sets, covering data extraction, transformation, and loading processes.
Created a batch data pipeline to extract primary investment data from Amazon Aurora and Amazon DynamoDB and load it into the S3 data lake using Glue jobs and AWS Lambda (see the Glue job sketch after this section).
Cataloged data in S3 by implementing a Glue Crawler that populated the Glue Data Catalog, creating tables that could then be queried with Athena.
Improved the efficiency of the ETL process by writing PySpark scripts in Glue against the AWS Glue Data Catalog.
Worked with S3 Glacier and S3 Glacier Deep Archive for storing and backing up data.
Created Airflow scheduling scripts in Python, adding tasks and dependencies to DAGs.
Implemented real-time stream analytics on service availability using PySpark, Kafka, and Amazon Kinesis Data Analytics over server logs, continuously feeding insights back to improve the user experience.
Used Jenkins to implement end-to-end Continuous Integration and Continuous Delivery (CI/CD), developing and building data pipelines, promoting them to higher environments, and scheduling batch jobs.
Used Spark SQL to refine data and write queries using aggregate functions to calculate statistics.
Integrated Amazon Elastic Kubernetes Service (EKS) with AWS services, including EC2 and VPC, to run containers in production.
Utilized Boto3 to combine AWS Glue and Lambda functions and to deregister unused AMIs across all application regions, lowering EC2 resource costs.
Used AWS CloudWatch to set alarms, monitor application performance, and visualize logs.
Set up notifications with AWS SNS to indicate pipeline status for EMR and Glue jobs.
Used Git for version control and source code management, collaborating with cross-functional teams on code development and consistency.
Interacted with business users to understand requirements, mapped them to design and implementation, and attended Scrum meetings to discuss day-to-day status, following Agile development methodologies.
Tech Stack: Amazon Aurora, Amazon DynamoDB, AWS S3, EC2, VPC, Boto3, Git, Lambda, Glue jobs, Redshift, Athena, EMR, CloudWatch, SNS, CI/CD pipeline, PySpark, Kafka, Jenkins, Apache Airflow, EKS, Agile, Amazon S3 Glacier.
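Below is a minimal sketch of the kind of Glue batch job described above, assuming the source table has already been crawled into the Glue Data Catalog; the database, table, column, and bucket names are hypothetical placeholders rather than the project's actual resources.

# Hypothetical AWS Glue ETL job: read a cataloged source table, apply a light
# column mapping, and land the result in the S3 data lake as Parquet.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source table previously registered in the Glue Data Catalog by a crawler
# (database and table names are placeholders).
source = glue_context.create_dynamic_frame.from_catalog(
    database="investments_db", table_name="primary_investments"
)

# Keep and retype only the columns downstream analysts need.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("investment_id", "string", "investment_id", "string"),
        ("fund_name", "string", "fund_name", "string"),
        ("commitment_amount", "double", "commitment_amount", "double"),
        ("commitment_date", "string", "commitment_date", "date"),
    ],
)

# Write partitioned Parquet into the S3 data lake (bucket name is a placeholder).
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={
        "path": "s3://example-data-lake/primary-investments/",
        "partitionKeys": ["commitment_date"],
    },
    format="parquet",
)
job.commit()

A job along these lines can be scheduled through Airflow or triggered from a Lambda function, which matches the batch-plus-orchestration pattern described in the bullets above.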
Client: Vulcan Materials Company – Birmingham, AL August 2021 – February 2022
Role: Azure Data Engineer
Project Description:
Vulcan Materials Company is a leading producer of construction materials and plays a critical role in infrastructure development. We worked on data cataloging to organize the construction materials data, migrating product sales and supplier data from on-premises to the cloud, and creating batch pipelines to extract insights and generate reports.
Responsibilities:
Built batch pipelines using Azure Data Factory to transfer product sales data from Azure SQL Database to Azure Synapse Analytics.
Utilized Azure Database Migration Service to migrate data from MySQL to Azure SQL Database.
Used PySpark scripts in Azure Databricks, orchestrated with Azure Data Factory, to perform complex transformations and manipulations, storing the results in Azure Data Lake Storage (see the Databricks sketch after this section).
Created a pipeline using Snowpipe to load data from Blob Storage into Snowflake tables.
Built the data warehouse and configured the Snowflake architecture to load stock data through SnowSQL, Azure Blob Storage, and Snowpipe to deliver business value; streamed data using Snowpipe and visualized it through Power BI.
Used table streams along with CDC to capture changes made to the database and stream them to the data lake in real time.
Used Event Hubs to perform real-time analytics, archive data streams to Azure Blob Storage, and trigger other Azure services, achieving high throughput and low latency.
Integrated and deployed CI/CD pipelines and promoted them to higher environments by scheduling jobs in Azure DevOps.
Used Log Analytics and Splunk to monitor logs for troubleshooting and bug fixes, and used Notification Hubs to set up notifications.
Utilized Azure Monitor to set up alerts that notify when pipeline performance drops below a threshold.
Created dynamic dashboards in Power BI for data visualization and analysis, presenting them to business stakeholders.
Followed Agile methodology, including PI planning and sprint planning, to visualize work and deliver quality deliverables within deadlines.
Tech Stack: Azure SQL Database, Azure Data Factory, Azure Blob Storage, Azure Databricks, Azure Synapse Analytics, Snowflake, CDC, Log Analytics, Azure DevOps Tool Kit, Azure Monitor, Power BI, Agile, Splunk.
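A minimal PySpark sketch of the kind of Databricks transformation step described above, assuming raw data already sits in Azure Data Lake Storage Gen2; the storage account, container, paths, and column names are hypothetical placeholders.

# Hypothetical Databricks notebook cell: read raw product sales data from ADLS,
# clean and aggregate it, and write curated output back to the lake.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Placeholder ADLS Gen2 locations (raw and curated zones).
raw_path = "abfss://raw@examplestorage.dfs.core.windows.net/product_sales/"
curated_path = "abfss://curated@examplestorage.dfs.core.windows.net/product_sales/"

sales = spark.read.format("parquet").load(raw_path)

# Example transformation: de-duplicate orders and roll sales up to a monthly
# grain for downstream Synapse and Power BI reporting.
curated = (
    sales.dropDuplicates(["order_id"])
         .withColumn("order_month", F.date_trunc("month", F.col("order_date")))
         .groupBy("order_month", "product_id")
         .agg(F.sum("quantity").alias("units_sold"),
              F.sum("net_amount").alias("revenue"))
)

# Write the curated data back to the data lake, partitioned by month.
(curated.write
        .mode("overwrite")
        .partitionBy("order_month")
        .format("parquet")
        .save(curated_path))

In practice a notebook like this would be wired into an Azure Data Factory pipeline as a Databricks activity, which is the orchestration pattern the bullets above describe.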
Client: Citi Bank – Hyderabad, India February 2018 – July 2021
Role: Data Engineer
Project Description:
Citibank is a multinational investment bank and financial services corporation that provides products and services such as retail banking and credit cards. We worked with the home loans department; our goal was to process customer data to determine loan eligibility based on customer activity, and to migrate and store the aggregated data so it was available to analysts.
Responsibilities:
Developed multiple ETL jobs and automated them on Amazon EMR, transferring data from HDFS to S3.
Created a batch data pipeline to extract data from S3 and load it into Redshift using Glue jobs.
Used PySpark and Scala to prepare scripts that automated the ingestion process from various sources such as APIs, AWS S3, and Redshift.
Configured Spark Streaming to receive real-time data from Kafka and stored the streamed data in HDFS using Python (see the streaming sketch after this section).
Stored structured data in Hive and unstructured data in HBase on AWS EMR.
Cleaned data obtained from various sources in HDFS by developing MapReduce (YARN) programs, making it suitable for ingestion into the Hive schema for analysis.
Created schema RDDs, loaded them into Hive tables, and used Spark SQL to load and handle structured JSON data.
Used Sqoop's command-line interface and APIs to import and export data between HDFS and RDBMS.
Generated reports for the BI team by exporting the analyzed data to relational databases for visualization using Sqoop.
Created custom UDFs to extend Hive and Pig core functionality.
Enabled ODBC/JDBC connectivity from BI tools to Hive tables and worked with Tableau and Flink.
Tech Stack: AWS S3, Glue, AWS EMR, Redshift, Spark SQL, Sqoop, Flume, YARN, Kafka, MapReduce, Hadoop, HDFS, Hive, Tableau, Spotfire, HBase, MySQL.
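A minimal sketch, using Spark Structured Streaming, of the Kafka-to-HDFS flow described above; the broker address, topic name, message schema, and HDFS paths are hypothetical placeholders.

# Hypothetical streaming job: consume customer-activity events from Kafka,
# parse the JSON payload, and persist the stream to HDFS as Parquet.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Placeholder schema for the JSON payload carried in each Kafka message.
schema = StructType([
    StructField("customer_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", StringType()),
])

raw = (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
            .option("subscribe", "customer-activity")           # placeholder topic
            .load())

# Kafka delivers the payload as bytes; cast to string and parse the JSON.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
             .select(F.from_json("json", schema).alias("event"))
             .select("event.*"))

# Write the parsed stream to HDFS with checkpointing for fault tolerance.
query = (events.writeStream
               .format("parquet")
               .option("path", "hdfs:///data/customer_activity/")
               .option("checkpointLocation", "hdfs:///checkpoints/customer_activity/")
               .start())
query.awaitTermination()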
Client: Ekincare – Hyderabad, India June 2015 – February 2018
Role: Python Developer
Project Description:
Ekincare is a fast-growing healthcare company. The project involved developing and maintaining web applications to manage patient data and monitor health outcomes, along with debugging, testing, and troubleshooting those applications to ensure they functioned properly.
Responsibilities:
Developed and maintained high-traffic web applications using Django and Python.
Responsible for setting up Python REST APIs using the Django REST Framework (see the sketch after this section).
Managed the storage and deletion of content using Django and Python, interfacing with the jQuery UI.
Used MySQL and the Python MySQL connector package to write and execute database queries from Python.
Developed dynamic websites for Linux and Windows-based systems using LAMP and WAMP servers.
Developed SOAP web services for sending and receiving data from external interfaces in XML format.
Worked across all phases of the Software Development Life Cycle (SDLC) and was responsible for requirements gathering, analysis, design, implementation, and testing.
Tech Stack: Python, Django, REST API, jQuery, MySQL, LAMP, WAMP, SOAP, XML, SDLC.
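A minimal Django REST Framework sketch along the lines of the APIs described above; the Patient model and its fields are hypothetical placeholders rather than the actual application schema, and the pieces would normally live in the separate Django modules indicated by the comments.

# models.py -- hypothetical patient record model
from django.db import models

class Patient(models.Model):
    name = models.CharField(max_length=100)
    date_of_birth = models.DateField()
    last_checkup = models.DateField(null=True, blank=True)

# serializers.py -- translate Patient rows to and from JSON
from rest_framework import serializers

class PatientSerializer(serializers.ModelSerializer):
    class Meta:
        model = Patient
        fields = ["id", "name", "date_of_birth", "last_checkup"]

# views.py -- standard CRUD endpoints for patient records
from rest_framework import viewsets

class PatientViewSet(viewsets.ModelViewSet):
    queryset = Patient.objects.all()
    serializer_class = PatientSerializer

# urls.py -- register the viewset so /patients/ exposes list, create, and detail routes
from rest_framework.routers import DefaultRouter

router = DefaultRouter()
router.register(r"patients", PatientViewSet)
urlpatterns = router.urls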
EDUCATION:
Master’s in Computer and Information Sciences from the University of North Texas.