Data Processing SQL Server

Location:
Houston, TX, 77002
Posted:
February 07, 2024

Resume:

Ravi Charan Mannepalli

Email: ad3gfl@r.postjobfree.com Phone: +1-407-***-****

Professional Summary:

Innovative and analytical data engineer with deep experience developing and maintaining data pipelines, ETL processes, and data workflows that process and analyze massive amounts of data. 3+ years of experience across projects in data warehousing, data integration, and data migration. Worked with a variety of databases, including SQL Server, Oracle, MySQL, and PostgreSQL, with hands-on experience in data migration and replication. Expert in AWS technologies such as S3, Redshift, Glue, Lambda, ECS, and EC2, as well as Azure services such as Blob Storage, Data Factory, and Databricks. Working knowledge of Python libraries such as NumPy, Pandas, Matplotlib, Seaborn, NLTK, scikit-learn, SciPy, and PyTorch. Skilled at using Apache Spark, a big data processing framework, to build scalable and efficient data processing pipelines.

Key Skills:

Big Data Technologies: Databricks, Airflow, Apache Spark, Snowflake.

Cloud Technologies: AWS, EC2, ECS, S3, EMR, Lambda, Redshift, Glue, Azure Databricks, Azure Blob Storage, Azure Data Factory, Azure Stream Analytics.

Programming Languages: Python, Java, JavaScript, SQL.

Databases: MySQL, Oracle, PostgreSQL, SQL Server, NoSQL (Cassandra, MongoDB).

Development Tools: SVN, Git, Maven, Docker.

Operating Systems: Windows, Linux, and Unix.

Data Visualization: Tableau, Matplotlib

Education:

Bachelor of Technology, Mechanical Engineering, VNRVJIET (2016 – 2020)

Master of Computer Information and Science, Montclair State University (2021 – 2023)


Professional Experience:

Data Engineer Jan 2023 - present

Arohak, Jersey City, NJ (Remote)

Designed and implemented an AWS EC2-based infrastructure combined with PySpark resulting in a highly scalable data processing environment that handled large volumes of financial data, improved processing efficiency by 50%, and enabled real-time analysis for risk assessment and fraud detection.

Implemented an integrated data processing workflow using PySpark and Airflow, enabling efficient data extraction, transformation, and loading processes. This streamlined workflow ensured timely and accurate data processing, improving overall efficiency by automating the end-to-end data pipeline and reducing processing time by 30%.
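
A minimal sketch of this orchestration pattern, assuming Airflow's Spark provider; the DAG id, schedule, and script path below are hypothetical placeholders:

    # Illustrative Airflow DAG that submits a daily PySpark ETL job.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

    with DAG(
        dag_id="daily_financial_etl",        # hypothetical DAG name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        run_etl = SparkSubmitOperator(
            task_id="run_pyspark_etl",
            application="/opt/jobs/etl_job.py",       # hypothetical script path
            conn_id="spark_default",
            application_args=["--date", "{{ ds }}"],  # pass the run date to the job
        )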

Created an end-to-end data pipeline using ETL procedures, AWS S3, and Snowflake: real-time sales data from various sources was extracted, transformed, and loaded into S3, where it was immediately available for ingestion into Snowflake.
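
A minimal sketch of the Snowflake load step, assuming an external stage over the S3 bucket; the table, stage, and connection details are hypothetical placeholders:

    # Illustrative load of staged S3 files into Snowflake via COPY INTO.
    import snowflake.connector

    conn = snowflake.connector.connect(
        user="ETL_USER",        # hypothetical credentials and account
        password="***",
        account="my_account",
        warehouse="ETL_WH",
        database="SALES_DB",
        schema="RAW",
    )
    cur = conn.cursor()
    # @SALES_STAGE is assumed to be an external stage pointing at the S3 bucket.
    cur.execute("""
        COPY INTO RAW.SALES_EVENTS
        FROM @SALES_STAGE/daily/
        FILE_FORMAT = (TYPE = 'JSON')
        ON_ERROR = 'CONTINUE'
    """)
    conn.close()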

Leveraged Snowflake's ACID-compliant SQL and dot notation to enable efficient, unified querying of all data types, empowering stakeholders to gain comprehensive insights from a wide range of data sources.

Integrated Snowflake with event-driven architectures such as Apache Kafka and AWS Kinesis to push data directly to Snowflake in real time, triggering automated processing and analysis.
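
A minimal sketch of the streaming side using the kafka-python client and the Snowflake Python connector; the topic, table, and credentials are hypothetical, and a production setup would more likely use Snowpipe or the Snowflake Kafka connector:

    # Illustrative micro-batch consumer: read events from Kafka and
    # flush them to Snowflake in small batches rather than per event.
    import json

    from kafka import KafkaConsumer
    import snowflake.connector

    consumer = KafkaConsumer(
        "sales-events",                      # hypothetical topic
        bootstrap_servers=["broker:9092"],
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    )
    conn = snowflake.connector.connect(user="ETL_USER", password="***",
                                       account="my_account")
    cur = conn.cursor()

    batch = []
    for message in consumer:
        batch.append((json.dumps(message.value),))
        if len(batch) >= 500:
            cur.executemany(
                "INSERT INTO RAW.SALES_EVENTS (payload) VALUES (%s)", batch)
            batch.clear()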

Performed ingestion and replication from traditional on-prem RDBMSs (e.g., Oracle, MS SQL Server, IBM DB2, MySQL, PostgreSQL) to AWS.

Set up and operated data pipelines (batch and real-time) and data-wrangling procedures using Python and SQL in a cloud environment.

Data Engineer Jan 2020 - Jul 2021

Flipkart, Hyderabad, India

Maintained 99.8% data pipeline uptime while ingesting streaming and transactional data from 5 distinct major data sources using Databricks, Spark, Redshift, S3, and Python.

Designed and built a scalable, fault-tolerant big data processing pipeline using Databricks and Apache Spark, lowering processing time by 70% and increasing data accuracy by 20%.

For faster performance, evaluated existing SQL scripts and rewrote them with Spark SQL.
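
A minimal sketch of such a rewrite, with hypothetical paths and columns; the DataFrame is cached because several downstream queries reuse it:

    # Illustrative Spark SQL version of a previously plain-SQL aggregation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sales-aggregation").getOrCreate()

    orders = spark.read.parquet("s3://example-bucket/orders/")  # hypothetical path
    orders.createOrReplaceTempView("orders")
    orders.cache()  # reused by several downstream queries

    daily_revenue = spark.sql("""
        SELECT order_date, SUM(amount) AS revenue
        FROM orders
        GROUP BY order_date
    """)
    daily_revenue.write.mode("overwrite").parquet("s3://example-bucket/daily_revenue/")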

Migrated a NoSQL database to Amazon DynamoDB, which handled spikes of more than 2.5x in transaction volume without extensive pre-planning or downtime and maintained near-100% uptime.
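
A minimal sketch of the post-migration write path with boto3; the table name and item shape are hypothetical, and on-demand capacity mode is what lets DynamoDB absorb spikes without pre-planning:

    # Illustrative batched writes to DynamoDB; batch_writer buffers puts
    # and automatically retries unprocessed items.
    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("orders")  # hypothetical table name

    def write_orders(orders):
        with table.batch_writer() as batch:
            for order in orders:
                batch.put_item(Item=order)

    write_orders([{"order_id": "1001", "status": "SHIPPED"}])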

Containerized data processing applications using Docker and deployed them to Amazon ECS clusters running on EC2 instances.
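
A minimal sketch of launching one of those containerized jobs with boto3; the cluster and task definition names are hypothetical:

    # Illustrative one-off run of a containerized processing task on an
    # ECS cluster backed by EC2 capacity.
    import boto3

    ecs = boto3.client("ecs")
    response = ecs.run_task(
        cluster="data-processing-cluster",  # hypothetical cluster name
        taskDefinition="etl-worker:3",      # hypothetical task definition
        launchType="EC2",
        count=1,
    )
    print(response["tasks"][0]["taskArn"])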

Optimized ETL processes to reduce processing times and improve scalability using techniques like parallel processing, distributed computing, and caching.

Designed and implemented a highly scalable and fault-tolerant data processing pipeline using Apache Spark and Amazon ECS, resulting in a 30% reduction in data processing time and enabling the organization to handle a 50% increase in data volume.

Academic Projects:

E-commerce Website:

Developed a customer-to-customer (C2C) e-commerce website using JavaScript, Node.js, npm, and MongoDB.

The project's goal was to build a system that functions primarily as an online marketplace for buying and selling goods.

Maggoty Alumni Website:

Developed a website as part of a team where college alumni can stay updated on the latest news, buy college merchandise, and create and attend events.

Used HTML, CSS, and JavaScript to build the website.

Text Mining of Twitter Data Using Vectorizers and Classifiers:

Used sentiment analysis to differentiate hate speech from free speech via multiple implementation techniques, helping social media platforms identify and remove such content.

Through this project, gained hands-on knowledge of the TF-IDF vectorizer, the count vectorizer, sentiment analysis, and text mining.
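
A minimal sketch of the TF-IDF-plus-classifier approach in scikit-learn, on a tiny hypothetical labeled sample; logistic regression stands in here for whichever classifier was actually used:

    # Illustrative text classification pipeline: TF-IDF features + classifier.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    tweets = [
        "what a great day to be outside",
        "you people are disgusting and should leave",
        "loved the concert last night",
        "get out of our country",
    ]
    labels = [0, 1, 0, 1]  # toy labels: 1 = hateful, 0 = benign

    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(tweets, labels)
    print(model.predict(["have a wonderful weekend"]))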

Using VGG16 to classify Pistachio types:

Developed a VGG16 CNN to classify two varieties of pistachio: 'Kirmizi' and 'Siirt'.

Used Python, the Keras library, and NumPy to complete the project.
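
A minimal transfer-learning sketch of the VGG16 setup in Keras; the input size, head layers, and compilation settings are assumptions:

    # Illustrative binary classifier on top of a frozen VGG16 base.
    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import VGG16

    base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    base.trainable = False  # freeze the pretrained convolutional base

    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(128, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # Kirmizi vs. Siirt
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])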

Weather forecasting by Hidden Markov Model:

Surveyed the most important weather-prediction works identified in the literature.

Used Hidden Markov Models and the Viterbi algorithm to forecast the weather.
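
A minimal pure-Python sketch of Viterbi decoding for a toy two-state weather HMM; all probabilities are hypothetical and chosen only to make the example runnable:

    # Illustrative Viterbi decoder: most likely hidden weather sequence
    # given observations of "dry" / "wet".
    states = ["Sunny", "Rainy"]
    start_p = {"Sunny": 0.6, "Rainy": 0.4}
    trans_p = {"Sunny": {"Sunny": 0.7, "Rainy": 0.3},
               "Rainy": {"Sunny": 0.4, "Rainy": 0.6}}
    emit_p = {"Sunny": {"dry": 0.8, "wet": 0.2},
              "Rainy": {"dry": 0.3, "wet": 0.7}}

    def viterbi(observations):
        # V[t][s] = probability of the best path ending in state s at step t
        V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
        path = {s: [s] for s in states}
        for obs in observations[1:]:
            V.append({})
            new_path = {}
            for s in states:
                prob, prev = max(
                    (V[-2][p] * trans_p[p][s] * emit_p[s][obs], p) for p in states)
                V[-1][s] = prob
                new_path[s] = path[prev] + [s]
            path = new_path
        return path[max(states, key=lambda s: V[-1][s])]

    print(viterbi(["dry", "dry", "wet"]))  # -> ['Sunny', 'Sunny', 'Rainy']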

Certifications:

PCEP-30-02 (Python Programming)


