Post Job Free

Information Systems Big Data

Location:
Charlotte, NC
Salary:
$40 per hour
Posted:
January 04, 2024

Contact this candidate

Resume:

RAJESH KAIREDDY

Fort Mill, SC ***** | 857-***-**** | ******.*******@*****.*** | LinkedIn | GitHub

Summary

Google-certified Professional Data Engineer with a Master’s in Information Systems and 2 years of experience with advanced big data technologies. Proficient in Python and SQL, with a track record of designing secure applications and turning data into actionable insights. Committed to driving data-informed improvements, and actively targeting cloud engineer roles to support organizational growth.

Education

Northeastern University, Boston, USA May 2023

Master of Science in Information Systems GPA: 3.6

Relevant Coursework: Data Management and Database Design, Application Engineering and Development, Data Science Engineering Methods and Tools, Designing Advanced Data Architecture for Business Intelligence, Engineering of Big Data Systems

G. Pulla Reddy Engineering College, Andhra Pradesh, India Sep 2020
Bachelor of Technology in Electronics and Communications Engineering

Skills

Programming Languages: Python (NumPy, Pandas, Matplotlib), Java, SQL
Big Data Tools: Apache Spark, Hadoop, MapReduce, HDFS, Pig, Hive
Visualization & ETL Tools: Tableau, Power BI, Alteryx (ETL), Talend (ETL), Cloud Data Fusion
Cloud Tools: Azure, GCP (Google BigQuery, Data Studio, Looker, Cloud Storage, Dataproc, Dataflow)
Database Tools: DBeaver, MS SQL, MySQL, Oracle, PostgreSQL, MongoDB

Work Experience

• Data Operations Analyst, Abecedarian LLC, Boston, MA (Aug 2023 – present)

Spearheaded the development and maintenance of automated Selenium scripts for interacting with the Twitter platform, handling tasks such as logging in, retrieving prompts, and extracting links based on hashtags. Reduced manual effort by 30%, saving 10 hours weekly.
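
The hashtag-based link extraction can be sketched in plain Python. The tweets and hashtag below are hypothetical, and the browser automation (Selenium login and page navigation) is out of scope here; this only illustrates the filtering step.

```python
import re

def extract_links_by_hashtag(tweets, hashtag):
    """Return URLs from tweets that mention the given hashtag (illustrative sketch)."""
    url_pattern = re.compile(r"https?://\S+")
    links = []
    for text in tweets:
        # Case-insensitive hashtag match, then collect every URL in the tweet
        if hashtag.lower() in text.lower():
            links.extend(url_pattern.findall(text))
    return links

tweets = [
    "New prompt drop! https://example.com/prompt1 #AIArt",
    "Unrelated post with no tag",
    "#aiart showcase: https://example.com/prompt2",
]
print(extract_links_by_hashtag(tweets, "#AIArt"))
# → ['https://example.com/prompt1', 'https://example.com/prompt2']
```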

Managed the daily processing of 10 CSV files of 2 GB each using Python scripting and data-filtering techniques, maintaining a consistent output of 20,000 unique rows per week.

Incorporated keywords extracted from prompts into Python scripts that generate upscaled images, demonstrating generative AI techniques on the company website.

Developed a Python-based web-scraping solution using BeautifulSoup and Requests to automate the extraction of essential information, focusing on body-tag details and capturing hyperlinks for comprehensive data collection. Contributed to a custom GPT model tailored to answer queries about Northeastern University.
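
The body-text-and-hyperlink extraction can be sketched with the standard library's html.parser (a dependency-free stand-in for the BeautifulSoup approach described above; the HTML snippet is invented):

```python
from html.parser import HTMLParser

class BodyLinkExtractor(HTMLParser):
    """Collect text inside <body> plus all hyperlink targets."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.in_body and data.strip():
            self.text.append(data.strip())

html = ('<html><head><title>ignored</title></head>'
        '<body><p>Hello</p><a href="https://example.com">site</a></body></html>')
parser = BodyLinkExtractor()
parser.feed(html)
print(parser.links)  # → ['https://example.com']
```

With BeautifulSoup the equivalent would be `soup.body.get_text()` and `soup.find_all("a", href=True)` over a page fetched with Requests.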

• Data Analyst Intern, Digital Lync, Hyderabad, India (Jan 2021 – Aug 2021)

Analyzed New York City Citi Bike public data (60M rows annually) using SQL, identifying key bike-trip usage patterns and trends.

Built a union of datasets in Tableau, uncovering 8% noise in the user data and improving the accuracy of age, gender, and usage-time trends during the COVID-19 pandemic.

Employed Google Data Studio for effective data visualization, elucidating seasonality, weekday versus weekend activity, YOY trends, and station popularity, thereby improving trend identification.

Conducted a thorough time analysis through SQL queries, resulting in a clearer understanding of user behavior and seasonal trends in NYC Citi bike usage.
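
A time analysis of this kind can be illustrated with SQLite and a toy trips table (the actual work ran against the full Citi Bike dataset; the schema and sample timestamps here are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (start_time TEXT, duration_sec INTEGER)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [
        ("2020-06-01 08:15:00", 600),
        ("2020-06-01 08:45:00", 900),
        ("2020-06-01 17:30:00", 1200),
    ],
)

# Trips per hour of day: the shape of a typical peak-hour / seasonality query
rows = conn.execute(
    """
    SELECT strftime('%H', start_time) AS hour, COUNT(*) AS trips
    FROM trips
    GROUP BY hour
    ORDER BY trips DESC
    """
).fetchall()
print(rows)  # → [('08', 2), ('17', 1)]
```

Swapping `%H` for `%w` (weekday) or `%m` (month) yields the weekday-versus-weekend and seasonal breakdowns mentioned above.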

Isolated user-data anomalies (reported ages spanning more than a century), improving the dataset’s integrity for more reliable demographic analysis during a global health crisis.

Visualized complex queried data using Google Data Studio, providing clear insights into user demographics and station popularity, which supported strategic decision-making during the pandemic.

Projects

• IMDb Data Warehousing & Business Intelligence Link

Constructed a resilient ETL data pipeline around a Snowflake schema for IMDb movies, efficiently managing 100+ million rows from sources including SQL Server, PostgreSQL, and MySQL using Talend and Alteryx.

Integrated the processed data seamlessly into visualization tools like Tableau and Power BI, providing comprehensive insights into TV shows, encompassing details on seasons, episodes, ratings, genre, and job specifics for associated titles.

Orchestrated master jobs to load 100+ million rows of data into Staging, Integration, and BI schemas in 94 minutes; optimized Talend job performance, increasing efficiency by 50%.

Designed interactive BI dashboards and reports in Power BI and Tableau representing KPIs for Movies, TV Shows, and People.

• NYC Taxi Drivers Data Analysis Link

Python, Java, Hadoop, MapReduce, Pig, Hive

Investigated NYC Yellow Cab data, extracting insights into usage patterns, pickup/drop-off locations, and financial trends using 4 custom MapReduce scripts, 11 HiveQL queries, and 5 Pig scripts on a dataset of over 5 million records.

Engineered the conversion of a 5 GB Parquet dataset to CSV with a Python script, extracting 50,000 rows per month. Used Excel VLOOKUP and datetime-column modifications to prepare the data for effective trend analysis in Hive, Pig, and HDFS.

Executed Hadoop MapReduce jobs, optimizing Min, Max, Average, and Word Count algorithms with a Reducer Comparator, improving processing efficiency by 25%.
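
The combined Min/Max/Average pattern can be sketched outside Hadoop as a single map-then-reduce pass. This is a pure-Python simulation of the pattern, not the actual MapReduce job; the fare column and sample records are invented.

```python
from collections import defaultdict

def mapper(record):
    # Emit (key, value) pairs, as a Hadoop mapper would
    yield ("fare", record["fare"])

def reducer(key, values):
    # One reducer computes min, max, and average over the grouped values,
    # mirroring a combined Min/Max/Average MapReduce job
    vals = list(values)
    return {"min": min(vals), "max": max(vals), "avg": sum(vals) / len(vals)}

records = [{"fare": 5.0}, {"fare": 12.5}, {"fare": 8.0}]

# Shuffle phase: group mapper output by key
grouped = defaultdict(list)
for record in records:
    for key, value in mapper(record):
        grouped[key].append(value)

result = {key: reducer(key, vals) for key, vals in grouped.items()}
print(result)  # → {'fare': {'min': 5.0, 'max': 12.5, 'avg': 8.5}}
```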

Explored Hive for strategic pickup-location analysis, uncovering patterns in the busiest months, weekdays, and peak hours, and discerning the popularity of various pickup spots for short and long trips. Used Pig for drop-location insights, identifying trends in average amounts collected, distances traveled, and peak drop-off times.

• Sentiment-Analysis-News-Headlines Link

Python, Jupyter Notebook, TensorFlow, Keras

Implemented sentiment analysis for news headlines using machine learning models such as Logistic Regression, Complement Naive Bayes, and K-Nearest Neighbors, plus Deep Learning with TensorFlow/Keras, to classify sentiments as positive or negative, enhancing decision-making for users of online information.

Applied advanced techniques for sentiment analysis, including programmatic labeling with Snorkel and polarity extraction using TextBlob’s sentiment analyzer. Used spaCy for efficient text preprocessing, enhancing model input on a dataset of 100,000 data points.

Tuned ML models with hyperparameter optimization (GridSearchCV) to reach peak performance, and managed memory explicitly with Python’s garbage collector to improve efficiency.
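
The GridSearchCV tuning step can be sketched with scikit-learn on synthetic data (a stand-in for the project's TF-IDF headline features; the parameter grid and dataset here are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy stand-in for vectorized headline features
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Exhaustive search over regularization strength, scored by F1 under 3-fold CV
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=3,
    scoring="f1",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 2))
```

The same pattern applies to any estimator: GridSearchCV refits the best configuration on the full training set, so `grid.predict` is ready to use afterward.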

Attained strong results in sentiment analysis: Logistic Regression and Deep Learning demonstrated superior accuracy (90.92% and 95%) and F1 scores (0.92 and 0.90), while KNN had a more moderate impact, achieving an accuracy of 77.10% and an F1 score of 0.81.

Certifications

• Google Certified Professional Data Engineer Certificate (Google)

• Google Data Analytics Certificate - Coursera (Google)

