Post Job Free

Data Python

Location:
Wheeling, IL
Salary:
90000
Posted:
September 20, 2020

Contact this candidate

Resume:

NEELIMA VADDI

adf9lj@r.postjobfree.com 414-***-**** linkedin.com/in/neelima-v-745277182/

SUMMARY

Data engineer with a Master's in Computer and Information Sciences (Big Data and Data Analytics) and around 3.5 years of experience in data engineering and data analysis. Experienced, results-oriented, resourceful problem solver with leadership skills. Strong experience applying predictive modeling, data processing, and data mining algorithms to business questions. Good knowledge of Tableau, PowerBI, Informatica, and AWS components.

EXPERIENCE

Associate Consultant, Capgemini India Jun 2015 - Oct 2018

• Worked closely with the data source owners to evaluate the makeup of policy-holder features over time.

• Explored several modelling methodologies in pursuit of predicting the key outcomes and showed that important behavior patterns could be predicted accurately at the policy holder level.

• Built data visualizations in Tableau and accessed aggregations using SQL clients and SQL Workbench.

• Modernized the data analytics environment using cloud-based Splunk, the Git version control system, and the Jenkins automated deployment tool.

• Constructed product-usage data aggregations using PySpark and Spark SQL and maintained them in an AWS S3 location for reporting, data science dashboarding, and ad-hoc analyses.
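A minimal sketch of an aggregation job like the one described above. The dataset, column names, and S3 bucket are hypothetical, not taken from the resume; the Spark logic is wrapped in a function so only the path helper runs without a cluster.

```python
def s3_output_path(bucket: str, dataset: str, run_date: str) -> str:
    """Build the S3 prefix under which an aggregated dataset is stored."""
    return f"s3://{bucket}/aggregates/{dataset}/run_date={run_date}"


def run_daily_aggregation(spark, source_path, bucket, run_date):
    """Aggregate raw product-usage events and persist them to S3.

    Requires an active PySpark SparkSession; column names are illustrative.
    """
    from pyspark.sql import functions as F  # imported lazily; needs PySpark

    usage = spark.read.parquet(source_path)
    daily = (usage
             .groupBy("product_id", "event_date")
             .agg(F.countDistinct("user_id").alias("active_users"),
                  F.count("*").alias("events")))
    daily.write.mode("overwrite").parquet(
        s3_output_path(bucket, "daily_usage", run_date))
```

Partitioning the output prefix by run date keeps each load idempotent: re-running a day overwrites only that day's aggregate.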

• Designed and developed ETL integration patterns using Python on Spark.

• Used Spark in Python to distribute processing of large streaming datasets, improving data ingestion and processing speed by 87%.

• Responsible for data extraction and data integration from different data sources by creating ETL pipelines Using Spark.

• Prepared validation report queries, executed them after every ETL run, and shared the results with business users in different phases of the project.
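Post-ETL validation queries of this kind typically compare the staging and target tables. A small sketch using SQLite for illustration; the table and column names are hypothetical:

```python
import sqlite3

def validate_load(conn: sqlite3.Connection) -> dict:
    """Run named validation queries; each must return a truthy value."""
    checks = {
        # Target table should hold exactly the rows staged for it.
        "row_count_match":
            "SELECT (SELECT COUNT(*) FROM staging_claims)"
            " = (SELECT COUNT(*) FROM claims)",
        # The business key must never be null after the load.
        "no_null_keys":
            "SELECT COUNT(*) = 0 FROM claims WHERE claim_id IS NULL",
    }
    return {name: bool(conn.execute(sql).fetchone()[0])
            for name, sql in checks.items()}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_claims (claim_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE claims (claim_id INTEGER, amount REAL)")
rows = [(1, 100.0), (2, 250.5)]
conn.executemany("INSERT INTO staging_claims VALUES (?, ?)", rows)
conn.executemany("INSERT INTO claims VALUES (?, ?)", rows)

report = validate_load(conn)
print(report)  # {'row_count_match': True, 'no_null_keys': True}
```

The same pattern scales to any SQL backend: keep the checks as data, run them after each load, and report the pass/fail map to business users.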

• Responsible for maintaining quality reference data in the database by performing operations such as cleaning and transformation and by ensuring integrity in a relational environment.

• Worked with clustering, classification, and graph analytics algorithms.

• Queried and monitored databases with millions of entries.

• Used AWS for data analysis with R and Python, including visualization and statistical models.

• Developed complex SQL queries for data analysis and data extraction, utilizing knowledge of joins and of the database landscape.
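A representative extraction query of the sort described above: a grouped left join from policies to claims. The schema is invented for illustration (SQLite used so the example is self-contained):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE policies (policy_id INTEGER, holder TEXT);
    CREATE TABLE claims (claim_id INTEGER, policy_id INTEGER, amount REAL);
    INSERT INTO policies VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO claims VALUES (10, 1, 120.0), (11, 1, 80.0), (12, 2, 50.0);
""")

# Left join keeps policies with no claims; GROUP BY rolls claims up per holder.
rows = conn.execute("""
    SELECT p.holder,
           COUNT(c.claim_id) AS n_claims,
           SUM(c.amount)     AS total
    FROM policies p
    LEFT JOIN claims c ON c.policy_id = p.policy_id
    GROUP BY p.policy_id, p.holder
    ORDER BY total DESC
""").fetchall()
print(rows)  # [('Ada', 2, 200.0), ('Grace', 1, 50.0)]
```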

• Used Python for SQL/CRUD operations in the database and for file extraction, transformation, and generation.

• Conducted data validation work using Amazon Redshift and MySQL. Analyzed claims data to provide insight into medical data assets, lab data, and claims data.

• Created S3 buckets, managed bucket policies, and utilized S3 and Glacier for storage and backup on AWS.

• Enhanced the existing product with new features such as user roles (Lead, Admin, Developer), ELB, Auto Scaling, S3, CloudWatch, CloudTrail, and RDS scheduling.

• Created monitors, alarms, and notifications for EC2 hosts using CloudWatch, CloudTrail, and SNS.
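A CloudWatch alarm of the kind described above is defined by a parameter set like the one built below. The instance ID and SNS topic ARN are placeholders; with boto3 the dict would be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.

```python
def cpu_alarm_params(instance_id: str, topic_arn: str,
                     threshold: float = 80.0) -> dict:
    """Build parameters for a high-CPU alarm on one EC2 instance.

    The alarm fires when average CPUUtilization stays above `threshold`
    for two consecutive 5-minute periods, and notifies the SNS topic.
    """
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 2,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],
    }

params = cpu_alarm_params("i-0123456789abcdef0",
                          "arn:aws:sns:us-east-1:111122223333:ops-alerts")
```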

• Attended daily standup meetings to discuss daily progress.

PROJECTS

Visualizing the ecommerce data using BI tools

• Visualized three different public ecommerce datasets to provide business insights.

• Merged the three datasets on shared key fields. Pre-processed the data by replacing null values with the average of each field, handling outliers, and reducing the data dimensions using R programming.
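The null-fill and outlier steps above can be sketched as follows (the project used R; Python is shown here for consistency with the rest of the resume). The bounds are illustrative constants standing in for percentile cutoffs:

```python
from statistics import mean

def fill_and_clip(values, lower, upper):
    """Replace None with the mean of the present values, then clip to bounds."""
    present = [v for v in values if v is not None]
    avg = mean(present)
    filled = [avg if v is None else v for v in values]
    return [min(max(v, lower), upper) for v in filled]

# mean of (10, 12, 500) is 174.0; the 500.0 outlier is clipped to 200.0
cleaned = fill_and_clip([10.0, None, 12.0, 500.0], lower=0.0, upper=200.0)
print(cleaned)  # [10.0, 174.0, 12.0, 200.0]
```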

• Imported the datasets into Tableau (a BI tool) and built visualizations to answer business questions, providing useful and meaningful information for the business.

Airline Data Analysis using Apache Hive

• Preprocessed the datasets in HDFS.

• Identified business insights using Apache Hive queries.

• Documented the insights for the business.

Predicting Heart Disease using Predictive Modeling (R programming)

• Using a public dataset from the UCI repository, predicted heart disease from the given attributes.

• Pre-processed the dataset by removing null values and outliers and reducing the data dimensions.

• Partitioned the dataset into training and validation sets. Implemented logistic regression in R to predict heart disease.
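The train/validation workflow above can be sketched with a minimal logistic regression. The project used R's modeling functions; this is a from-scratch Python illustration on synthetic, trivially separable data, not the actual heart-disease model:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=200):
    """Fit weight and bias by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic 1-D data: class 0 below zero, class 1 above zero.
train_x = [-3.0, -2.0, -1.5, 1.0, 1.5, 2.0]
train_y = [0, 0, 0, 1, 1, 1]
valid = [(-1.0, 0), (3.0, 1)]  # held-out validation pairs

w, b = fit_logistic(train_x, train_y)
preds = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x, _ in valid]
accuracy = sum(p == y for p, (_, y) in zip(preds, valid)) / len(valid)
print(accuracy)  # 1.0
```

On real data the split would be randomized and the model evaluated with more than accuracy (e.g. sensitivity/specificity), but the fit/predict/score shape is the same.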

EDUCATION

Master's Degree, Computer Science Jan 2019 - May 2020 Marquette University GPA: 3.82

Bachelor's Degree, Computer and Electronics Engineering Jun 2011 - May 2015 KL University

SKILLS

Hadoop: HDFS, MapReduce, YARN, Hive, Pig, HBase, Zookeeper, Sqoop, Oozie, Flume, Spark

Databases: MySQL, SQL Server, MongoDB, HBase

Programming Languages: Python, SQL, Java, R (implementing machine learning algorithms), PySpark, Linux shell scripts

Tools: Eclipse, RStudio, PuTTY, MS Office, Jenkins, Git, Bitbucket

BI/ETL Tools: Tableau, MS Excel, Microsoft PowerBI, Informatica

Machine Learning: Logistic Regression, Naïve Bayes, Decision Tree, Random Forest, KNN, Linear Regression, SVM, K-Means, Principal Component Analysis, NumPy, Exploratory Analysis, Predictive Modeling

Other Technologies: Selenium, HP UFT, Splunk Enterprise, AWS Components (SageMaker, S3, EC2, Lambda, Step Functions, Elastic Beanstalk, Cloud Pipeline, CloudWatch, Comprehend), Data Mining, Network Programming, JavaScript, Data Structures


