
Data Python

Location:
Hyderabad, Telangana, India
Posted:
February 07, 2020


Sheeba Ogirala

adbnus@r.postjobfree.com

469-***-****

SUMMARY:

Data Scientist/Data Analyst with around 6 years of experience, of which 3 years are in Data Science and Analytics, including Data Mining and Statistical Analysis, with domain knowledge in the Retail, Healthcare, and Banking industries.

Involved in the full Data Science project life cycle, including data cleaning, data extraction, and visualization, with large data sets of structured and unstructured data; created ER diagrams and schemas.

Experience with Machine Learning algorithms such as logistic regression, KNN, SVM, random forest, neural networks, linear regression, lasso regression, and k-means.

Good experience in Text Analytics, developing statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.

Experience in implementing data analysis with various analytic tools, such as Anaconda 4.0, Jupyter Notebook 4.x, R 3.0 (ggplot2, dplyr, caret), and Excel.

Experienced in the full software development life cycle (SDLC) under Agile, DevOps, and Scrum methodologies, including creating requirements and test plans.

Strong skills in statistical methodologies such as A/B testing, experiment design, hypothesis testing, and ANOVA.

Working experience with Python 3.5/2.7 libraries such as NumPy, SQLAlchemy, Beautiful Soup, pickle, PySide, PyMongo, SciPy, and PyTables.

Ability to write and optimize diverse SQL queries; working knowledge of RDBMSs like SQL Server 2008 and NoSQL databases like MongoDB 3.2.

Experience in Big Data technologies like Spark 1.6, Spark SQL, PySpark, Hadoop 2.X, HDFS, Hive 1.X.

Experience in Data Warehousing including Data Modeling, Data Architecture, Data Integration (ETL/ELT) and Business Intelligence.

Good knowledge and experience in deep learning algorithms such as Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), including LSTM- and RNN-based speech recognition using TensorFlow.

Good experience in using various Python libraries (Beautiful Soup, NumPy, SciPy, matplotlib, python-twitter, Pandas, and MySQLdb for database connectivity).

Experienced in Big Data technologies including Apache Spark, HDFS, Hive, and MongoDB.

Used version control tools like Git 2.x and build tools like Apache Maven/Ant.

Worked on Machine Learning algorithms for classification and regression, including KNN, Decision Tree, Naïve Bayes, Logistic Regression, SVM, and Latent Factor models.

Experience and knowledge in provisioning virtual clusters on the AWS cloud, including services like EC2, S3, and EMR.

Good knowledge on Microsoft Azure.

Knowledge and understanding of DevOps (Docker).

Experience in writing subqueries, stored procedures, triggers, cursors, and functions on MySQL and PostgreSQL databases.

Extensive experience in data visualization tools like Tableau 9.x/10.x for creating dashboards.

Experience in the development and design of ETL methodology for supporting data transformations and processing in a corporate-wide environment using Teradata, Mainframes, and UNIX shell scripting.

Used SQL Queries and Stored Procedures extensively in retrieving the contents from MySQL.

Good at implementing SQL tuning techniques such as Join Indexes (JI), Aggregate Join Indexes (AJIs), statistics, and table changes including indexes.

Used SQL Loader for direct and parallel loads of data from raw files into database tables.

Experience in development of T-SQL, OLAP, PL/SQL, Stored Procedures, Triggers, Functions, Packages, performance tuning and optimization for business logic implementation.

Strong SQL Server programming skills, with experience in working with functions, packages and triggers.

Good industry knowledge, analytical and problem-solving skills, and the ability to work well within a team as well as individually.

Great team player and ability to work collaboratively and independently as required.

SKILLS:

Languages: C, C++, Java, Python 2.x/3.x, R/RStudio, SAS/SAS Enterprise Guide, SQL, XML, Shell Scripting

NoSQL Databases: Cassandra, HBase, MongoDB, MariaDB

Statistics: Hypothesis Testing, ANOVA, Confidence Intervals, Bayes' Law, MLE, Fisher Information, Principal Component Analysis (PCA), Cross-Validation, Correlation

BI Tools: Tableau, Tableau Server, Tableau Reader, Splunk, SAP Business Objects, OBIEE, SAP Business Intelligence, QlikView, Amazon Redshift, Azure Data Warehouse

Algorithms: Logistic Regression, Random Forest, XGBoost, KNN, SVM, Neural Networks, Linear Regression, Lasso Regression, K-Means

Big Data: Hadoop, HDFS, Hive, PuTTY, Spark, Scala, Sqoop

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0

Database Design Tools and Data Modeling: MS Visio, Erwin 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon methodologies

EDUCATION:

MS in Computational and Applied Mathematics, 2019, GPA 3.2

Harrisburg University of Science and Technology

MBA in Information Systems & Finance, 2016, GPA 3.7, FMA National Honor Society

Fordham University, Gabelli School of Business

B.Tech in Electronics & Communication Engineering, 2010, GPA 3.8

Jawaharlal Nehru Technological University

WORK EXPERIENCE:

Tower Loan, Flowood, MS Oct 2018 - Present

Data Scientist

Responsibilities:

Involved in Data Profiling to learn about user behavior and merge data from multiple data sources.

Participated in big data processing applications to collect, clean, and normalize large volumes of open data using Hadoop ecosystem tools such as Pig, Hive, and HBase.

Designed the prototype of the Data Mart and documented possible outcomes from it for end users.

Worked as Analyst to generate Data Models using Erwin and developed a relational database system.

Designed and developed various machine learning frameworks using Python, R, and MATLAB.

Processed huge datasets (over a billion data points, over 1 TB of data) for data association pairing and provided insights into meaningful data associations and trends.

Participated in all phases of data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.

Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, and MapReduce concepts.

Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.

Collaborated with data engineers to implement the ETL process; wrote and optimized SQL queries to perform data extraction from the cloud and merge data from Oracle 12c.

Collected unstructured data from MongoDB 3.3 and completed data aggregation.

Conducted analysis assessing customer consumption behaviors and discovered the value of customers with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering.
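
For illustration, a minimal sketch of this kind of RFM-plus-K-Means segmentation, assuming a hypothetical transactions table with customer_id, order_date, and amount columns (the column names and cluster count are illustrative, not taken from the original project):

    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Hypothetical transaction data: customer_id, order_date, amount.
    tx = pd.DataFrame({
        "customer_id": [1, 1, 2, 2, 3],
        "order_date": pd.to_datetime(
            ["2018-01-05", "2018-03-01", "2018-02-10", "2018-03-15", "2018-01-20"]),
        "amount": [120.0, 80.0, 40.0, 60.0, 300.0],
    })

    # Compute Recency, Frequency, Monetary per customer.
    snapshot = tx["order_date"].max() + pd.Timedelta(days=1)
    rfm = tx.groupby("customer_id").agg(
        recency=("order_date", lambda d: (snapshot - d.max()).days),
        frequency=("order_date", "count"),
        monetary=("amount", "sum"),
    )

    # Standardize the RFM features, then segment customers with K-Means.
    X = StandardScaler().fit_transform(rfm)
    rfm["segment"] = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(X)
    print(rfm)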

Participated in feature engineering such as feature intersection generation, feature normalization, and label encoding with scikit-learn preprocessing.
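
A rough sketch of the normalization and label-encoding step mentioned above, on made-up data (not the project's actual pipeline):

    import numpy as np
    from sklearn.preprocessing import LabelEncoder, MinMaxScaler

    # Made-up categorical labels and a numeric feature column.
    labels = np.array(["retail", "banking", "healthcare", "retail"])
    values = np.array([[10.0], [250.0], [40.0], [75.0]])

    # Encode categories as integers; scale numerics into [0, 1].
    encoded = LabelEncoder().fit_transform(labels)   # alphabetical codes: [2, 0, 1, 2]
    scaled = MinMaxScaler().fit_transform(values)
    print(encoded, scaled.ravel())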

Used pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK (Natural Language Toolkit) in Python for developing various machine learning algorithms.

Utilized machine learning algorithms such as Decision Tree, linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN.

Parsed data and produced concise conclusions from raw data in a clean, well-structured, and easily maintainable format.

Determined customer satisfaction and helped enhance customer experience using NLP.

Developed various QlikView Data Models by extracting and using data from various sources: files, DB2, Excel, flat files, and Big Data.

Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R 3.4.0.

Worked on different data formats such as JSON and XML and performed machine learning algorithms in R.

Worked on MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop.

Performed data visualizations with Tableau 10 and generated dashboards to present the findings.

Worked on Text Analytics, Naïve Bayes, sentiment analysis, creating word clouds, and retrieving data from Twitter and other social networking platforms.
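
As a toy example of this kind of sentiment scoring, here is a sketch using NLTK's VADER analyzer (a substitute for the Naïve Bayes approach named above; the tweets are made up):

    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    # Fetch the VADER lexicon on first run.
    nltk.download("vader_lexicon", quiet=True)

    tweets = ["I love this product!", "Worst service ever."]
    sia = SentimentIntensityAnalyzer()
    for t in tweets:
        # compound is a single score in [-1, 1], negative to positive.
        print(t, sia.polarity_scores(t)["compound"])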

Used Git 2.6 for version control; tracked changes in files and coordinated work on the files among multiple team members.

Environment: Python 3.2/2.7, Hive, Tableau, R, QlikView, MySQL, MS SQL Server 2008/2012, AWS, S3, EC2, Linux, Jupyter Notebook, RNN, ANN, Spark, Hadoop.

Wireless Telecom Group, Parsippany, NJ Nov 2016- Sep 2018

Data Scientist

Responsibilities:

Communicated and coordinated with other departments to gather business requirements.

Gathered all required data from multiple data sources and created datasets to be used in analysis.

Participated in the installation of SAS/EBI on Linux platform.

Worked on the data modeling tool Erwin Data Modeler to design data models.

Designed tables and implemented the naming conventions for Logical and Physical Data Models in Erwin 7.0

Worked on the development of data warehouse, data lake, and ETL systems using relational and non-relational tools like SQL and NoSQL.

Created SQL tables with referential integrity and developed queries using SQL, SQL*PLUS, and PL/SQL.

Designed, coded, and unit tested ETL packages for source marts and subject marts using Informatica ETL processes for an Oracle database.

Developed various QlikView Data Models by extracting and using data from various sources: files, DB2, Excel, flat files, and Big Data.

Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.

Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.

Identified and executed process improvements; hands-on in various technologies such as Oracle, Informatica, and Business Objects.

Worked on data cleaning and ensured data quality, consistency, integrity using Pandas, NumPy.

Participated in feature engineering such as feature intersection generation, feature normalization, and label encoding with scikit-learn preprocessing.

Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python Scikit-learn.
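
One plausible shape for the feature-selection step described above, using scikit-learn's gradient boosting feature importances on synthetic data (illustrative only; the real fraud features and thresholds are not given in the resume):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.feature_selection import SelectFromModel

    # Synthetic stand-in for a fraud dataset: 20 features, 5 informative.
    X, y = make_classification(n_samples=500, n_features=20,
                               n_informative=5, random_state=0)

    # Fit gradient boosting, then keep features above median importance.
    gb = GradientBoostingClassifier(random_state=0).fit(X, y)
    selector = SelectFromModel(gb, threshold="median", prefit=True)
    X_selected = selector.transform(X)
    print(X.shape, "->", X_selected.shape)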

Used Python (NumPy, SciPy, Pandas, scikit-learn, Seaborn) and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
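
A minimal sketch of model fitting in Spark 2.0's DataFrame-based ML API, assuming a local SparkSession and a toy dataset (column names are illustrative):

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("sketch").getOrCreate()

    # Toy data: two numeric features and a binary label.
    df = spark.createDataFrame(
        [(0.0, 1.1, 0), (2.0, 1.0, 1), (1.5, 2.3, 1), (0.2, 0.4, 0)],
        ["f1", "f2", "label"])

    # Assemble feature columns into a vector, then fit logistic regression.
    assembled = VectorAssembler(inputCols=["f1", "f2"],
                                outputCol="features").transform(df)
    model = LogisticRegression(maxIter=10).fit(assembled)
    model.transform(assembled).select("label", "prediction").show()
    spark.stop()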

Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.

Implemented, tuned, and tested the model on AWS EC2 to get the best algorithm and parameters.

Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.

Designed and developed machine learning models in Apache Spark (MLlib).

Used NLTK in Python for developing various machine learning algorithms.

Implemented deep learning algorithms such as Artificial Neural Networks (ANN) and Recurrent Neural Networks (RNN); tuned hyper-parameters and improved models with the Python package TensorFlow.
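
A minimal sketch of the kind of recurrent model and hyper-parameter choices involved, written against the current tf.keras API on dummy data (the original work used an earlier TensorFlow version; all values here are illustrative assumptions):

    import numpy as np
    import tensorflow as tf

    # Dummy sequence data: 100 samples, 8 timesteps, 4 features each.
    X = np.random.rand(100, 8, 4).astype("float32")
    y = np.random.randint(0, 2, size=(100,))

    # Small recurrent model; the LSTM units, learning rate, and batch
    # size are the sort of hyper-parameters one would tune.
    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(16, input_shape=(8, 4)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=2, batch_size=16, verbose=0)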

Installed and used Caffe Deep Learning Framework.

Modified selected machine learning models with real-time data in Spark (PySpark).

Worked with the architect to improve the cloud Hadoop architecture as needed for research.

Worked on different formats such as JSON, XML and performed machine learning algorithms in Python.

Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.

Worked closely with Data Architects and the DBA team to implement data model changes in the database in all environments.

Used the Pandas library for statistical analysis.

Communicated the results to the operations team to support decision-making.

Collected data needs and requirements by interacting with other departments.

Environment: Python 3.2/2.7, Hive, Oozie, Tableau, Informatica 9.0, HTML5, CSS, XML, MySQL, MS SQL Server 2008/2012, JavaScript, AWS, S3, EC2, Linux, Jupyter Notebook, RNN, ANN, Spark, Hadoop.

Tata Consultancy Services (TCS), Hyderabad, India 2011-2013

Systems Engineer

Responsibilities:

Experienced in all phases of the software development life cycle for the Fiat Chrysler Mainframe-SAP integration project, with emphasis on design, build, test, and implementation.

Developed four new programs in COBOL using DB2, IMS DB, and IMS DC in DCCS. Created JCL to run programs that extract data from IMS DB and create, format, and send a daily report to Fiat's payment system in SAP via FTP.

Created and activated PSBs and PCBs to run corresponding programs

Debugged programs, sub-programs and procedures extensively using TEST option and documented test results thoroughly

Performed unit testing and system testing for programs executed through jobs, screens and webpages

Demonstrated a quick learning curve, saving 30% of effort hours in the design and coding phases.

Analyzed issues in programs during system testing and suggested ways to address them

Discussed business requirements with onsite managers to assess feasibility of technical solutions


