email@example.com ● 571-***-****.
5+ years of strong experience in Data Science with newly acquired skills (SQL, Machine Learning, Tableau, Python, R, Excel, and ETL), an insatiable intellectual curiosity, and the ability to mine hidden business insights from large sets of structured, semi-structured, and unstructured data. A meticulous Data Analyst who leads projects based on extensive and complex data analysis (Teradata, Hadoop, Hive, MapReduce), data migration, data mining, data loading, report generation, and creation of data visualizations (MicroStrategy, Qlik, Cognos, Tableau) from large data warehouse systems. Proficient in Statistical Modeling and Machine Learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) for Forecasting and Predictive Analytics. Experienced in unsupervised methods such as K-means, DBSCAN, t-SNE, Spectral Clustering, SOM, and LSH, and in Deep Learning models such as GANs, ResNets, CNNs, RNNs, and Deep Reinforcement Learning. Experienced in developing NLP applications such as sentiment analysis, topic modeling, text summarization, and document classification using language models and similarity metrics such as Word2Vec, KL divergence, and cosine similarity.
Expert-level MS Excel (Power Pivot, complex formulas, visualization, pivot tables, VBA macros, VLOOKUP). Extensive experience in Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems and generating data visualizations using d3.js, matplotlib, ggplot, Looker, and Tableau. Hands-on experience implementing advanced analytical functions: Lead/Lag, Over, Partition By, Dense Rank, N-Tile, Row Number, Windowing Aggregate Functions, Inverse Percentile, and First/Last Functions. Experienced in Data Integration, Validation, and Data Warehousing using MS Visual Studio SSIS, SSAS, and SSRS. Strong Data Modeling experience, including designing ER diagrams, 3NF, Dimensional/Hierarchical data models, and Star and Snowflake schema data models.
Experience in data-driven statistical analysis, including Sampling, finding the Data Distribution, Hypothesis Testing, Correlation among variables, Outlier Detection, Analysis of Variance, and Probability Theory. Excellent knowledge of the Hadoop Ecosystem and Big Data tools such as Pig, Hive, and Spark. Created PL/SQL packages, Procedures, Functions, and Triggers. Hands-on experience with GitHub and Kaggle.
Experience in developing UML use case diagrams and Work Flow Diagrams using tools like Rational Rose and MS Visio. Sound communication skills and a pragmatic approach to problem investigation and deriving solutions, supporting decision-making for various stakeholders.
Skills & Abilities
• Languages: Proficient: MySQL, Python, R, Teradata SQL, Hive, Spark SQL; Intermediate: SAS, Java
• Databases: Oracle (PL/SQL), MySQL, Microsoft SQL Server, PostgreSQL, Apache Hadoop/ Hive, Teradata
• Tools: Tableau, Looker, d3.js, Google Analytics, Spark, Pig, Hive, SSIS, MicroStrategy, Qlik, Cognos, AWS, Denodo
• Packages: Scikit-learn, pandas, TensorFlow, Matplotlib, Seaborn, ggplot2, plotly, Keras.
• Math & Stats: Logistic Regression, Linear Regression, Time Series Analysis, Hypothesis Testing, ANOVA, Chi-Square Test, Z-Test and T-Test, Predictive Modeling, Numerical Analysis.
• Software: Microsoft (MS Word, Excel, PowerPoint, Visio, Project), Mac OS X (Keynote, Numbers, Pages)
Certifications
• Hands-On Tableau Training for Data Science
• EMC2 Certified Academic Associate, Cloud Infrastructure and Service. (Hadoop, Hive, Spark, Cassandra)
• Statistics for Data Science and Business Analysis
• A-Z Machine Learning for Data Science
Data Scientist Verizon Ashburn, VA Mar 2018 – Present
• Job Description: Responsible for leading the Pricing Analytics Delivery team in establishing and managing the pricing analytics platform, and for building relationships with business teams so that they embrace and adopt new analytics processes.
• Responsibilities: Responsible for predictive analysis of Margin and other features to predict the Discount rate that could be offered to the customer through the Sales team or PCM.
• Developed and implemented multiple data science solutions for analyzing sales, revenue impact, cost reduction, discounts, and capital allocation in collaboration with multidisciplinary teams.
• Validated assumptions around application data usage, competitive landscape, monthly recurring charge, and vendor revenue share to ensure that Verizon’s financial margins are preserved.
• Worked with Product Marketing to ensure pricing analysis is complete and approved by Finance Executives to meet process milestones.
• Performed sensitivity analyses on each request to understand which factors impact margins both positively and negatively.
• Routinely researched, reconciled, and analyzed data to maintain the Product hierarchy structure.
• Analyzed pricing data and vendor rates to find opportunities to increase profitability through cost reduction and revenue enhancement.
• Completed the full report development life cycle (design, testing, enhancement, evaluation, and deployment) using SQL Server/Teradata systems and Tableau for back-end and front-end design.
• Performed customer segmentation to improve marketing strategies and provided suggestions for customer acquisition and retention.
• Presented progress at each stage of report development and demonstrated and explained the functionality and features of each phase of the dashboard reports to end-users.
• Environment: Microsoft Business Intelligence (SQL Server), Python, R, Tableau, Matplotlib, Seaborn, Pandas, Keras, TensorFlow.
Data Analyst Instructure, University of Technological Services Bloomington, IN Aug 2016 – Feb 2018
• Job Description: Multi-year project called the Decision Support Initiative (DSI), dedicated to helping IU improve decision making through enhanced data, models, and processes.
• Responsibilities: Captured and organized real-time data for optimal use and aggregated data on students enrolled across institutions, providing a new understanding of the student experience and insights into student performance to improve retention and graduation rates.
• Used Denodo to combine data sources such as Oracle, MS-SQL, Amazon Redshift, Web services, and Box.com, securely serving information to consuming applications such as Tableau and Excel.
• Aided in website optimization by performing A/B testing of page variants. Also built website conversion funnel dashboards in Tableau and Excel to track traffic flow and drop-offs on the website.
• Created visualizations, dashboards, and reports using Tableau and QlikView, and interpreted trends and patterns to tell a compelling story.
• Worked in an Agile environment and was responsible for designing analytic frameworks for data mining, ETL, analysis, and reporting under the supervision of the Manager.
• Created Views, Sequences, Indexes, and Constraints, and generated SQL scripts.
• Worked with the Data Architect on the Dimensional Model, using both Star and Snowflake schemas.
• Environment: SQL, Python, Tableau, Hive, Spark, Denodo, MicroStrategy.
BI Data Analyst Dynamic Solutions India Sep 2011 – Apr 2016
Applied Data Finance
• Job Description: The project was to build an algorithm that accurately classifies credit card holders into multiple classes based on the historical data available on multiple variables. The objective was to improve the bank's efficiency by reducing the default rate while offering new products.
• Responsibilities: Responsible for predictive analysis of credit scoring to predict whether credit extended to a new or existing applicant is likely to result in profit or loss.
• Organized, cleansed, and processed customers' data for analysis by using Excel, Access, R and Python.
• Implemented advanced analytical functions Lead/Lag, Over, Partition By, Dense Rank, N-Tile, Row Number, Windowing Aggregate Function, Inverse Percentile, First/Last Functions.
• Created reports and dashboards using D3.js and Tableau 9.x to explain and communicate data insights, significant features, model scores, and the performance of the new recommendation system to both technical and business teams.
• Collected modeling data by querying several tables with SQL, then appended or merged the extracted tables in Python to create the modeling tables.
• Worked with large-scale data ingestion, ETL processing, storage, Hadoop ecosystems, Spark, and NoSQL databases such as MongoDB.
• Designed and implemented cross-validation and statistical tests, including Hypothesis Testing, ANOVA, and Autocorrelation, to verify the models' significance.
• Reviewed Stored Procedures for reports and wrote test queries against the source system (SQL Server).
• Built a logistic regression classifier and other models in Python to provide valuable analytical insights.
• Used k-fold cross-validation to avoid overfitting.
• Measured the quality of the models using the Kolmogorov-Smirnov test.
• Environment: SQL, Python, Tableau, Scikit-learn, Matplotlib, Pandas, Seaborn, SSIS, SQL Server, Statistics, AWS, Hadoop
Data Science Projects and Publications
Student Connect Mar 2016 – Sep 2016
• Devised a system and a detailed infrastructure layout that groups students by their various interests to help university students communicate.
• Coordinated with the ETL team and QA team to improve the quality of the model and reports (analytics).
An Adaptive approach of Tamil Character recognition using Deep Learning Jul 2015 – Dec 2016
• Conducted Deep Learning research to develop a Tamil character recognition system using a Convolutional Neural Network; published by Springer International Publishing Switzerland and presented at CSI-2015.
• Classified characters in Tamil, a South Indian language, using Convolutional Neural Networks (ConvNets).
• Applied Regression, Clustering, and Classification techniques to understand correlation and performance measures.
Improving Restaurants on Yelp by Sentiment Analysis of Reviews Jan 2015 – Jul 2015
• Answered several business questions by applying NLTK, TensorFlow, sentiment analysis, and various scikit-learn methods to the Yelp restaurant reviews and attributes.
• Explored reviewers' perspectives through extracted phrases and how restaurants can use them to improve their businesses.
MEDLINE Analysis Aug 2017 – Dec 2017
• Developed an approach that retrieved and organized MEDLINE citations into different topical groups and prioritized important citations in each group.
• Created a clustered search over the PubMed corpus of medical/biomedical journals using HPC at IU.
• Effectively integrated Text clustering, Text summarization, and Text ranking and organized MEDLINE retrieval results into different topical groups.
• Unsupervised learning – Document clustering, indexing and search, word embeddings, feature extraction.
IUB-Microsoft Funded Project Jan 2017 – Dec 2017
• Aimed to deliver a web-deployable Handwritten Text Recognition system through which repositories of scientific data can render handwritten text on specimen labels and ledgers into digitized information.
• Developed a model to predict the digital encoding of each handwritten character in the catalog and specimen label images with the maximum feasible accuracy, using technologies such as Computer Vision and Image Processing.
Education
Master of Science Indiana University Bloomington, IN, USA Major: Data Science Dec-2017
Master of Engineering Anna University, TN, India Major: Computer & Communication Mar-2011
Bachelor of Technology Anna University, TN, India Major: Information Technology May-2009