Data Engineer

Location:
Boston, MA
Posted:
October 23, 2017


Arvind Venkatasubramanian

** ***** ******* **. *** #1 Boston MA 02115
617-***-****
ac2wme@r.postjobfree.com
GitHub: https://github.com/arvindv17
LinkedIn: https://www.linkedin.com/in/arvindsubra/

Education

Northeastern University, Boston, MA

Master of Science in Information Systems

July 2017

SRM University, Chennai, India

Bachelor of Technology in Electronics and Instrumentation May 2012

Skills

Programming Paradigms: Python, R, Java, Spring MVC, Shell Script, Linux, Unix, REST API
DBMS Software/Tools: SQL Server, Oracle PL/SQL, Toad, MySQL, Postgres
Big Data Ecosystems: Hadoop, MapReduce, HBase, MongoDB, Cassandra, Spark, Hive, Pig, Kafka
Cloud Platforms: Google Cloud, Amazon Web Services (AWS), Microsoft Azure, Docker
Data Integration and Business Intelligence Tools: SSIS, Talend, Power BI, Tableau, QlikSense, QlikView
Management Tools/Repositories: Jira, HP QC, GitHub, SVN, Bitbucket

Academic Projects

Fly High – Cheapest Air Travel Dates Predictor (MapReduce, AWS, R, HBase, Apache Phoenix)

• Cleaned airline ticket price data in R, then used MapReduce code to identify the dates on which ticket prices fall; loaded the results into an HBase database, integrated Apache Phoenix, and hosted the application on an AWS server with an interactive UI.

Python AWS Application Using Flask (Python, Flask, AWS, PostgreSQL, Heroku)

• Created a REST API in Python that takes information from an interactive UI and stores it in a PostgreSQL database instance on AWS; also deployed the application on Heroku.

Contoso Data Warehousing and Visualization (Talend, SSIS, Tableau, QlikView, Power BI)

• Performed Extract, Transform, and Load (ETL) operations on Contoso data (~8 GB) from multiple sources (databases, CSV files) into a target SQL Server database; optimized the run time to under 13 minutes.

PageRank Algorithm on Wikispecies and Mahout Recommender (Hadoop, Mahout, Fully Distributed HDFS, AWS)

• Implemented the PageRank algorithm on Wikispecies data by writing MapReduce code; ran the system in fully distributed mode over a network; implemented a Mahout recommender on a large dataset; also used AWS.

Search Engine Optimizer (Python, BeautifulSoup, NLTK)

• Created a search engine optimizer in Python that accepts an arbitrary URL and identifies its key relevant terms.

Lucene Document Keyword Search (Lucene, HTML, CSS, Inverted Index, Elasticsearch)

• Created a web-based application to search for keywords in a directory of documents using Lucene simulating

Python Projects (Python, AWS, Heroku)

• Created real-time applications in Python using packages such as OpenCV, pandas, NumPy, Flask, and Folium.

Big Data Projects (Python, MapReduce, Hive, Pig, Spark)

• Created a repository using the GroupLens dataset to exercise the different programming paradigms of big data applications; answered simple business questions based on the dataset.

Elastic Computing Network Simulation (Linked Lists, Queues, Java, Multi-Threading)

• Using queues and linked lists, implemented an elastic computing application that handles incoming requests and processes them in order of arrival time.

Experience

Fidelity Investments – Web and Digital Analytics Co-op January 2017 – June 2017

• Implemented an anomaly detection algorithm to identify known anomalies and predict new ones; automated email alerts and triggers based on changes in the data.

• Integrated third-party applications with the existing data feeds coming from the Hadoop system; identified the best tools to use.

• Implemented Adobe SiteCatalyst developer API requests to retrieve bulk data for analytics.

Accenture Services Private Limited – Software Engineering Analyst April 2012 – July 2015

• Led a team of 4 testing defects in the billing and revenue management module of the integration test environment; detected major design gaps; received an award from project management.

• Created multiple SQL queries to increase data retrieval speed and performance; optimized performance by more than 70%.

• Created Unix shell scripts to generate test data for the billing application; reduced manual effort by 80%; received an award from the client team.

• Designed and created multiple Python scripts to simulate real-time test data; these were used in the integration test and the pre-production Business User Acceptance Test to ensure correct results.


