Bigdata technologies

Location:
Pune, Maharashtra, India
Posted:
November 18, 2019


Resume:

SHUBHAM AGRAWAL

adau8m@r.postjobfree.com +91-705******* linkedin.com/in/shubhamagrawal24

Summary:

● Over two years of work experience with Big Data technologies across domains such as AdTech, banking, life sciences, pharma, and biotech, plus half a year of Big Data training at CDAC.

● Currently working at Tapnomy Solutions Pvt. Ltd. as a Big Data Engineer on Big Data operations, handling terabytes of data for the San Francisco-based company Chartboost on AWS using cutting-edge Big Data technologies. (June 19 – present)

● Worked at Innoplexus Consulting Services Pvt. Ltd. as a Member of Technical Staff on Big Data operations. Built pipelines for terabyte-scale data with Spark, Kafka, MongoDB, MySQL, ElasticSearch, Cassandra, Elassandra, Python, etc., and analysed the data with Pandas. (April 18 – May 19)

● Hands-on experience with Big Data technologies, Spark, the Hadoop ecosystem, Machine Learning, etc., gained at CDAC. (Aug 17 – Feb 18)

● Worked as an intern at Vyom Labs Pvt. Ltd. (Dec 16 – April 17)

● Worked as Chief Technical Advisor of the Walchand Linux Users’ Group. (2014 – 17)

● Highly efficient at data cleaning, data acquisition, and data validation on large structured and unstructured data sets.

● Proficient in SQL databases like MySQL.

● Proficient in handling big data on NoSQL databases like MongoDB and Cassandra.

● Expert in using Hive for data analysis with structured and unstructured data.

● Extensive programming skills in analytical languages and frameworks such as Spark with Scala, PySpark, Python, and Bash.

● Experience providing highly efficient, flexible, and powerful search capabilities to users with ElasticSearch.

● Experience with cloud platforms like AWS (EC2, EMR, S3, Athena).

● Well versed in Git for version control.

● Able to independently and effectively organise and manage multiple assignments, with excellent analytical and problem-solving skills.

Technical Skills:

● Programming Lang. : Python 2.7-3.7, Scala 2.12

● Bigdata Technology : Spark 2.4, Hadoop 1.x, MapReduce, HDFS, Kafka

● Database Technology : Hive 1.1-2.4, MongoDB 3.x, MySQL 5.7-8.x, Cassandra 2.1, Redis, Pig

● Search Engine : ElasticSearch 2.x-6.x

● Hadoop distributions : AWS EMR, Cloudera

● Cloud Services : AWS EMR, AWS S3, AWS Athena, Presto, Qubole

● Workflow scheduler : Airflow 1.10.x

● Crawling : BeautifulSoup, Selenium

● Data Analysis and AI : Pandas, Numpy, ML, DL, NER, NLP, RNN, Tableau

● Others : Livy 0.6, Git, DBeaver 6.2, Superset, Postman, Unix commands

Education:

● PG-Diploma in Big Data from Sunbeam-CDAC Pune with 78.13%. (Aug 17 – Feb 18)

● B.Tech in Information Technology from Walchand College of Engineering, Sangli, with 6.7 CPI. (2013 – 17)

Achievements:

● Awarded Employee of the Month at Innoplexus Consulting Services Pvt. Ltd.

● Secured second rank in the CCEE exam (CDAC Big Data).

Work Experience:

● Client: Chartboost

● Project: AdTech data warehouse (June 19 - till date)

● Role: Data Engineer

● Description: Chartboost is the largest mobile games-based ad monetisation platform in the world. The Chartboost SDK is one of the most widely integrated independent mobile ad SDKs, and through Chartboost Exchange, Ad Network, and other services, Chartboost helps developers build businesses while connecting advertisers to highly engaged audiences.

● Responsibilities:

Design real-time and batch processing pipelines for different downstream systems to consume.

Ensure all core data pipelines are working as expected and fix any issues that arise.

Perform data cleaning, data validation, and scaling using Airflow, Hive, and Spark.

Create Airflow data pipelines to resolve performance issues and reduce the complexity of day-to-day data usage (a minimal DAG sketch follows this project's environment list).

Collaborate with the product team to build new features and infrastructure.

● Environment: Spark 2.4.2, Scala 2.12, Python 2.7, Hive 1.1, Kafka, Airflow, Livy, HDFS, MongoDB, MySQL, Datadog, Git, Jenkins, AWS (S3, EMR, Athena, DynamoDB, boto3 library).
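A minimal sketch of the kind of daily Airflow pipeline described above, assuming an Airflow 1.10-style API; the DAG id, task names, scripts, and S3 bucket are hypothetical placeholders, not the actual Chartboost pipeline.

```python
# Minimal Airflow 1.10-style DAG: a daily batch job that validates raw events
# and then runs a Spark aggregation. Task names, scripts, and the S3 path are
# illustrative placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
    "start_date": datetime(2019, 6, 1),
}

with DAG(
    dag_id="daily_ad_events_aggregation",   # hypothetical DAG id
    default_args=default_args,
    schedule_interval="@daily",
    catchup=False,
) as dag:

    validate_raw = BashOperator(
        task_id="validate_raw_events",
        bash_command="python validate_events.py --date {{ ds }}",  # hypothetical script
    )

    aggregate = BashOperator(
        task_id="aggregate_with_spark",
        bash_command=(
            "spark-submit --deploy-mode cluster aggregate_events.py "
            "--date {{ ds }} --output s3://example-bucket/daily_agg/"  # placeholder bucket
        ),
    )

    validate_raw >> aggregate
```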

● Client: Chartboost

● Project: GDPR (June 19 - Aug 19)

● Role: Data Engineer

● Description: GDPR (General Data Protection Regulation) is a regulation in EU law on data protection and privacy for all individuals within the European Union (EU) and the European Economic Area (EEA). Chartboost must comply for both historical and incoming data: if a client requests that their account be removed, Chartboost must erase that client's information from its systems within a given amount of time. To comply with GDPR, an automated solution was needed that could search the entire data warehouse for the requested device IDs and anonymise the clients' data.

● Responsibilities:

Single point of contact on the data team for GDPR requests.

Designed a solution that can handle both simple and massive GDPR requests (a minimal anonymisation sketch follows this project's environment list).

Developed a configurable multithreaded environment to handle any kind of request.

Added monitors to the ongoing GDPR request process so that an alert is raised when the process fails or is killed.

Created an Airflow data pipeline that reports daily status automatically.

Collaborated with the product team to build new features and infrastructure.

● Environment: Spark 2.4.2, Scala 2.12, Python 2.7, Airflow, AWS, HDFS, MySQL, Datadog, Git, Jenkins.
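A minimal PySpark sketch of the anonymisation step described above: given a list of requested device IDs, personal fields are blanked out and a cleaned copy of the table is written back. The paths, column names, and ID-list source are assumptions for illustration, not the actual Chartboost schema.

```python
# Blank out personal fields for devices that requested erasure, then write a
# cleaned copy of the table. Paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("gdpr_anonymise").getOrCreate()

# Device IDs that requested erasure (illustrative source).
requested_ids = (spark.read.text("s3://example-bucket/gdpr/requested_device_ids.txt")
                 .withColumnRenamed("value", "device_id")
                 .withColumn("to_erase", F.lit(True)))

events = spark.read.parquet("s3://example-bucket/warehouse/ad_events/")

# Flag rows belonging to the requested devices, then anonymise their fields.
flagged = events.join(F.broadcast(requested_ids), on="device_id", how="left")

anonymised = (flagged
              .withColumn("ip_address",
                          F.when(F.col("to_erase"), F.lit(None)).otherwise(F.col("ip_address")))
              .withColumn("device_id",
                          F.when(F.col("to_erase"), F.lit("ANONYMISED")).otherwise(F.col("device_id")))
              .drop("to_erase"))

anonymised.write.mode("overwrite").parquet("s3://example-bucket/warehouse/ad_events_clean/")
```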

● Client: Chartboost

● Project: DAG-builder library and DAG Migration from Cloudera to AWS (June 19 - till date)

● Role: Data Engineer

● Description: The Chartboost data pipeline contains 200+ DAGs. The purpose of the DAG-builder library is to simplify Airflow DAG creation and thereby complete the migration of DAGs from Cloudera to AWS EMR as early as possible.

● Responsibilities:

Design and build the whole library from scratch (a configuration-driven sketch follows this project's environment list).

Migrate the whole data pipeline from Cloudera to AWS EMR.

Convert all Hive jobs to Spark jobs.

Optimize Spark job performance and minimize cost.

Tune Spark jobs through code changes and EMR cluster configuration.

Collaborate with the product team to add new functionality to the library.

● Environment: Spark 2.4.2, Scala 2.12, Python 2.7, Hive 1.1, Airflow, Livy, HDFS, MongoDB, MySQL, Datadog, Git, Jenkins, AWS (S3, EMR, Athena, DynamoDB, boto3 library).
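A minimal sketch of the configuration-driven "DAG builder" idea: each pipeline is declared as a small dict and turned into an Airflow DAG whose steps are submitted as Spark jobs. The config format, helper name, and spark-submit arguments are assumptions for illustration, not the actual Chartboost library.

```python
# Turn a declarative pipeline config into an Airflow DAG with sequential
# Spark steps. The config format and scripts are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator


def build_dag(config):
    """Create an Airflow DAG from a declarative pipeline config."""
    dag = DAG(
        dag_id=config["dag_id"],
        schedule_interval=config.get("schedule", "@daily"),
        start_date=datetime(2019, 6, 1),
        catchup=False,
    )
    previous = None
    for step in config["steps"]:
        task = BashOperator(
            task_id=step["name"],
            bash_command=f"spark-submit --deploy-mode cluster {step['script']} --date {{{{ ds }}}}",
            dag=dag,
        )
        if previous is not None:
            previous >> task  # chain steps sequentially
        previous = task
    return dag


# Example declarative config (hypothetical job names and scripts).
revenue_config = {
    "dag_id": "daily_revenue_rollup",
    "schedule": "@daily",
    "steps": [
        {"name": "clean_events", "script": "clean_events.py"},
        {"name": "rollup_revenue", "script": "rollup_revenue.py"},
    ],
}

# Airflow discovers DAGs at module level, so expose the built DAG here.
globals()[revenue_config["dag_id"]] = build_dag(revenue_config)
```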

● Company: Innoplexus

● Project: Life science data warehouse (April 18 - May 19)

● Role: Data Engineer

● Description: Innoplexus provides advanced artificial intelligence (AI) and blockchain solutions that support all stages of drug development, from pipeline to market. Innoplexus identifies and extracts structured and unstructured life science data by scanning up to 95% of the world wide web and merges it with enterprise and third-party data in an ongoing, real-time process. This continually updated data powers solutions that serve pharmaceutical companies and the biotech industry. The data volume is 300+ TB.

● Responsibilities:

Designed real-time and batch processing pipelines for different downstream systems to consume (a minimal Kafka-to-ElasticSearch sketch follows this project's environment list).

Ensured the data pipeline worked as expected and fixed any issues that arose.

Performed data cleaning, data validation, and scaling using Spark and Pandas.

Collaborated with the product team to build new features and infrastructure.

● Environment: ElasticSearch 2.x-6.x, MongoDB 3.x, Cassandra, PySpark, Kafka, Python 3, Redis, Pandas, Git.
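A minimal sketch of the kind of real-time ingestion described above: JSON documents are consumed from a Kafka topic, lightly cleaned, and indexed into ElasticSearch. The topic, index, hosts, and the kafka-python / elasticsearch client libraries are assumptions for illustration, not the actual Innoplexus pipeline.

```python
# Consume JSON documents from Kafka, drop empty fields, and index them into
# ElasticSearch. Topic, index, and hosts are placeholders.
import json

from kafka import KafkaConsumer          # pip install kafka-python
from elasticsearch import Elasticsearch  # pip install elasticsearch

consumer = KafkaConsumer(
    "life_science_docs",                 # placeholder topic
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
)
es = Elasticsearch(["http://localhost:9200"])

for message in consumer:
    doc = message.value
    # Light cleaning before indexing: drop empty fields.
    doc = {key: value for key, value in doc.items() if value not in (None, "", [])}
    es.index(index="life_science_docs", id=doc.get("doc_id"), body=doc)
```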

● Client: Innoplexus

● Project: Company360 (July 18 - Sept 19)

● Role: Data+backend Engineer

● Description: Company360 is a web portal for the banking sector that provides information about different organizations and their competitors: products, services, industry, sector, company overview, sentiment analysis of company news, employee details, company boards, financial data, announcements, litigations, etc.

● Responsibilities:

Worked independently on all ETL and backend operations.

Single point of contact for all backend APIs and data.

● Environment: Python 2.7, NodeJS, Pandas, MySQL, ElasticSearch, MongoDB, Redis, Git, and ML techniques such as Random Forest, LSTM, NLTK, NER.

● Client: Innoplexus

● Project: Legalplexus (Dec 18 - May 19)

● Role: Data+backend Engineer

● Description: Legalplexus provides a platform where users can upload documents such as PDFs, DOCX files, text images, and scanned documents. It offers scanned-document reader and scanned-document editor facilities.

● Responsibilities:

Worked independently on backend and data-loading operations.

Single point of contact for all backend APIs and data.

Designed and built an efficient data pipeline.

● Environment: NodeJS, ElasticSearch, MongoDB, Redis, Git, image processing.

Self-Projects:

● Stock Market Prediction Model using Machine Learning: (June 16 - Dec 16) This project focuses mainly on predicting the stock-market trend for an organization, and on finding which model best predicts the stock price of a particular organization.

[github.com/shubhamagrawal24/best_stock_price_prediction_model]
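A minimal sketch of the model-comparison idea, assuming scikit-learn, lagged closing prices as features, and a placeholder CSV file; see the repository above for the actual project code.

```python
# Compare two regressors on lagged closing prices using a time-ordered split.
# The CSV path and column name are placeholders.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

prices = pd.read_csv("prices.csv")["close"]          # placeholder data file

# Use the previous 5 closing prices as features for the current price.
X = pd.concat([prices.shift(i) for i in range(1, 6)], axis=1).dropna()
X.columns = [f"lag_{i}" for i in range(1, 6)]
y = prices.loc[X.index]

split = int(len(X) * 0.8)                            # time-ordered split, no shuffling
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

for model in (LinearRegression(), RandomForestRegressor(n_estimators=100, random_state=0)):
    model.fit(X_train, y_train)
    error = mean_absolute_error(y_test, model.predict(X_test))
    print(type(model).__name__, "MAE:", round(error, 3))
```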

● Solving Standard Equations with an Evolutionary Algorithm (Genetic Algorithm): (Dec 15 - March 16) Uses a genetic algorithm to solve standard equations such as Diophantine equations.

[github.com/shubhamagrawal24/solving_equations_using_genetic_algorithm]
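A minimal genetic-algorithm sketch for a linear Diophantine equation, shown here for the illustrative instance a + 2b + 3c + 4d = 30 rather than the project's exact setup; see the repository above for the actual code.

```python
# Genetic algorithm for a + 2b + 3c + 4d = 30 over non-negative integers:
# fitness is the absolute error, with truncation selection, single-point
# crossover, and point mutation.
import random

COEFFS, TARGET = [1, 2, 3, 4], 30

def fitness(genes):
    """Absolute error of the candidate solution; 0 means solved."""
    return abs(sum(c * g for c, g in zip(COEFFS, genes)) - TARGET)

def mutate(genes):
    genes = list(genes)
    genes[random.randrange(len(genes))] = random.randint(0, TARGET)
    return genes

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

population = [[random.randint(0, TARGET) for _ in COEFFS] for _ in range(50)]
for generation in range(1000):
    population.sort(key=fitness)
    if fitness(population[0]) == 0:
        break
    parents = population[:10]                      # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(40)]
    population = parents + children

print("solution:", population[0], "error:", fitness(population[0]))
```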

● BrainRush:(Aug 15 - Dec 15)

A PC game built in Java in which the player has to match the type of multi-coloured objects (circle/square) coming from all four sides by clicking the mouse.

[github.com/shubhamagrawal24/brain_rush_game]


