
Data Engineering and Cloud Intern

Location:
White Plains, NY
Posted:
March 02, 2020


Resume:

Arushi Sharma

Boston, MA 914-***-**** ******.***@*****.***.***

https://github.com/Arushi04 https://www.linkedin.com/in/arushi04/

SUMMARY

* ***** ** ******** ********** in big data, cloud computing and creating data engineering pipelines in various domains. Areas of interest: Data Science, Artificial Intelligence, Cloud Computing

EDUCATION

• Master of Science (Data Science), Northeastern University, Boston, MA, May 2021. Relevant Courses: Algorithms, Linear Algebra and Probability, Information Retrieval, Data Management

• Bachelor of Technology (Computer Science), Rajasthan Technical University, India, May 2012. Relevant Courses: Data Mining and Warehousing, Distributed Systems, Statistics and Probability Theory

TECHNICAL SKILLS

• Programming Skills: Python, Shell, SQL, R, Terraform, MATLAB, NumPy, pandas

• Technologies: Big Data, Hadoop, Spark, Kafka, Hive, Cloud (AWS), Docker, Elasticsearch

PROJECTS

Northeastern University, Boston Jan 2020 - Present

Information Retrieval (Python, Elasticsearch, Kibana, Lucene, Docker)

• Working on indexing and ranking search results over large volumes of data, using vector space and language models to improve query results.

PROFESSIONAL EXPERIENCE

Northeastern University, Boston Jan 2020 - Present

Teaching Assistant

Large Scale Parallel Data Processing (Hadoop, Spark)

• Assisting students in understanding big data processing frameworks and the analysis techniques used to solve related challenges, and helping the professor grade assignments.

Bridgei2i Analytics Solutions Ltd., India April 2017 - May 2019

Senior Analytics Consultant

Bridge Funnel (Docker, Terraform, AWS, Jenkins)

• Built the cloud infrastructure for the company's AI sales product, BridgeFunnel, and automated its setup with Terraform to reduce human error and speed up the production process.

• Containerized algorithms using Docker and configured Jenkins for continuous builds.

Image Analytics (Docker, AWS - S3, ECR, Batch)

• Created a computer vision platform on AWS services for large-scale image and video analysis, focusing on cost efficiency, scalability and minimal human intervention.

• Modified the image analytics algorithm to integrate it with AWS services and containerized it for migration to the cloud.

Motion-Triggered Object Detection and Classification (Kafka, Spark Streaming, Cassandra, TensorFlow)

• Implemented an end-to-end object detection and classification use case on a live webcam feed, building data pipelines with Kafka servers and Spark Streaming and running inference with pre-trained TensorFlow and Darknet models.

Anomaly Detection (Spark Streaming, Cassandra, Kafka)

• Designed and developed a data pipeline for anomaly detection that collects data from multiple sensors to track any anomalies that occur.

• Processed the collected data with Spark Streaming and stored it in Cassandra, from which it was fetched for real-time monitoring on a dashboard.

Lead Engine (Intent Marketing) (PySpark, AWS, Databricks)

• Restructured and optimized the PySpark code of Lead Engine, reducing its real-time execution time by almost 50%.

• Designed a level-control framework so that the different levels inside the application can be controlled from a single program.

Accenture Solutions Pvt. Ltd., India Feb 2016 - April 2017

Application Development Analyst

Data Architecture and Warehousing (Hortonworks, Diyotta, HDFS, Hive, Hue, Sqoop, Flume)

• Developed a data ingestion pipeline to ingest historical and incremental data from multiple sources using Diyotta, and created a data lake to enable data analysis.

• Processed ingested data containing missing and unformatted values using Hive queries.

IBM India Pvt. Ltd., India Jan 2013 - Feb 2016

Software Developer

DCR Analysis (Cloudera, Hive, Sqoop, Oozie, Pig)

• Developed and optimized Hive scripts to derive insights from contact center data and improve customer experience.

• Optimized query performance using Hive partitioning and bucketing.

ACHIEVEMENTS

• Certificate of Recognition, Bridgei2i Analytics Solutions Ltd., 2018

For mentoring analytics professionals in big data, PySpark and cloud.

• Manager's Choice Award, IBM India Pvt. Ltd., 2015

For excelling in the practice of "Put the Client First".

• Orion Award Winner, IBM India Pvt. Ltd., 2014

For exemplary performance and exceptional dedication to IBM.

• Outstanding Contributor Award, IBM India Pvt. Ltd., 2014

For excellent performance in 2013-2014 at IBM.


