Name: Vishnu Kumar
Phone Number: +1-908-***-****
Email: ad4fvm@r.postjobfree.com
PROFESSIONAL SUMMARY
Dynamic and motivated professional with 7+ years of experience in the IT industry, working across multiple areas including data engineering, product management, and cloud engineering
Strong experience in leading teams, mentoring junior engineers and delegating tasks
Strong academic background in technology and management/business
Highly proficient in cloud platforms – Amazon Web Services (AWS) and Google Cloud Platform (GCP)
Strong experience with programming languages like Python and JavaScript (Node.js, React.js)
Deep working knowledge of multiple Python libraries and frameworks like pandas, NumPy, and scikit-learn
Strong understanding of Data Engineering tools – Databricks, Airflow, Snowflake, etc.
Strong understanding of APIs, RESTful services, API development and usage
Strong understanding of OLTP, OLAP, data governance and data modelling, data warehousing
Proficient and experienced in various project management methodologies like Scrum, Kanban, and Waterfall
Strong in Statistical Modeling and Machine Learning techniques
Demonstrated history of cross-functional collaboration across various disciplines in an organization
Strong ability to easily translate business requirements to technical workflows and tasks
Proficient in documentation and presenting technical details to non-technical audiences
Fast learner with divergent/creative thinking, able to come up with unique ideas and solutions
Experienced in automating tasks, saving organizations significant resources through innovation
Strong interest in data engineering, Artificial Intelligence, NLP, and LLMs (Large Language Models)
CERTIFICATIONS
Scrum.org – Professional Scrum Master – II
Meta – Introduction to Databases
IBM – Design Thinking Practitioner
EDUCATION
M.S. in Management Science
Columbia University in the City of New York
Columbia Business School (M7 business school)
Activities – Officer at Blockchain@CU, Columbia University Financial Engineering Club, Board member at Columbia University Product Management club
GPA – 3.41/4.00
B.Eng. in IT
B.M.S College of Engineering
Activities – NSS, BMSCE Investment Club, BMSCE Mathematics Club, Phase Shift Department coordinator
Thesis project – “Real-time conversion of text-to-braille using Python, Machine Learning, Optical Character Recognition (OCR) and Google Cloud Platform (GCP)”
Received the best project award at the university level (among 100+ projects)
GPA – 8.45/10.00
TECHNICAL SKILLS
Category
Skills
Programming Languages
Python (pandas, NumPy, scikit-learn), JavaScript, Go, Bash, SQL
Databases
MySQL, PostgreSQL, Oracle, MongoDB, AWS DynamoDB, Redis
AWS
Amazon S3, EC2, Redshift, Glue, Glue Studio, Amazon RDS, Aurora, Athena, AWS Lambda, AWS API Gateway, ElastiCache, Amazon SNS, Amazon CloudWatch, Amazon SQS, AWS EMR, AWS SageMaker
Big Data Technologies
Apache Spark, PySpark, Apache Hive, Databricks
Data Visualization
Tableau, Microsoft Power BI
Integration Tools
Mulesoft
Project Management
JIRA, Confluence, Asana, Monday.com
Workflow Automation
Apache Airflow, Databricks Jobs
Containerization
Docker
CI/CD
Jenkins
Front-end Development
HTML5, CSS3, React.js, Node.js
Google Cloud Platform
Google Cloud Storage, Google Compute Engine, Google BigQuery, Google Cloud Dataflow, Google Cloud Dataproc, Google Cloud Functions, Google Cloud Endpoints, Google Cloud Pub/Sub, Google Cloud Monitoring
Snowflake
Snowflake, Snowpipe, Snowpark, SnowSQL, Snowsight
RELEVANT EXPERIENCE
Fox Sports–New York, NY Apr 2023 – Present
Senior Data Engineer (FOX Sports AI Platform)
Highlighted Projects
1. Fox Data Products – Data Platform (Medallion Architecture)
Designed and implemented scalable ETL pipelines using AWS S3, AWS Lambda, AWS DynamoDB, Databricks (PySpark), AWS CloudWatch, etc.
Utilized AWS S3 as a common data lake and data warehousing solution (bronze and silver layers)
Developed a data ingestion pipeline to ingest data from multiple vendors into AWS S3
Worked with data-feed APIs from multiple vendors, such as SR360 and SportRadar
Implemented a medallion architecture to process and deliver validated event-by-event data
Utilized Python based convertors that leverage Pydantic models to convert Bronze level data
Flattened tables and created views using SQL for ease of understanding and visualization
Worked with data from multiple sports and leagues like MLB, NCAAFB, NFL, etc.
Managed a team of 3 junior data engineers and delegated pipeline development
Worked closely with business analysts to understand and translate business requirements
Persisted silver tables onto S3 and ingested tables into DynamoDB through Databricks DLT
Developed and maintained robust, scalable data pipelines using Apache Airflow to orchestrate complex workflows across AWS services such as AWS Lambda, AWS Glue, and Amazon Redshift, ensuring efficient data extraction, transformation, and loading (ETL)
Worked on Live ETL Pipelines by leveraging Apache Kafka/AWS Kinesis
Worked extensively with Jupyter/IPython-style notebooks on Databricks for building pipelines
Utilized Git within the Databricks platform for version-control and collaboration
Extensively worked with PySpark on Databricks to perform DataFrame operations
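The bronze-to-silver conversion described above could be sketched as follows. This is a minimal illustration using a plain dataclass as a stand-in for the Pydantic models mentioned; the field names and vendor payload keys are hypothetical, not the actual Fox schema:

```python
from dataclasses import dataclass

# Hypothetical silver-layer schema for one play-by-play event.
@dataclass
class PlayEvent:
    game_id: str
    play_number: int
    description: str

def bronze_to_silver(raw: dict) -> PlayEvent:
    """Validate and coerce a raw (bronze) vendor record into the silver schema."""
    if "gameId" not in raw:
        raise ValueError("missing required field: gameId")
    return PlayEvent(
        game_id=str(raw["gameId"]),
        play_number=int(raw.get("playNumber", 0)),  # coerce string counts to int
        description=str(raw.get("description", "")).strip(),
    )
```

In a real medallion pipeline this conversion would run inside a Databricks job over the bronze tables, with validation failures routed to a quarantine path.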
2. Fox Data Platform – Cold Ranking
Devised a mechanism for cold ranking various events, metrics, and stats for MLB, NFL, etc.
Ranked career, season-level data by using business logic based on opportunity categories
Identified metrics for threshold-based filtering and stored them in relevant Python classes
Leveraged object-oriented programming in Python to create PySpark UDFs from reusable functions
Created and executed the cold-ranking mechanism in PySpark
Leveraged Apache Airflow to automate and schedule ETL tasks such as validation and processing
Compressed and synced data to AWS DynamoDB using a PySpark UDF
Extensively relied on SQL/PySpark SQL for all ETL tasks to arrive at the final table
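The threshold-filter-then-rank logic described above can be illustrated in plain Python (in production this would run as a PySpark UDF or window function; the record shape and metric names here are hypothetical):

```python
def cold_rank(records, metric, threshold):
    """Drop records whose `metric` falls below `threshold`, then rank the
    remainder in descending order of that metric (rank 1 = best).
    `records` is a list of dicts, e.g. season-level player stats."""
    eligible = [r for r in records if r.get(metric, 0) >= threshold]
    ranked = sorted(eligible, key=lambda r: r[metric], reverse=True)
    # Return copies annotated with a 1-based rank.
    return [dict(r, rank=i + 1) for i, r in enumerate(ranked)]
```

The same shape maps directly onto a PySpark `Window.orderBy(...desc())` with `rank()` after a `filter`, which is the more scalable form for full-league data.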
3. Fox Sports AI Platform – Generative AI Event Commentary (Speech/Voice AI)
Worked on a POC project highlighting the capabilities of Generative AI in the sports space
Leveraged the capabilities of GPT and other functional models on AWS Bedrock
Created dynamic, real-time commentary across various sports events (simulated NFL games)
Created a tool that generates Event-By-Event Description of sports play data for NFL
Exposure to Voice generative AI for Sports Data commentary
Tech Stack: Python (incl. pandas, NumPy, scikit-learn, matplotlib, etc.), SQL, Databricks, PySpark, Apache Airflow, Apache Spark, AWS S3, AWS EC2, AWS DynamoDB, AWS ElastiCache, Redis, AWS Lambda, AWS API Gateway, AWS Bedrock, Apache Kafka, AWS Kinesis, AWS CloudWatch, JIRA, Confluence, DBeaver
Deloitte – New York, NY Apr 2021 – Apr 2023
Client: The Walt Disney Company
Role: Sr Data Engineer
Highlighted Projects
1. Disney Experiences – Disneyland and Disney World Cancellations and Reservations
Integrated diverse data sources such as AWS RDS and DynamoDB into a unified data ecosystem
Deployed Python-based data-processing workflows on AWS Lambda and EC2 instances
Developed and maintained ELT/ETL pipelines using Apache Airflow
Ensured timely and error-free data ingestion from various sources into Snowflake
Leveraged the data warehousing capabilities of Snowflake to perform complex analytics
Worked with historical as well as live reservations, cancellations and guest experience data
Worked closely with the business intelligence and customer experience teams to translate data insights into actionable strategies
Staged extracted data in AWS S3 which functioned as a global data lake
Leveraged AWS SDK for Python (Boto3) to interact with AWS services
Utilized Snowpark to develop and execute complex data transformation and processing
2. Disneyland – FastPass
Used Python directly within the Snowflake environment to transform FastPass data
Utilized Snowpipe for continuous, near-real-time data ingestion from AWS S3
Leveraged Snowflake SQL to efficiently query and transform data
Employed Snowsight for advanced data visualization and dashboarding
Transformed data using a combination of Snowflake SQL and Snowpark
3. Disney Parks – CCPA, GDPR compliance and global reference table (GENIE)
Extensively worked with SQL on Snowflake to ensure GDPR and CCPA compliance
Engineered a pipeline to gather data from multiple sources to create a global metadata reference table that housed all unique IDs across Disney's business – GenieID, DisneyID, etc.
Utilized Snowflake’s Time Travel capability for analyzing historical data
Utilized the COPY command for bulk data load from S3 to Snowflake
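The bulk load from S3 to Snowflake mentioned above could look roughly like this helper, which assembles the COPY INTO statement (the table and stage names are placeholders; in practice the statement would be executed through the Snowflake Python connector or SnowSQL):

```python
def build_copy_statement(table: str, stage: str, file_format: str = "PARQUET") -> str:
    """Build a Snowflake COPY INTO statement for bulk-loading files
    already staged (e.g. an external stage backed by S3) into a table."""
    return (
        f"COPY INTO {table} "
        f"FROM @{stage} "
        f"FILE_FORMAT = (TYPE = {file_format}) "
        f"ON_ERROR = 'ABORT_STATEMENT'"
    )
```

Keeping the statement construction in one place makes it easy to parameterize per-table loads from an orchestration layer such as Airflow.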
Tech Stack: Python (incl. pandas, NumPy, scikit-learn, matplotlib, etc.), SQL, Apache Airflow, Apache Spark, AWS S3, AWS EC2, AWS DynamoDB, AWS ElastiCache, Redis, AWS Lambda, AWS RDS, AWS CloudWatch, JIRA, Confluence, Snowflake, Snowpark, Snowpipe, Snowsight, boto3, dbt
Client: Amazon
Role: AI Data Engineer
Highlighted Projects
1. CONFIDENTIAL – Generative AI Project
Devised a POC roadmap for creating an in-house generative AI model for product descriptions
Collected and prepared prompt-completion pairs in multiple styles across broad topics
Closely revised the data and handed off initial-stage development of the generative AI model
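Prompt-completion pairs like those described above are commonly serialized as JSON Lines for fine-tuning. A minimal sketch (the `prompt`/`completion` key names are a common convention, not necessarily the schema this project used):

```python
import json

def to_jsonl(pairs):
    """Serialize (prompt, completion) string pairs to JSON Lines,
    one JSON object per line."""
    return "\n".join(
        json.dumps({"prompt": p, "completion": c}) for p, c in pairs
    )
```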
Tech Stack: Python, AWS S3, AWS EC2, AWS EMR, Microsoft Excel, Microsoft PowerPoint, Microsoft Word, JIRA, Confluence, etc.
Client: Lumen
Role: Sr Data Engineer
Highlighted Projects
1. Churn Prediction – Using CDRs (Call Detail Records)
Worked extensively with Lambda and AWS API Gateway to build a microservices architecture
Extensively leveraged Jupyter/IPython notebooks for ETL scripts in Python/PySpark
Utilized AWS CloudWatch for scheduling and monitoring end-to-end ETL pipelines
Created pipelines to extract data from on-premises source systems to AWS S3
Utilized Amazon S3 events to trigger Lambda functions and microservices written in Python
Hands-on experience with PySpark, using Spark libraries and building UDFs for tasks
Extensive experience with Databricks notebooks for PySpark DataFrame manipulations
Worked extensively on ETL pipelines to consume data from APIs using Python, transform it, and load it into AWS S3 for reporting and analysis by data scientists
Managed codebase versions and changes effectively using Git
Created workflows and scheduled AWS Glue jobs
Leveraged Python (boto3) to configure AWS services including Glue, Redshift, EC2, S3, etc.
Used AWS Glue to catalog data stored in S3 for integration with AWS Redshift Spectrum
Engineered crawlers within AWS Glue to populate the AWS Glue Data Catalog
Triggered AWS Glue ETL jobs using AWS Lambda functions (event-driven ETL pipelines)
Loaded processed data onto AWS Redshift to function as an OLAP database
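The event-driven Glue trigger described above could be sketched as a Lambda handler like the one below. The job name `raw-to-curated` is a placeholder, and the Glue client is injected as a parameter so the logic can be exercised without AWS credentials (in a real Lambda it would be `boto3.client("glue")`):

```python
def handler(event, glue_client):
    """Lambda entry point for event-driven ETL: for each new S3 object in
    the event, start a Glue job run pointed at that object."""
    runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        resp = glue_client.start_job_run(
            JobName="raw-to-curated",  # hypothetical Glue job name
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        runs.append(resp["JobRunId"])
    return {"started": runs}
```

Wiring the S3 bucket's PutObject notification to this Lambda gives the event-driven pipeline behavior with no polling.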
Tech Stack: Python, AWS S3, AWS EC2, AWS Glue, AWS EMR, PySpark, Databricks, Jupyter, AWS Lambda, AWS API Gateway, AWS Redshift, Tableau, Git, AWS CloudWatch, JIRA, Confluence, etc.
Client: Capital One
Role: Data Engineer
Highlighted Projects
1. Credit Cards – default handling and spend patterns (at-risk accounts)
Worked extensively with AWS Lambda, built functions in Python
Designed efficient SQL queries to extract data from relational databases on Amazon RDS
Designed an ETL pipeline incorporating Amazon SNS and Kinesis Data Firehose for live transactions
Configured Amazon Kinesis Data Firehose to consume from Kinesis streams for batch processing of live data
Engineered Amazon Redshift to function as a data warehousing solution
Stored output of Amazon Kinesis Firehose stream in Amazon S3
Reprocessed data stored in S3 using Lambda functions and synced the data into DynamoDB
Created multiple AWS Glue scripts for validating/processing source systems
Scheduled AWS Glue jobs to be executed on job clusters for data transformation tasks
Utilized AWS Step Functions for orchestration tasks such as lookups, conditional branching, iteration, variable handling, metadata retrieval, filtering, and wait operations
Designed crawlers in AWS Glue to infer schema and create tables in the Glue Data Catalog
Structured PySpark scripts generated by AWS Glue to perform serverless ETL
Wrote queries on Amazon Athena to extract data from Amazon RDS and create files in S3
Extensively relied on PySpark and SparkSQL to perform analysis on large volumes of data
Leveraged Python (boto3) to configure multiple AWS services including Glue, Redshift, EC2, and S3
Experienced in using Git-based tools like GitHub and GitLab for collaboration and version control
Designed and implemented automated CI/CD pipelines using Jenkins, Terraform, and other DevOps tools, enabling efficient and reliable software delivery and deployment.
Implemented real-time data streaming solutions using Amazon Kinesis, processing high volumes of data with low latency to provide near-real-time insights and analytics.
Worked with AWS Glue Studio and Jupyter Notebooks hosted on Amazon Sagemaker to create and schedule data engineering and data science workflows using Python and Scala
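One practical detail of the Firehose-based streaming above: the PutRecordBatch API accepts at most 500 records per call, so producers typically chunk their records first. A minimal sketch of that chunking (the per-call 4 MiB size limit is noted but not enforced here):

```python
def batch_for_firehose(records, max_records=500):
    """Split a list of records into batches sized for Kinesis Data
    Firehose's PutRecordBatch API (max 500 records per call; the 4 MiB
    per-call byte limit is a separate constraint not checked here)."""
    return [records[i:i + max_records] for i in range(0, len(records), max_records)]
```

Each batch would then be passed to `firehose.put_record_batch(...)` via boto3, with failed records from the response retried.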
Tech Stack: Python, Go, AWS RDS, Amazon Aurora, SparkSQL, AWS Glue Studio, AWS S3, AWS EC2, AWS Glue, AWS EMR, PySpark, Databricks, Jenkins, Amazon SageMaker, Terraform, Docker, Kafka, AWS Kinesis, Jupyter, AWS Lambda, AWS API Gateway, AWS Redshift, Tableau, Git, AWS CloudWatch, JIRA, Confluence, etc.
ISERP (Columbia University) - New York, NY Sep 2020 – Feb 2021
Data Engineer
Highlighted Projects
1. AI Model Share Platform – Web-based platform for executing Machine Learning models
Created a Python based data ETL pipeline to extract semi-structured/structured data
Extracted data from sources like AWS DynamoDB, AWS ElastiCache Redis, and REST APIs
Cleaned and transformed data using AWS Lambda functions written in Python
Designed automated archiving for Redis using Lambda, CloudWatch cron jobs and S3 buckets
Designed and implemented high-performance caching solutions using AWS Elasticache Redis, enabling efficient and low-latency data retrieval across microservices
Configured and optimized Redis clusters and nodes using AWS ElastiCache Redis, enabling efficient and performant data storage and retrieval, and reducing operational costs
Leveraged the power of Python and the boto3 library to write efficient and scalable Lambda functions, enabling seamless integration with various AWS services
Designed and implemented an efficient and scalable data model using DynamoDB to store data related to ML models and their usage, enabling efficient querying and retrieval of data
Designed and maintained APIs that execute Machine Learning models on-demand
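A cache-aside pattern like the one described above (ElastiCache Redis in front of on-demand model execution) might look like this sketch. The key scheme and TTL are illustrative, and the cache client is injected (redis-py style `get`/`setex`) so the logic is testable without a Redis server:

```python
import json

def cached_predict(model_id, features, cache, predict_fn, ttl=3600):
    """Cache-aside lookup for model predictions: return a cached result
    if present, otherwise compute it and store it with a TTL.
    `cache` must expose get(key) and setex(key, ttl, value)."""
    # sort_keys makes the cache key deterministic for equal feature dicts
    key = f"pred:{model_id}:{json.dumps(features, sort_keys=True)}"
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    result = predict_fn(features)
    cache.setex(key, ttl, json.dumps(result))
    return result
```

With real ElastiCache, `cache` would be a `redis.Redis(...)` client; the TTL doubles as the archiving window before cold results fall out of the cache.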
Tech Stack: Python, AWS DynamoDB, AWS ElastiCache, Redis, REST APIs, AWS CloudWatch, AWS Lambda, Machine Learning, cron, microservices, boto3
United Nations (UNDP) - New York, NY Sep 2019 – Sep 2020
Data Engineer
Highlighted Projects
1. Sentiment Analysis to predict violent events in sub-Saharan Africa (Sahel)
Supervised a team of 8 to engineer a Python script to scrape news data for multiple countries in the Sahel region across 20+ years by leveraging Selenium, Requests, and BeautifulSoup
Engineered a Python script on Google Cloud VMs to scrape news data (headlines and text) from the web for 20+ years by utilizing Selenium, BeautifulSoup, and Requests
Authored cron jobs to perform periodical data extraction from various sources
Built an end-to-end ETL pipeline using Python and the Google Cloud Platform (GCP) to gather data from multiple sources and prepare it for Natural Language Processing (NLP)
Utilized IPython notebooks on Google Colab and leveraged Python, pandas, NumPy, and matplotlib to perform Exploratory Data Analysis (EDA) on scraped and cleaned data
Stored scraped data in the form of Parquet files on GCS (Google Cloud Storage)
Leveraged Google Cloud Platform (GCP) for hosting continuous data gathering scripts on virtual machines (VMs) /cloud compute resources
Utilized PySpark on Google Cloud Dataproc for performing sentiment analysis on large volumes of data and generating insights from Parquet files
Utilized Tableau for creating dashboards to visualize correlations and understand relationships between parameters
Structured Google Cloud Storage to function as a storage solution for all obtained file formats
Stored insights and sentiment analysis data as CSV files on GCS
Created multiple Databricks PySpark notebooks for validating/processing source-system data into the warehouse
Designed and developed tables on BigQuery, loaded CSV insights data into BigQuery
Implemented a data warehousing solution using Google BigQuery for analytical purposes
Connected Tableau to BigQuery to plot correlations and other statistical data for stakeholders
Designed vibrant and informative dashboards on Tableau providing a descriptive snapshot of metrics and data
Estimated sentiment using NLP through pandas, NumPy, and NLTK (VADER sentiment)
Utilized multiple operators in Apache Airflow to author DAGs to orchestrate the entire data pipeline/ETL flow
Utilized Google Cloud Dataproc for Pyspark based analysis of Parquet files
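The VADER-based sentiment scoring mentioned above produces a compound score in [-1, 1]; labeling it follows the thresholds commonly recommended for VADER (±0.05). A minimal sketch of that labeling step (the scoring itself would come from NLTK's `SentimentIntensityAnalyzer`):

```python
def label_sentiment(compound: float) -> str:
    """Map a VADER compound score to a categorical label using the
    commonly used +/-0.05 thresholds."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"
```

At scale this would run as a PySpark UDF over the scraped-headline DataFrame on Dataproc, yielding the per-country sentiment series fed to the violence-prediction model.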
Tech Stack: Python, GCP VMs, Selenium, BeautifulSoup, NLP, IPython, Google Colab, NumPy, matplotlib, Google Cloud Storage, PySpark, Dataproc, Tableau, BigQuery, Apache Airflow, Parquet, cron
Great West Financial - Bangalore, IN May 2017 – Sep 2019
Data Engineer
Highlighted Projects
1. Non-Qual US Retirement Plans
Architected a data warehouse solution on Redshift to store transformed data
Leveraged PySpark/Apache Spark to optimize extraction and transformation of data
Extensively worked on PySpark to analyze, transform and manipulate large datasets
Worked with PySpark SQL functions and wrote multiple UDFs to perform column operations
Designed ELT and ETL workflows using MuleSoft to load data onto Redshift
Created an AWS Lambda pipeline to migrate microservices from Mulesoft API Gateway to AWS API Gateway
Worked on migration of data from On-Prem SQL server to Cloud databases
Developed cron jobs running Bash scripts to automate server maintenance and health checks
Structured Node.js functions on AWS Lambda to provide API proxy integrations
Designed a web application to replace Oracle Forms and moved it to the cloud
Leveraged React.js, SASS, HTML5, CSS3, and Node.js to design a user-friendly web application
Worked in a fast-paced agile development environment that relied on JIRA & Confluence
Developed CI/CD processes for SDLC by leveraging Git, Jenkins, Docker, Kubernetes
Utilized Amazon S3 as a data lake to store raw data gathered in JSON, XML, text, and other related file formats
Integrated Salesforce Marketing Cloud with AWS services, using Amazon S3 for data storage, AWS Lambda for data processing, and Amazon Redshift for data warehousing
Worked extensively with Salesforce Marketing Cloud via MuleSoft connectors
Created a Salesforce Marketing Cloud-to-JIRA integration to automatically convert Salesforce Marketing Cloud service requests into JIRA tickets
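The Marketing Cloud-to-JIRA mapping described above could be sketched as a payload transformer. The Marketing Cloud request fields here are hypothetical; the output follows the shape of JIRA's REST create-issue payload (`fields.project.key`, `summary`, `issuetype.name`), and the project key `OPS` is a placeholder:

```python
def sfmc_request_to_jira_issue(request, project_key="OPS"):
    """Map a Salesforce Marketing Cloud service request (illustrative
    field names) to a JIRA REST API create-issue payload."""
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": request["subject"],
            "description": request.get("details", ""),
            "issuetype": {"name": "Task"},
        }
    }
```

The resulting dict would be POSTed to JIRA's `/rest/api/2/issue` endpoint by the integration layer (MuleSoft or a Lambda).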
Tech Stack: AWS Redshift, PySpark, MuleSoft, AWS Lambda, AWS API Gateway, MuleSoft API Gateway, REST APIs, cron, Bash, Shell Scripting, Node.js, Python, Java, HTML5, CSS3, Bootstrap, Oracle Forms, SASS, React.js, JIRA, Confluence, SDLC, Git, Jenkins, Docker, Kubernetes, Salesforce, Salesforce Marketing Cloud, Postman, Oracle DB