Mahesha Subrahamanya
San Francisco Bay Area
**********@*****.*** 801-***-**** linkedin.com/in/mahesha-s-5281b15
Professional Summary:
Strong experience with AWS services such as S3, IAM, RDS, CloudWatch, Glue, Athena, SageMaker, Lambda, SQS, Auto Scaling, and Redshift, among others.
Experience configuring EMR clusters on AWS and writing PySpark SQL code to process large volumes of Parquet data on Apache Spark clusters.
Good experience with Python 3.x and a solid grounding in object-oriented design.
Experience with data pipeline tools such as Spark/PySpark and dbt, and cloud platforms such as AWS.
Deep expertise with RDBMS systems such as PostgreSQL, MySQL, and Oracle, including SQL, PL/SQL, and T-SQL.
Designed and developed ETL jobs that iterate through SQS messages, extract data from the referenced S3 objects, and load it into an open-source data warehouse such as Apache Druid for a cost-effective solution (see the sketch after this summary).
Hands-on experience with AWS data analytics services such as Athena, Glue, Glue Data Catalog, and QuickSight.
Expertise in designing star and snowflake schemas for data warehousing and ODS architectures.
Designed and developed ETL processes in AWS Glue to migrate data from external sources such as S3 (Parquet/text files) into Amazon Redshift.
Good understanding of core Java and object-oriented methodology.
Ensured data security and compliance through AWS IAM and KMS policies.
Collaborated with data scientists to design and optimize machine learning models on AWS SageMaker.
Excellent understanding of the Software Development Life Cycle (SDLC) and of Agile, Scrum, and Waterfall processes.
Worked in various domains including transportation, finance, banking, retail, and marketing cloud.
Good experience in Source Control Management and Release Management activities.
Good industry knowledge, analytical and problem-solving skills, and the ability to work well in a cross-functional team as well as individually.
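A minimal sketch of the SQS-driven S3 load pattern described above, assuming a hypothetical queue URL and message shape, with a placeholder load step (the actual Druid ingestion call is not shown):

```python
import io
import json

import boto3
import pandas as pd

# Hypothetical queue URL for illustration only
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/cdw-ingest-queue"

sqs = boto3.client("sqs")
s3 = boto3.client("s3")


def load_into_warehouse(df: pd.DataFrame) -> None:
    # Placeholder: real code would submit a batch-ingestion task to Apache Druid
    print(f"loaded {len(df)} rows")


def poll_and_load() -> None:
    """Iterate through SQS messages, read the referenced S3 objects, and load them."""
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
        )
        messages = resp.get("Messages", [])
        if not messages:
            break
        for msg in messages:
            body = json.loads(msg["Body"])  # expected shape: {"bucket": ..., "key": ...}
            obj = s3.get_object(Bucket=body["bucket"], Key=body["key"])
            df = pd.read_parquet(io.BytesIO(obj["Body"].read()))  # Parquet object from S3
            load_into_warehouse(df)
            # Delete only after a successful load so failed messages are retried
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])


if __name__ == "__main__":
    poll_and_load()
```

Long polling (WaitTimeSeconds) keeps the loop inexpensive when the queue is idle, and deleting messages only after a successful load gives at-least-once processing.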
Tools, Techniques, and Technologies:
Big Data Ecosystem: Hive, YARN, PySpark, Parquet and Snappy, AWS (EMR).
Languages: Java, Shell scripting (Unix), Python (pandas, boto3, pyarrow, psycopg2), PL/SQL, SQL.
Cloud Technologies: AWS - S3, Athena, Glue, CloudWatch, Redshift (including WLM).
Frameworks/Tools: dbt, GitLab, Postman, Linux, Docker, Jira, Spring Boot, ClearCase, Autosys, Apache Airflow; data lake and data warehouse concepts.
Databases: AWS RDS, Redshift, Apache Druid, Oracle Exadata, DB2, Informix, Postgres, MySQL.
DevOps Technologies: Terraform, Docker, Kubernetes (k9s)
ETL Tools: Informatica - ETL (Data Integration)
Methodology: Agile, Waterfall, Scrum
IDE Tools: PyCharm, Amazon SageMaker, IntelliJ
Data Visualization: AWS QuickSight, Tableau, MS Power BI, Oracle OBIEE, QlikView.
Experience
Principal DW Engineer
Veeva Systems
Jul 2018 - Present (6 years 5 months)
Responsibilities:
- Worked closely with internal stakeholders such as business teams, product managers, engineering teams, and partners to gather requirements and convert them into technical specifications.
- Designing and developing a Commercial Data Warehouse (CDW Nitro) using prebuilt industry connectors, with data stored in Amazon Redshift.
- Recently built a data lake architecture using AWS serverless services such as S3, Glue, and Redshift Spectrum external schemas, along with Python (boto3, pandas), to achieve a highly scalable and highly available system.
- Built an FTP Intelligent Connector using Python functions that acts as a data cleansing framework and lets third parties (such as Symphony, IMS, etc.) provide patient, claims, sales, or activity data to be loaded into the data warehouse and eventually rendered in the reporting system.
- Hands-on experience with the AWS CLI and SDK/API tools.
- Primarily worked on a project to develop an internal ETL product handling complex, large-volume healthcare claims and sales data. Designed the ETL framework and developed several packages to extract, transform, and load data via AWS MDS into Redshift databases to support reporting operations.
- Developed a custom Python container image on Amazon ECR to run AWS SageMaker (AI/ML) jobs, making it quick to build, train, and deploy machine learning models with a fully integrated Git CI/CD pipeline.
- Designed and built multi-terabyte, end-to-end data warehouse infrastructure from the ground up on Amazon Redshift, handling millions of records every day.
- Created S3 buckets in the AWS environment to store files, some of which serve static content for a web application.
- Installed and configured Apache Airflow for the S3 bucket and Redshift data warehouse, and created DAGs to orchestrate the loads (see the DAG sketch after this list).
- Managed Amazon Redshift clusters, including launching clusters with the required node configuration and running data analysis queries.
- Developed shell scripts for automation.
- Expert in query tuning, performance optimization, and workload management implementation.
- Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries over more complex, longer-running ad hoc queries, allowing a more reliable and faster reporting interface with sub-second responses for basic queries.
- Implemented Kubernetes manifests and Helm charts for deploying microservices into k8s clusters.
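A minimal sketch of such a DAG using the Amazon provider's S3ToRedshiftOperator; the bucket, prefix, schema, table, and connection IDs below are illustrative placeholders, not the production names:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

# Bucket, key prefix, schema, table, and connection IDs are placeholders for illustration
with DAG(
    dag_id="s3_to_redshift_daily",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load_sales = S3ToRedshiftOperator(
        task_id="load_sales",
        s3_bucket="cdw-landing-bucket",
        s3_key="sales/{{ ds }}/",        # daily partition keyed by execution date
        schema="staging",
        table="sales",
        copy_options=["FORMAT AS PARQUET"],
        redshift_conn_id="redshift_default",
        aws_conn_id="aws_default",
    )
```

Each run issues a Redshift COPY from the date-partitioned S3 prefix, so individual days can be backfilled by re-running the corresponding DAG runs.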
Awards:
Awarded Employee of the Year in 2020 for being customer-focused during the pandemic.
Sr Software Developer
Responsys
Aug 2013 - Jul 2018 (5 years)
Responsibilities:
Worked closely with product management teams to understand business requirements, derive the scope for each release, translate requirements into designs/solutions, and provide detailed specifications.
Worked successfully with globally distributed cross-functional teams, including product management, project management, architects, user interface, operations, and QA teams.
Solved complex and interesting problems around a high-performance, high-availability SaaS application that processes over 300 million transactions a day (tens of billions of transactions per month) and handles hundreds of terabytes of data.
- Created detailed technical specifications for ETL (Informatica) processes and stored procedures to pull data from source systems/files, then cleanse, transform, and load it into various databases.
- Wrote Python scripts to analyze data and produce CONNECT reports (see the sketch after this list).
- Developed a proof-of-concept document on new Oracle Exadata features for adoption and deployment of Responsys product database changes.
- Fine-tuned SQL queries and complex PL/SQL business logic.
- Optimized mappings and implemented complex business rules by creating reusable transformations and mapplets.
- Supported existing application modules, developed new solutions, and enhanced existing applications.
- Worked with the DBA on performance tuning of Informatica sessions and workflows; created reusable transformations for better performance.
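An illustrative sketch of that kind of analysis script, assuming a hypothetical CSV extract and column names (the real CONNECT report logic is not shown):

```python
import pandas as pd

# Hypothetical daily extract of campaign transaction events (file and columns are illustrative)
df = pd.read_csv("transactions_extract.csv", parse_dates=["event_time"])

# Count events (e.g., sends, opens, clicks) per campaign per day
summary = (
    df.assign(event_date=df["event_time"].dt.date)
      .groupby(["campaign_id", "event_date", "event_type"])
      .size()
      .unstack("event_type", fill_value=0)
      .reset_index()
)

# Write the pivoted summary used as report input
summary.to_csv("daily_campaign_summary.csv", index=False)
```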
Awards:
Awarded Employee of the Year in 2016 for being customer-focused and delivering the database migration effectively with low latency.
Technology Analyst
Infosys
Mar 2010 - Jul 2013 (3 years 5 months)
Worked on a couple of decommissioning projects migrating legacy systems (Sybase) to an Oracle Exadata database: the RMD GUI application, which provides investment solutions to individual and institutional investors, and EWS (Equity Workspace), an application for equity investment professionals aimed at improving the user experience through the simplification, consolidation, and retirement of legacy applications.
*Bug detection and reporting in RDBMS/PL-SQL; the work involved constant interaction with clients to get a complete picture of the nature of each problem.
*Developed Informatica mappings/workflows to stage inbound feeds and apply required transformations.
*Worked closely with SIT/UAT testers to get the system tested.
Associate Projects
Cognizant
Apr 2008 - Feb 2010 (1 year 11 months)
Worked on data upgrade scripts for an Informix database using Unix shell scripts for the GLS Phase 2 WMS system, which serves International General (IG) Merchandise DCs. Worked with the retail chain client Walmart on distribution center operations.
*Worked extensively on Unix shell scripting for data migration in the Informix database.
*Planned and wrote unit test cases and tested individual module units.
*Executed test scripts in utPLSQL.
Software Engineer
Radiant Info Systems Ltd
Mar 2005 - Mar 2008 (3 years 1 month)
Radiant Info Systems Ltd is a pioneer and market leader in the transport logistics segment in India. The new e-ticketing system allows tickets to be booked from anywhere over the internet.
Involved in the design, migration, and deployment phases of this product across all major bus operators from a single platform.
Helped enhance and support the existing Depot Computerization System (DCS), EBTM (Electronic Bus Ticketing System), and other applications.
Responsibilities
Involved in writing functions, stored procedures, and database triggers. Developed a major part of the reports for the APSRTC project and handled troubleshooting.
Education
Visveswaraya Technological University (V.T.U)
Bachelor of Engineering (BE), Computer Science and Engineering
2000 - 2004
Licenses & Certifications
Core Java - UC Berkeley Extension
X130881
CMPR.X415.(3) Python Programming for Beginners - UC Berkeley Extension
Grade A
AWS web services - Amazon Web Services (AWS)
AWS Certified Solutions Architect – Associate - Amazon Web Services (AWS)
Issued Jul 2022 - Expires Jul 2025
AWS Certified Cloud Practitioner - Amazon Web Services (AWS)
Issued Sep 2022 - Expires Sep 2025