ABHISHEK BHAUMIK
*************@*****.***
PROFESSIONAL SUMMARY
AWS and Databricks certified professional with over 12 years of experience as a Python Engineer, including 3+ years in a lead role and 6+ years as a Senior Data Engineer, with strong expertise in Python, ETL, REST APIs, AWS, LLMs, and SQL across diverse environments and technologies.
Designed interactive web pages and templates for application front ends using technologies such as HTML, CSS3, JavaScript, AJAX, jQuery, Vue.js, React.js, and JSON, and implemented Bootstrap for a better user experience.
Worked with various Python Integrated Development Environments (IDEs), including NetBeans, PyCharm, and Sublime Text.
Strong knowledge of Python best practices (PEP8) and coding standards.
Skilled in developing web-based applications using Python, Django, XML, CSS, HTML, and JavaScript.
Implemented test automation scripts for mobile applications using Appium for both Android and iOS platforms.
Expertise in MVW (Model-View-Whatever) frameworks and libraries, including Django, AngularJS, React.js, Node.js, JavaScript, and jQuery.
Developed Python automation scripts for functional testing and data validation.
Experience in developing web-based applications and tools in Python 2.x and 3.x using web frameworks such as Django, Flask, and FastAPI.
Experience in developing, consuming, and testing RESTful APIs using Python web frameworks such as Django REST framework, Flask, and FastAPI (asynchronous), backed by relational databases including MySQL, PostgreSQL, Oracle, and MSSQL; also worked with NoSQL databases such as DynamoDB and MongoDB.
Experienced in creating applications using Large Language Models (LLMs) and LangChain, refining and applying Retrieval-Augmented Generation (RAG) techniques.
Strong understanding of OOP, multi-threading, and collections concepts in Python, with hands-on experience in data analysis using Pandas.
Experience in data analysis and machine learning, leveraging NumPy and SciPy and applying ML algorithms such as regression, clustering, and ensemble methods.
Developed applications utilizing AWS services, including EC2, CloudSearch, ELB, CloudFront, and Route 53. Automated AWS tasks such as snapshot creation using Python scripts.
Worked with version control systems like Git (GitHub/Bitbucket) and SVN.
Implemented CI/CD pipelines for application testing and deployment using Jenkins, Maven, Nexus, GitHub, Chef, Terraform, and AWS.
Used Ansible Playbooks to set up Continuous Delivery Pipelines, deploying microservices and provisioning AWS environments.
Experience with Docker, including creating container images, tagging, pushing images, and managing Docker containers.
Hands-on experience with SQL, PL/SQL, writing complex stored procedures, and optimizing SQL queries in relational databases like Oracle, MySQL, and PostgreSQL.
Created various database connections using sqlite3, psycopg2, pyodbc, and similar drivers.
Developed UNIX shell scripts and Bash scripting for automation.
Strong proficiency in unit testing using unittest and pytest, integrating test cases with the build process; worked with testing frameworks such as Zope, Nose, and Robot Framework.
Experience in linking Python APIs to Spark Core using PySpark and initializing Spark contexts.
Deployed and monitored scalable infrastructure on AWS, utilizing configuration management tools like Chef.
Experience with Elasticsearch and NoSQL databases like MongoDB and Accumulo.
Experience deploying applications in a Cloud environment.
Knowledge of Hadoop technologies like Spark, Scala, HBase, Hive, and Cloudera.
Advanced knowledge of application, data, and infrastructure architecture disciplines.
Experience in writing Python scripts for data manipulation, data loads, extracts, statistical analysis, and modeling.
Professional experience with Machine Learning algorithms, including linear regression, logistic regression, Naive Bayes, Decision Trees, Clustering, and Principal Component Analysis.
Well-versed in all phases of the Software Development Life Cycle (SDLC), including Agile methodologies and SCRUM process.
Created use-case diagrams, activity flow diagrams, class diagrams, and object diagrams in the design phase and developed the corresponding code modules.
Skilled in analyzing and gathering business and system requirements.
Proficient in data analysis, data mining, optimization tools, and machine learning techniques for solving business problems.
SKILLS:
AWS Services
EC2, S3, ELB, Auto Scaling, Glacier, storage lifecycle rules, Elastic Beanstalk, CloudFront, ElastiCache, RDS, VPC, Kinesis, Route 53, Redshift, CloudWatch, CloudTrail, IAM roles
Data Warehousing
Dimensional Modeling, Star/Snowflake Schema
Data Modeling
UML, E-R Diagrams, Data Flow Diagrams, Data Dictionary, Data Mapping
Source Control Tools
CVS, Git, GitLab, Subversion, and GitHub
Database Management
Oracle 8i/9i/10g, MySQL, DB2, SQL Server, MongoDB
Programming
TypeScript, Python, XML, SAS, R, Terraform
Web Technologies
HTML5, CSS, JavaScript, AngularJS, JSP, XML, XSD, XPath, XSLT
Web/Application Servers
IBM WebSphere 6.x, Apache Web Server 2.0, Apache Tomcat 6.0, Sun ONE Web Server 6.0, IIS Web Server 4.0/5.0, WebLogic
Web Debugging Tools
Firebug, FirePath, XPath, Web Debugger
Reporting Tools
Tableau, Power BI, Cognos, Microsoft Reporting Services, Crystal Reports
Operating Systems
Windows, Linux, Mac
Methodologies
Agile, Scrum, Waterfall, SDLC
EDUCATION
UNIVERSITY OF COLORADO BOULDER 2020 Boulder, CO
Master of Business Administration, Analytics (3.55/4)
VELLORE INSTITUTE OF TECHNOLOGY 2013 Vellore, India
Bachelor of Technology, Electrical and Computer Engineering (3.9/4)
CERTIFICATIONS
Databricks Certified Data Engineer - Associate
AWS Certified Solutions Architect - Associate
WORK HISTORY
State Street Corporation (Remote) Jul 2023 - Present
Lead Engineer (AWS & AI)
Led performance optimization initiatives, improving ETL processing speeds by 30% through partitioning, caching, and query tuning.
Developed and optimized scalable ETL pipelines using Databricks, Apache Spark, and Python, handling petabyte-scale financial data.
Deployed scalable Java and Python applications using AWS Elastic Beanstalk for seamless cloud infrastructure management.
Developed efficient data ingestion pipelines using Pandas to read and preprocess large CSV files (>3GB), improving processing speed by 30%.
Designed and orchestrated complex Apache Airflow DAGs to manage dependencies, retries, and logging for multi-step data processing workflows across multiple sources.
Implemented event-based triggers in Airflow to launch DAGs automatically upon file arrival in AWS S3, enabling near-real-time data ingestion pipelines.
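A minimal sketch of that event-based trigger pattern, assuming an S3KeySensor-based DAG; the bucket, key pattern, and task names are placeholders rather than the production values:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

    def ingest_file(**context):
        pass  # downstream ingestion logic would run here

    with DAG("s3_file_ingest", start_date=datetime(2023, 7, 1),
             schedule_interval="@hourly", catchup=False) as dag:
        wait_for_file = S3KeySensor(
            task_id="wait_for_file",
            bucket_name="incoming-data",     # hypothetical bucket
            bucket_key="landing/*.csv",      # hypothetical key pattern
            wildcard_match=True,
            poke_interval=60,
            timeout=60 * 60,
        )
        ingest = PythonOperator(task_id="ingest", python_callable=ingest_file)
        wait_for_file >> ingest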
Used ANALYZE TABLE to collect table-level and column-level statistics, enabling Spark’s cost-based optimizer (CBO) to make more efficient query plans.
Designed automated maintenance workflows in Databricks to regularly run OPTIMIZE, ANALYZE, and VACUUM as part of data freshness and performance tuning strategy.
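A hedged sketch of such a scheduled maintenance job as it might run in a Databricks notebook; the table and column names are placeholders:

    tables = ["finance.trades_delta", "finance.positions_delta"]  # hypothetical tables
    for t in tables:
        spark.sql(f"OPTIMIZE {t} ZORDER BY (trade_date)")                   # compact small files
        spark.sql(f"ANALYZE TABLE {t} COMPUTE STATISTICS FOR ALL COLUMNS")   # refresh CBO statistics
        spark.sql(f"VACUUM {t} RETAIN 168 HOURS")                           # remove obsolete data files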
Implemented memory optimization techniques in Pandas (e.g., chunking, data type optimization) to handle and analyze multi-million row datasets, reducing memory usage by 40%.
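An illustrative sketch of the chunked, dtype-aware loading approach; the file and column names are assumptions:

    import pandas as pd

    dtypes = {"account_id": "int32", "balance": "float32", "region": "category"}
    chunks = pd.read_csv("transactions.csv", dtype=dtypes,
                         parse_dates=["trade_date"], chunksize=500_000)
    # Filter each chunk before concatenating to keep peak memory low
    result = pd.concat(chunk[chunk["balance"] > 0] for chunk in chunks)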
Improved data processing efficiency by selecting optimal file formats (Parquet, ORC) with columnar storage, enabling faster query performance through predicate pushdown and compression techniques.
Optimized DQ framework performance by tuning Spark configurations and partitioning strategies, reducing validation job runtimes by over 40% across large datasets in Delta Lake.
Designed and built cumulative aggregation tables in Snowflake for performance-efficient reporting, enabling trend analysis across historical data snapshots with minimal query latency.
Built interactive Snowflake dashboards to visualize DQ rule execution results, exceptions, and trend breakdowns by rule category and severity, enhancing stakeholder visibility into data health.
Designed and optimized OLAP cubes in Kyvos on top of Snowflake datasets to support sub-second analytical queries across billions of rows for business units including finance, risk, and operations.
Built and maintained Kyvos semantic layer mappings, aligning business metrics and KPIs with source tables to provide a consistent, governed data access layer for self-service BI tools.
Automated data processing workflows using Pandas and Python scripting, streamlining data extraction, transformation, and loading (ETL) processes for regular reporting and analytics.
Developed and deployed a machine learning model to classify client portfolios into risk categories (Conservative, Moderate, Aggressive) using historical asset allocation, volatility, performance indicators, and macroeconomic exposure.
Integrated AWS Bedrock's Claude LLM to enhance the inference pipeline by generating contextual explanations for each risk prediction, improving model interpretability and client trust.
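A minimal sketch of that Bedrock call; the region, model ID, and prompt are illustrative rather than the production configuration:

    import boto3, json

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 300,
        "messages": [{"role": "user",
                      "content": "Explain why this portfolio was classified as Aggressive: ..."}],
    })
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # illustrative model ID
        body=body)
    explanation = json.loads(response["body"].read())["content"][0]["text"]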
Integrated React.js frontend with FastAPI backend using Axios, enabling efficient asynchronous API communication and seamless state management with Redux Toolkit.
Built data pipelines in Airflow/Composer to orchestrate ETL jobs using various Airflow operators, and implemented Spark Streaming and Spark SQL using DataFrames.
Managed version control using GitHub and Bitbucket, utilizing a wide range of Git commands (e.g., branching, merging, rebasing, and resolving conflicts) to streamline collaborative development, ensure smooth code integration, and maintain a clean and efficient project workflow.
Designed, deployed, and managed a CI/CD solution that included automated testing and automated notification of results, using technologies such as Ansible, Terraform, Packer, CloudFormation, and Docker.
Developed applications on AWS by provisioning EKS and ECS clusters on EC2 instances using Docker, Bash, Chef, and Terraform.
Implemented Airflow orchestration and scheduling for complex data pipelines, leveraging event-based scheduling to trigger workflows based on real-time data changes, ensuring seamless automation, improved data processing efficiency, and timely execution of tasks across multiple environments.
Provisioned load balancer, auto-scaling group, and launch configuration for microservices using Flask.
Implemented containerized applications using Docker and deployed them on Amazon ECS (Elastic Container Service) to ensure scalable, efficient, and isolated environments for microservices, improving application performance, deployment consistency, and resource management in the cloud.
Deployed and managed containerized applications on Amazon EKS (Elastic Kubernetes Service), optimizing orchestration, scaling, and management of microservices in a secure, highly available, and cost-effective Kubernetes environment.
Integrated Large Language Models (LLMs) for natural language processing tasks such as text generation, summarization, and chatbot development.
Developed NLP-based text analysis pipeline using NLTK, Pandas, and WordCloud to process and visualize textual data, including WebVTT subtitle files.
Implemented custom middleware in FastAPI to manage logging, error handling, and request/response transformations for improved debugging and system monitoring.
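A simplified sketch of such middleware on a standard FastAPI app; the logger name and logged fields are illustrative:

    import logging, time
    from fastapi import FastAPI, Request

    app = FastAPI()
    logger = logging.getLogger("api")

    @app.middleware("http")
    async def log_requests(request: Request, call_next):
        start = time.perf_counter()
        try:
            response = await call_next(request)
        except Exception:
            logger.exception("Unhandled error on %s %s", request.method, request.url.path)
            raise
        logger.info("%s %s -> %s (%.1f ms)", request.method, request.url.path,
                    response.status_code, (time.perf_counter() - start) * 1000)
        return response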
Implemented OAuth 2.0 and token-based authentication in both TypeScript and Python services, ensuring secure communication and data protection between the client and server.
State Street Corporation (Remote) May 2022 – Jun 2023
Sr. Cloud Data Engineer (AWS & ML)
Responsibilities:
Implemented unit tests using Pytest, leveraging data mocking to simulate external sources, which improved test reliability, isolated code behavior, and ensured more accurate validation of application logic.
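A small sketch of the mocking pattern; the module and function names (pipeline.fetch_from_api, load_positions) are hypothetical:

    from unittest.mock import patch
    from pipeline import load_positions  # hypothetical module under test

    @patch("pipeline.fetch_from_api")  # replace the external data source with a mock
    def test_load_positions_drops_incomplete_rows(mock_fetch):
        mock_fetch.return_value = [{"id": 1, "qty": 10}, {"id": 2, "qty": None}]
        result = load_positions()
        assert [row["id"] for row in result] == [1]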
Used Django configuration to manage URLs and application parameters.
Designed and developed ETL processes in AWS Glue to migrate accident data from external sources such as S3 and text files into AWS Redshift.
Automated data validation and DQ checks within Databricks jobs, ensuring data accuracy and integrity across pipelines feeding analytics and reporting systems.
Improved query performance by applying the OPTIMIZE command with Z-ORDER on frequently queried Delta tables, significantly reducing scan times for high-cardinality columns.
Created OLAP applications with OLAP services in SQL Server using ETL tooling and built multidimensional cubes using Snowflake schemas.
Implemented routine VACUUM operations to manage Delta table storage by cleaning up obsolete data files, thereby reducing storage costs and improving I/O efficiency.
Designed and implemented complex stored procedures, triggers, and indexing in PostgreSQL & Oracle, reducing query execution time by 40%.
Implemented DQ rule execution and alerting workflows using Databricks Workflows, orchestrating routine checks and publishing DQ violations into summary Delta tables for auditing and compliance tracking.
Developed automated ETL pipelines to feed Kyvos cubes from structured data sources (e.g., Snowflake, Hive), applying transformations and validations to ensure cube integrity and freshness.
Developed a scalable Data Quality (DQ) framework in Databricks with configurable validation rules for null checks, duplicate detection, threshold checks, and schema conformity, improving early error detection across data pipelines.
Integrated Kyvos with Tableau and Power BI dashboards, leveraging the semantic layer to enable drag-and-drop reporting for non-technical users without exposing raw SQL logic.
Performed dimensional data modeling using Star and Snowflake schemas, aligning data architecture with reporting needs and supporting robust analytics and KPI tracking at multiple business granularity levels.
Demonstrated expertise in handling time-series data using Pandas, including date-time indexing, resampling, and rolling window computations for forecasting and trend analysis.
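An illustrative sketch of those time-series operations; the file and column names are assumptions:

    import pandas as pd

    prices = pd.read_csv("nav_history.csv", parse_dates=["date"], index_col="date")
    monthly = prices["nav"].resample("M").last()                  # month-end snapshots
    rolling_vol = prices["nav"].pct_change().rolling(30).std()    # 30-day rolling volatility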
Created interactive dashboards and reports by integrating Snowflake data with Tableau, Power BI, and ThoughtSpot, enabling real-time data visualization, actionable insights, and informed decision-making for stakeholders across multiple business units.
Integrated machine learning models for sentiment analysis and predictive analytics using Python libraries such as Scikit-learn and TensorFlow, enhancing data-driven decision-making.
Wrote Terraform scripts to provision AWS resources like EC2, ELB, and S3, integrating infrastructure as code practices.
Utilized Pandas for data manipulation and analysis, leveraging its powerful data structures (DataFrames and Series) to clean, transform, and aggregate large datasets, perform complex data analysis, and optimize data processing tasks for efficient workflow and insightful reporting.
Developed responsive and dynamic web applications using React.js, implementing hooks, context API, and component lifecycle methods to create reusable, maintainable UI components.
Implemented asynchronous operations using TypeScript's async/await and Promise mechanisms to fetch data from Python APIs, enabling non-blocking calls and improving the performance and responsiveness of the application.
Developed RESTful APIs using FastAPI, leveraging its asynchronous capabilities for high-performance backend services.
Integrated FastAPI with Python frameworks like SQLAlchemy for ORM and Pydantic for data validation, ensuring robust API endpoints.
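A condensed sketch of that pattern, assuming SQLAlchemy 1.4+ and Pydantic v1; the connection string, model, and route names are placeholders:

    from fastapi import FastAPI, Depends
    from pydantic import BaseModel
    from sqlalchemy import create_engine, Column, Integer, String
    from sqlalchemy.orm import declarative_base, Session

    engine = create_engine("postgresql+psycopg2://user:pass@host/db")  # placeholder DSN
    Base = declarative_base()

    class Client(Base):
        __tablename__ = "clients"
        id = Column(Integer, primary_key=True)
        name = Column(String)

    class ClientOut(BaseModel):
        id: int
        name: str
        class Config:
            orm_mode = True  # lets Pydantic read SQLAlchemy objects

    app = FastAPI()

    def get_db():
        with Session(engine) as session:
            yield session

    @app.get("/clients/{client_id}", response_model=ClientOut)
    def read_client(client_id: int, db: Session = Depends(get_db)):
        return db.get(Client, client_id)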
Implemented deep learning frameworks such as TensorFlow and PyTorch for model training and optimization.
AmerisourceBergen Corp. (ABC) – Conshohocken, PA Mar 2021 – May 2022
Sr. AWS Data Engineer
Responsibilities:
Designed front end and backend of the application using Python and Django Web Framework.
Designed and implemented ETL workflows using MS SSIS to extract, transform, and load data from multiple sources (SQL Server, flat files, APIs) into a centralized data warehouse, improving data accessibility and reporting efficiency.
Developed complex data transformation logic in SSIS, including data cleansing, mapping, aggregation, and validation, ensuring data quality and consistency across systems.
Collaborated with cross-functional teams to gather business requirements and define data transformation rules, ensuring the accurate flow of data into target systems
Created a Python/Django-based web application using Python scripting for data processing, Oracle/SQLite3 as the backend database, and HTML/CSS/JavaScript/jQuery with Bootstrap for the front-end web pages.
Developed data pipelines for medical image pre-processing, training, and testing using Python.
Built scalable data pipelines leveraging PySpark, AWS Glue, AWS Lambda, Shell Scripting, and DynamoDB.
Developed PySpark-based ETL pipelines, processing terabytes of healthcare data.
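A representative sketch of such a pipeline; the S3 paths and column names are assumptions, not the actual healthcare schema:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("claims_etl").getOrCreate()

    claims = (spark.read.parquet("s3://raw-zone/claims/")          # placeholder path
              .filter(F.col("claim_amount") > 0)
              .withColumn("claim_month", F.date_trunc("month", F.col("claim_date"))))

    (claims.groupBy("claim_month", "provider_id")
           .agg(F.sum("claim_amount").alias("total_amount"))
           .write.mode("overwrite")
           .partitionBy("claim_month")
           .parquet("s3://curated-zone/claims_monthly/"))          # placeholder path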
Created OLAP applications with OLAP services in SQL Server using ETL tooling and built multidimensional cubes using Snowflake schemas.
Configured Kyvos role-based access control (RBAC) to secure data access across dimensions and measures, enabling governance and compliance for sensitive financial and client datasets.
Implemented incremental cube refresh logic in Kyvos to minimize cube rebuild time and ensure up-to-date analytical data without full reprocessing, improving efficiency for daily reporting pipelines.
Developed parameterized stored procedures to facilitate dynamic reporting and reduce manual query modifications.
Responsible for setting up a Python REST API framework using Flask and FastAPI and providing interactive OpenAPI (Swagger) documentation to cross-functional teams.
Created dynamic user interfaces with HTML, CSS, Bootstrap, JavaScript, AngularJS, and JSON, improving frontend responsiveness.
Collaborated with business stakeholders to define semantic layer logic, including calculated measures, hierarchies, and role-based data access controls, ensuring secure and meaningful data exploration.
Designed and implemented a graph-based AI solution utilizing Neo4J for optimizing relationship mapping and data flow within the system, which significantly improved data retrieval times by 30%.
Optimized AWS cloud architecture using CloudFront and serverless services to enhance performance and cost efficiency.
Built and maintained dynamic web applications with Django, Flask, and MySQL.
Used AWS-CLI and Boto3 to automate the suspension of AWS Lambda functions and backup of ephemeral data stores to S3 buckets and EBS volumes, enhancing data management workflows, ensuring secure and efficient backup, and optimizing data retrieval in the cloud environment.
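A hedged Boto3 sketch of the backup automation; the volume ID, bucket, and file paths are placeholders:

    import boto3

    ec2 = boto3.client("ec2")
    s3 = boto3.resource("s3")

    # Snapshot an EBS volume, then push an ephemeral data dump to S3
    ec2.create_snapshot(VolumeId="vol-0123456789abcdef0",  # placeholder volume
                        Description="nightly backup")
    s3.Bucket("backup-bucket").upload_file("/tmp/ephemeral_dump.json",
                                           "backups/ephemeral_dump.json")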
Designed and executed complex PostgreSQL queries for efficient data retrieval and transformation.
Utilized TypeScript’s strong typing and interfaces to validate and manage JSON data from Python backend services, ensuring consistency and accuracy during data exchange between the client-side and server-side.
Developed AWS Lambda functions to automate transformations and analytics on large datasets.
Leveraged Git for version control and Jira for workflow management and progress tracking.
Automated infrastructure provisioning and management using Terraform.
Designed and managed AWS infrastructure, including EC2, S3, Lambda, ECS, EBS, ELB, VPC, and IAM security configurations.
Configured AWS Lambda for serverless resource automation and cron job execution with SNS notifications.
Worked with DevOps tools including Jenkins, Docker, and Kubernetes for seamless deployment.
Designed data models using Django ORM, integrating MySQL and Oracle databases on AWS RDS.
Developed interactive Tableau and Snowflake dashboards to analyze and visualize business metrics effectively.
Built machine learning pipelines and AI-driven analytics using AWS SageMaker for training, deployment, and inference.
Applied deep learning techniques using TensorFlow and PyTorch for image processing, NLP, and predictive modeling.
Lumen Technologies (formerly CenturyLink) – Monroe, LA Nov 2019 – Feb 2021
Python Full Stack Developer
Responsibilities:
Developed and executed PowerShell scripts for various auditing requirements on Windows and Unix platforms.
Participated in architecture, design, and product sessions and developed innovative solutions based on product requirements and business challenges in both Python and JavaScript.
Ensured all Django Python code was suitable for automated unit testing and owned code coverage.
Developed views and templates with Python and Django's view controller and template language to create a user-friendly website interface, and used Django database APIs to access database objects.
Implemented and enhanced CRUD operations for the applications using the MVC (Model-View-Controller) architecture of the Django framework and Python, and conducted code reviews.
Automated various service and application deployments with Ansible on CentOS and RHEL in AWS; wrote Ansible playbooks with Python and SSH as the wrapper to manage configurations of AWS nodes, tested playbooks on AWS instances using Python, and ran Ansible scripts to provision dev servers.
Experience with Version Control, mainly using SVN and GitHub.
Iterated rapidly and worked collaboratively with product owners, developers, and other members of the development team.
Integrated data storage solutions using the Django ORM system for MongoDB.
Implemented PySpark transformations and actions in Spark; gained working knowledge of Amazon ECS; worked extensively on big data analytical models developed in Python.
Created monitors, alarms, and notifications for EC2 hosts using CloudWatch.
Used data types such as dictionaries and tuples along with object-oriented inheritance to build complex network algorithms.
Worked in AWS with EC2, CloudWatch, and Elastic IPs, and managed security groups.
Designed and created efficient RESTful endpoints for both internal and public consumption.
Provided task estimations and delivered quality code on time.
Generated Use case diagrams, Class diagrams, and Sequence diagrams using Rational Rose.
Designed and implemented functionality using technologies including JavaScript, AJAX, and jQuery.
Charles Schwab - Denver, Colorado Feb 2019 – Nov 2019
Python Full Stack Developer
Responsibilities:
Wrote reusable, testable, and efficient code and created back-office tooling
Developed and interfaced with back-end services.
Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface.
Used Django Database API's to access database objects.
Used jQuery and Ajax calls for transmitting JSON data objects between frontend and controllers.
Worked as an application developer with controllers, views, and models in Django, with working knowledge of Django REST Framework (DRF).
Implemented the application using Inversion of Control (IoC) principles within the Django framework and handled security using Django's authentication and security features.
Created intuitive integration tools, documentation, tutorials, example code, and developed unit tests for new components
Worked in highly technical teams employing CI/CD and Agile/Scrum practices.
Demonstrated an understanding of DevOps principles around Continuous Integration and Continuous Delivery.
Cognizant Technology Solutions - India Oct 2013 – Aug 2018
Python Engineer
Responsibilities:
Prepared and analyzed reports using Python libraries and was involved in environment setup.
Developed entire frontend and backend modules using Python on Django Web Framework.
Developed RESTful APIs for the functionality implemented in the project using class-based views in Django, and used Django's native testing framework to implement unit testing.
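A compact sketch of that approach; the Order model, route, and field names are hypothetical:

    # views.py — class-based view returning JSON
    from django.http import JsonResponse
    from django.views import View
    from .models import Order  # hypothetical model

    class OrderListView(View):
        def get(self, request):
            data = list(Order.objects.values("id", "status"))
            return JsonResponse(data, safe=False)

    # tests.py — Django's native test framework
    from django.test import TestCase

    class OrderListViewTests(TestCase):
        def test_returns_json_list(self):
            response = self.client.get("/orders/")  # hypothetical route
            self.assertEqual(response.status_code, 200)
            self.assertIsInstance(response.json(), list)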
Developed an autonomous continuous integration system by using GIT, MySQL, and custom tools designed in Python and Bash.
Used REST-based microservices built on RESTful APIs, and designed and developed the UI for client websites using HTML, CSS, and jQuery.
Migrated the Django database from SQLite to MySQL with complete data integrity.
Worked on the AJAX framework to transform datasets and data tables into HTTP-serializable JSON strings.
Responsible for setting up REST APIs using Django.
Used GitLab for continuous integration and deployment and the Git version control system for collaborating with teammates and maintaining code versions.