
Data Engineer Senior

Location:
Burlington, MA
Posted:
February 21, 2025


Resume:

Professional Summary:

**+ years of extensive experience working as a Data Engineer supporting AI applications.

Experienced in AWS services such as S3, Lambda, Glue, EMR, Step Functions, IAM, CloudWatch, Athena, and Redshift.

Experience in GCP services such as BigQuery, DataProc, Compute Engine, Cloud Storage, Cloud Run, Cloud Functions, and Cloud Composer.

Strong experience in Apache Spark, Apache Kafka, Apache Flume, Apache Hive, and Apache Flink.

Proficient in data modeling, database design, and data warehousing concepts.

Experienced in big data technologies such as Hadoop, Spark, Kafka, and Hive.

Experience with cloud infrastructure, including maintaining up-to-date documentation of GCP resources such as configuration details, access policies, and usage logs.

Optimized storage and retrieval in Azure Data Lake Storage through efficient partitioning and storage formats within Databricks.

Implemented Delta Lake architecture within Azure Databricks for reliable and scalable data versioning and transaction management.

Integrated Azure Databricks seamlessly with Tableau for real-time visualization and reporting of analytics results.

Established data governance practices within Databricks, including metadata management and lineage tracking.

Utilized PySpark for advanced analytics, implementing custom transformations and analysis functions.

Experience in ETL processes, including extracting, transforming, and loading data from various sources into Data Warehouses, as well as data processing using Apache Flume, Kafka, Power BI, and SSIS.

Worked on implementing advanced procedures like text analytics using Apache Spark in Scala.

Managed containerization with Docker and orchestration with Kubernetes for deploying, scaling, and managing containers.

Experience in implementing optimization techniques in Hive and Spark.

Exposure to Big Data ecosystems using Hadoop, Pig, Hive, HDFS, MapReduce, Sqoop, Storm, Airflow, Snowflake, Teradata, Flume, Kafka, Yarn, Oozie, and Zookeeper.

Worked with IDEs such as Eclipse, IntelliJ, PyCharm, Notepad++, and Visual Studio for development.

Technical Skills

Cloud

AWS, GCP, Azure

Programming Languages

SQL, Shell Script, Python, PySpark and Java

Big data Frameworks

Apache Spark, Apache Flink, Hadoop, EMR, DataProc

Databases

SQL Server, Oracle, MySQL, PostgreSQL, Hive, T-SQL, SQL Server Management Studio, MongoDB

AWS Services

AWS Glue, EMR, S3, Redshift, Lambda, Step Functions, SNS, Athena, EC2, SageMaker, CloudWatch, Data Pipeline

GCP Services

BigQuery, Pub/Sub, DataProc, DataFlow, Cloud Composer, Cloud Run, Compute Engine

Azure

Snowflake, Databricks

Orchestration tools

Airflow, Kubernetes, Docker, Crontab

Devops & CI/CD

Terraform, Jenkins, Git, Cloud Build, CloudFormation

Professional Experience:

Client: T-Mobile, Atlanta, Georgia, USA Feb 2024 – Present

Position: Sr. Data Engineer

Responsibilities:

Worked with both AWS (Glue, EMR, S3, Redshift, Lambda) and Azure (Databricks, Snowflake) to design and implement scalable data architectures, data pipelines, and ETL processes.

Developed and optimized ETL pipelines using Apache Spark, Python, SnowSQL, and AWS Glue, ensuring efficient data extraction, transformation, and loading.
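For illustration, a minimal sketch of the kind of Glue ETL job described above, assuming a Data Catalog source and an S3 Parquet target; the database, table, and path names are placeholders rather than details from this engagement:

```python
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

# Standard Glue job bootstrap: resolve the job name passed in by the Glue runner.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Extract: read the source table registered in the Glue Data Catalog (placeholder names).
source = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"
)

# Transform: keep and retype only the columns the warehouse needs.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("order_id", "string", "order_id", "string"),
        ("order_total", "double", "order_total", "double"),
        ("order_date", "string", "order_date", "date"),
    ],
)

# Load: write curated Parquet to S3 for downstream Redshift COPY / Spectrum queries.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/curated/orders/"},  # placeholder path
    format="parquet",
)

job.commit()
```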

Used Spark Streaming and Kafka for real-time data processing, integrating with AWS and Azure services for end-to-end data movement.
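A minimal PySpark Structured Streaming sketch of this pattern, assuming a JSON payload on the Kafka topic; the broker address, topic, schema, and S3 paths are placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-events").getOrCreate()

# Assumed shape of the JSON messages on the topic.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

# Read the Kafka topic as a streaming DataFrame.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "events")
       .load())

# Kafka delivers key/value as binary; parse the JSON payload into columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

# Write the parsed stream to Parquet with a checkpoint location for fault tolerance.
query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://my-bucket/events/")
         .option("checkpointLocation", "s3a://my-bucket/_chk/events/")
         .outputMode("append")
         .start())
query.awaitTermination()
```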

Built and maintained self-service data pipelines using AWS services like SNS, Step Functions, and Redshift, while optimizing performance using Spark SQL and AWS Glue.

Configured and managed Azure Databricks workspaces, optimizing cluster configurations for efficient data processing and scalable ML model deployment.

Designed and implemented data models and metadata transfer solutions, ensuring alignment with enterprise guidelines and standards.

Developed and applied NLP techniques for text extraction, entity recognition, and sentiment analysis to extract meaningful insights from unstructured data.
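For illustration only, a small sketch of entity recognition and sentiment scoring on unstructured text; the resume does not name specific libraries, so spaCy and NLTK's VADER are assumptions here:

```python
# Requires the spaCy model (python -m spacy download en_core_web_sm)
# and the VADER lexicon (nltk.download("vader_lexicon")).
import spacy
from nltk.sentiment import SentimentIntensityAnalyzer

nlp = spacy.load("en_core_web_sm")      # small English pipeline with an NER component
sia = SentimentIntensityAnalyzer()

def analyze(text: str) -> dict:
    """Extract named entities and a compound sentiment score from raw text."""
    doc = nlp(text)
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    sentiment = sia.polarity_scores(text)["compound"]  # -1 (negative) .. +1 (positive)
    return {"entities": entities, "sentiment": sentiment}

print(analyze("Acme Corp reported strong growth in Atlanta last quarter."))
```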

Worked with machine learning models and LLMs for document similarity searches and content generation.

Built and maintained CI/CD pipelines using Terraform for testing and production environments.

Worked with Snowflake data architecture for reliable data processing, storage, and retrieval.

Developed data dashboards and reports in Tableau, enabling data-driven decision-making.

Environment: AWS, Azure, Glue, EMR, S3, Redshift, Lambda, SNS, Step Functions, Athena, Databricks, Snowflake, Apache Spark, Apache Flink, Hadoop, Python, Scala, Java, SnowSQL, Spark Streaming, Kafka, Terraform, Jenkins, Tableau, Power BI, Airflow, Kubernetes, Docker.

Yashoda Hospitals, Hyderabad, India Oct 2020 – Jul 2022

Position: Sr. Data Engineer

Responsibilities:

Built real-time and batch data pipelines using GCP services such as BigQuery, Pub/Sub, DataProc, Dataflow, Cloud Composer, Cloud Run, and Compute Engine.

Modeled and developed a new data warehouse from scratch and migrated it to BigQuery. Automated data pipelines and implemented quality control checks.
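A minimal sketch of such an automated load with a basic quality-control check, using the google-cloud-bigquery client; the project, bucket, and table names are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

# Load staged Parquet files from Cloud Storage into the warehouse table.
load_job = client.load_table_from_uri(
    "gs://my-bucket/staging/orders/*.parquet",   # placeholder URI
    "my-project.warehouse.orders",               # placeholder table
    job_config=job_config,
)
load_job.result()  # block until the load job finishes

# Minimal quality-control check: fail loudly if the table loaded empty.
table = client.get_table("my-project.warehouse.orders")
assert table.num_rows > 0, "Quality check failed: no rows loaded"
print(f"Loaded {table.num_rows} rows")
```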

Architected and implemented the migration of the data platform from Hadoop to Google Cloud Platform.

Developed and maintained CI/CD processes for machine learning systems using Cloud Build, Git, and Jenkins.

Created and managed jobs using Cloud Composer to migrate and transform data from Data Lake into BigQuery for further analysis.
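For illustration, a minimal Cloud Composer (Airflow) DAG of this kind, assuming Parquet files in the data lake and the Google provider's GCS-to-BigQuery operator; bucket and table names are placeholders:

```python
from datetime import datetime
from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

# Daily job that lands data-lake files from Cloud Storage into a BigQuery staging table.
with DAG(
    dag_id="lake_to_bigquery",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load_orders = GCSToBigQueryOperator(
        task_id="load_orders",
        bucket="my-data-lake",                         # placeholder bucket
        source_objects=["orders/{{ ds }}/*.parquet"],  # partitioned by execution date
        destination_project_dataset_table="my-project.staging.orders",
        source_format="PARQUET",
        write_disposition="WRITE_TRUNCATE",
    )
```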

Designed, developed and optimized data processing pipelines using Apache Flink and Java to ensure scalability and performance improvements.

Designed and implemented data flow diagrams (DFDs) and data models using Erwin to visualize data flow and database schema.

Developed PySpark scripts for ETL processes using DataProc.
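A minimal sketch of such a PySpark ETL script as it might run on DataProc; the Cloud Storage paths and column names are placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dataproc-etl").getOrCreate()

# Extract: raw CSV exports landed in Cloud Storage (placeholder bucket/paths).
raw = (spark.read
       .option("header", True)
       .csv("gs://my-bucket/raw/visits/"))

# Transform: normalize types, drop malformed rows, and aggregate per day.
daily = (raw
         .withColumn("visit_ts", F.to_timestamp("visit_ts"))
         .dropna(subset=["visit_id", "visit_ts"])
         .withColumn("visit_date", F.to_date("visit_ts"))
         .groupBy("visit_date", "department")
         .agg(F.count("*").alias("visit_count")))

# Load: write partitioned Parquet for downstream BigQuery load jobs.
(daily.write
 .mode("overwrite")
 .partitionBy("visit_date")
 .parquet("gs://my-bucket/curated/daily_visits/"))

spark.stop()
```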

Implemented monitoring and error tracking using GCP Stackdriver to improve system observability.

Helped implement Power BI for reporting and dashboarding across various departments, integrating data from BigQuery.

Environment: GCP, BigQuery, Pub/Sub, DataProc, Dataflow, Cloud Composer, Cloud Run, Compute Engine, Apache Flink, Java, PySpark, Power BI, Jenkins, Git.

NTT Data Business Solutions, Hyderabad, India Feb 2016 – Sep 2020

Position: Data Engineer

Responsibilities:

Built data pipelines to read data from multiple sources (applications running on Docker and Kubernetes) and load it into HDFS and AWS S3.

Utilized AWS services such as SNS, Step Functions, Lambda, Glue, EMR, EC2, and Athena to design, develop, and maintain scalable data processing solutions.

Developed and managed data models on Athena to transform raw data into structured formats suitable for analysis.

Ensured scalability, reliability, security, and compliance of data pipelines, adhering to best practices for data governance.

Developed Python scripts using libraries like Pandas for data manipulation, including reading and writing CSV files, and performing column-based comparisons.
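For illustration, a small Pandas sketch of the read/compare/write pattern described above; the file names and key column are placeholders:

```python
import pandas as pd

# Read the two extracts to compare.
current = pd.read_csv("spend_current.csv")
previous = pd.read_csv("spend_previous.csv")

# Column-based comparison: join on the key and flag rows whose amount changed.
merged = current.merge(previous, on="invoice_id", suffixes=("_curr", "_prev"))
merged["amount_changed"] = merged["amount_curr"] != merged["amount_prev"]

# Write only the changed rows back out for review.
merged.loc[merged["amount_changed"]].to_csv("spend_diffs.csv", index=False)
print(f"{merged['amount_changed'].sum()} rows changed")
```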

Cleaned and processed third-party spending data into deliverable formats using Python and Excel macros.

Designed and maintained development environments using bug-tracking tools (Jira, Confluence), and version control (Git, SVN).

Migrated data from RDBMS to Hive tables using Sqoop and generated visualizations with Tableau.

Wrote Spark applications in Scala and Python to improve data processing efficiency.

Implemented Spark using Scala and Spark SQL for fast data processing and testing.

Environment: AWS, HDFS, SNS, Step Functions, Lambda, Glue, EMR, EC2, Athena, Apache Spark, Scala, Spark SQL, Python, Pandas, Sqoop, AWS S3, Hive, Docker, Kubernetes, Tableau, Git, SVN, Jira, Confluence.

HDFC Bank, Mumbai, India Oct 2013 – Jan 2016

Position: Big Data/ETL Developer

Responsibilities:

Defined requirements for data lakes, data pipelines, and data layouts to support scalable and efficient data management.

Collaborated with ETL teams to define data layouts, rules, and transformation logic to meet business needs.

Managed the import and export of data between Oracle RDBMS and HDFS/Hive using Sqoop, ensuring smooth data migration.

Developed complex SQL queries, stored procedures, triggers, views, indexes, and user-defined functions to implement business logic and optimize data handling.

Utilized SQL Server Management Studio to create and modify T-SQL code for procedures, packages, functions, and views.

Designed, developed, and scheduled SSIS packages to import/export data from various sources.

Created and maintained a variety of reports using SSRS, delivering actionable insights to stakeholders.

Generated ad-hoc, summary, sub-reports, and drill-down reports using SSRS for in-depth analysis and decision-making.

Managed and reviewed Hadoop log files for performance monitoring and troubleshooting.

Environment: Data Lakes, Data Pipelines, Sqoop, Oozie, SSIS, Oracle RDBMS, HDFS, Hive, SQL, T-SQL, Stored Procedures, SSRS.

Education:

Master's in Computer Information Systems, Rivier University, 2023.


