Data Engineer Analyst

Location:
Washington, DC
Posted:
July 15, 2025

Resume:

Ruksar Jaha Mohammad

Sr. Data Engineer

**************.*@*****.*** # +1-425-***-****

PROFESSIONAL SUMMARY:

More than 8 years of experience as a Data Analyst, with functional and industry experience and accomplished process and project responsibilities such as data analysis, design, development, user acceptance, and performance management.

Strong professional experience with emphasis on analysis, design, development, testing, maintenance, and implementation of data mapping, data validation, and requirement gathering in a data warehousing environment. Experience working with large sets of structured and unstructured data.

Experience in data warehousing applications using ETL tools such as Informatica and SSIS and programming languages such as Python and SQL/PL-SQL against Oracle and SQL Server databases.

Strong experience working with HDFS, MapReduce, Spark, Hive, Sqoop, Flume, Kafka, Oozie, Pig and HBase.

Experience in handling very large data sets using cloud clusters on Amazon Web Services (AWS), MS Azure, and Hadoop, and in archiving the data.

Experience in providing custom solutions such as eligibility criteria, match, and basic contribution calculations for major clients using Informatica, and reports using Tableau/Power BI.

Extensively used Python libraries including PySpark, pytest, PyMongo, cx_Oracle, pyexcel, Boto3, psycopg, embedPy, NumPy, and Beautiful Soup.

Experience in Data Analysis, Data Profiling, Data Integration, and validation of data across all integration points.

Good experience in writing SQL queries and implementing stored procedures, functions, packages, tables, views, cursors, and triggers.

Extensive experience using ETL and reporting tools such as SQL Server Integration Services (SSIS) and SQL Server Reporting Services (SSRS).

Experienced in working with NoSQL databases (HBase, Cassandra, and MongoDB), database performance tuning, and data modeling.

Designed and developed weekly and monthly reports using MS Excel techniques (charts, graphs, pivot tables) and PowerPoint presentations.

Experience in using collections in Python for manipulating and looping through different user defined objects.

Skilled in data management, ETL, and data manipulation, validation, and cleaning using various conversion functions and multiple conditional statements.

Experience with Data analysis and SQL querying with database integrations and data warehousing for large financial organizations.

Strong working experience in Data Cleaning, Data Warehousing and Data Massaging using Python Libraries and MySQL.

Strong experience in using Excel & MS Access to dump the data & analyze based on business needs.

Experience in creating Ad-hoc reports, data driven subscription reports by using SQL.

Have experience in Apache Spark, Spark Streaming, Spark SQL and NoSQL databases like HBase, Cassandra, and MongoDB.

Proficient in database design, development, normalization, indexes, stored procedures, functions, and triggers.

Experience in creating sub-reports, tabular reports, matrix reports, parameterized reports, drill-down reports, charts, and ad hoc reporting using SQL Server Reporting Services (SSRS), and web reporting by customizing URL access.

Experience including analysis, modeling, design, and development of Tableau reports and dashboards for analytics and reporting applications.

Expertise in Power BI, Power BI Pro, Power BI Mobile. Expert in creating and developing Power BI Dashboards.

Experienced in RDBMS such as Oracle, MySQL, and IBM DB2 databases.

Hands-on experience in complex query writing and query optimization in relational databases including Oracle, Teradata, and SQL Server, using T-SQL and Python.

Experienced in business requirements collection using Agile, Scrum, and Waterfall methods, and in software development life cycle (SDLC) testing methodologies, disciplines, tasks, resources, and scheduling.

Experience in developing MapReduce programs using Apache Hadoop for analyzing big data as per requirements. Practical understanding of data modeling (dimensional and relational) concepts such as star-schema modeling, snowflake-schema modeling, and fact and dimension tables.

Extensive experience in Dimensional Data Model and Application/OLTP Data Model for creating Logical and Physical Data Model.

Extensive knowledge on Data Profiling using Informatica Developer tool.

Highly motivated team player with excellent Interpersonal and Customer Relational Skills, Proven Communication, Organizational, Analytical, Presentation Skills, and Leadership Qualities.

TECHNICAL SKILLS:

Programming Languages: Python (Pandas, NumPy, PySpark), SQL, R, SAS

BI Tools: Tableau, Microsoft Power BI, PowerPivot, SSRS

Cloud Platforms: AWS (S3, Redshift, Glue), Azure, Hadoop

Data Warehousing Tools: Talend, SSIS, SSAS, SSRS, Toad Data Modeler

Python Libraries: scikit-learn, Pandas, NumPy, SciPy, Matplotlib, Seaborn, Plotly

CI/CD Tools: Jenkins, Azure DevOps

Version Control: Git, GitHub, GitLab

Workflow Orchestration: Apache Airflow, Dagster

Data Streaming: Kafka, Spark Streaming

Cloud-Native Tools: AWS Glue, AWS Redshift, Azure Data Factory, Azure Synapse Analytics

Containerization: Docker, Kubernetes

Data Visualization: Tableau, Microsoft Power BI, RStudio

ETL Tools: Informatica PowerCenter, SSIS, Talend

Microsoft Tools: Microsoft Office, MS Project

Database Tools: SQL Server, MySQL, MS Excel, PostgreSQL, SQLite, MongoDB, Teradata

Data Analysis: Web Scraping, Statistical Modelling, Hypothesis Testing, Predictive Modelling

Data Mining Algorithms: Decision Trees, Clustering, Random Forest, Regression

Education:

Master’s degree in Computer Science, University of Staffordshire, 2016

CERTIFICATIONS:

AWS Certified Data Analytics – Specialty, 2023

Microsoft Certified: Azure Data Engineer Associate, 2022

PROFESSIONAL EXPERIENCE:

Client: GRSI, Bethesda, MD Jan 2023 – Present

Role: Sr. Data Engineer

Designed and implemented a scalable ETL pipeline using PySpark and AWS Glue to process 10TB of structured and unstructured data, reducing data ingestion time by 35%. Integrated Snowflake with Tableau for real-time analytics, enabling faster business decision-making. Architected a data lake on AWS S3, implementing partitioning and compression techniques to optimize query performance by 30% for downstream analytics teams.
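
As an illustration of the kind of pipeline described above, here is a minimal PySpark sketch of a partitioned, compressed write to an S3 data lake; the bucket paths, column names, and cleaning logic are placeholders, not the actual implementation.

```python
# Minimal PySpark sketch: cleanse raw events and write partitioned, compressed Parquet to S3.
# Paths and columns are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example_etl").getOrCreate()

raw = spark.read.json("s3://example-raw-zone/events/")          # structured + semi-structured input

cleaned = (
    raw.withColumn("event_date", F.to_date("event_ts"))
       .dropDuplicates(["event_id"])
       .filter(F.col("event_id").isNotNull())
)

(cleaned.write
    .mode("overwrite")
    .partitionBy("event_date")            # partition pruning speeds up downstream queries
    .option("compression", "snappy")      # columnar compression reduces scan cost
    .parquet("s3://example-curated-zone/events/"))
```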

Responsibilities:

Identified and managed existing and emerging risks that stem from business activities.

Analyzing and validating large datasets to support ad-hoc analysis, reporting, and remediation using SAS.

Processing data from tools such as Snowflake and writing complex queries in SQL or SAS using complex joins, sub-queries, table creation, aggregation, and concepts of DDL, DQL, and DML.

Perform ETL (Extract, Transform, Load) using tools such as Informatica and Azure to integrate and transform disparate data sources.

Perform data scraping, data cleaning, data analysis, and data interpretation and generate meaningful reports using Python libraries like pandas, matplotlib, etc.

Generate weekly reports with visualizations using tools such as MS Excel (pivot tables and macros) and Tableau to enable business decision-making.

Generate various dashboards and create calculated fields in Tableau for data intelligence and analysis based on business requirements.

Ensured risks associated with business activities are effectively identified, measured, monitored, and controlled.

Architected and maintained scalable ETL pipelines using PySpark and Informatica, processing 10TB of data daily to support analytics and reporting.

Optimized Snowflake data warehouse queries, reducing latency by 30% and enabling real-time analytics for business stakeholders.

Followed written risk and compliance policies, standards, and procedures for business activities.

Leveraged intermediate and advanced business, analytical, and technical knowledge to participate in discussions with cross-functional teams to understand and collaborate on business objectives and influence solution strategies.

Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, and Spark Streaming, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.

Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning.
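
A hedged sketch of such a streaming ETL job, written here in PySpark Structured Streaming rather than Scala; the broker, topic, schema, and output paths are assumptions for illustration, and the job requires the Spark Kafka connector package to be available.

```python
# PySpark Structured Streaming sketch: read JSON events from Kafka, parse, and append to storage.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("stream_etl").getOrCreate()

schema = StructType([
    StructField("txn_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("ts", StringType()),
])

stream = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "transactions")                 # placeholder topic
    .load())

parsed = (stream
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
    .withColumn("event_time", F.to_timestamp("ts")))

query = (parsed.writeStream
    .format("parquet")
    .option("path", "/warehouse/transactions")           # a Hive-style table location
    .option("checkpointLocation", "/checkpoints/transactions")
    .outputMode("append")
    .start())
```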

Utilized Oozie workflows to run Pig and Hive jobs. Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.

Applied advanced analytical techniques to solve business problems that are typically medium to large scale with impact to current and/or future business strategy.

Applied innovative and scientific/quantitative analytical approaches to draw conclusions and make 'insight to action' recommendations to answer the business objective and drive the appropriate change.

Translated recommendation into communication materials to effectively present to colleagues for peer review and mid-to-upper-level management.

Incorporated visualization techniques to support the relevant points of the analysis and ease the understanding for less technical audiences.

Used Power BI Desktop to develop data analysis multiple data sources to visualize the reports.

Involved in loading JSON datasets into MongoDB and validating the data using Mongo shell.

Identified and gathered the relevant and quality data sources required to fully answer and address the problem for the recommended strategy through testing or exploratory data analysis (EDA).

Transformed disparate data sources and determined the appropriate data hygiene techniques to apply.

Understood and adopted emerging technology that can affect the application of scientific methodologies and/or quantitative analytical approaches to problem resolution.

Delivered analysis and findings in a manner that conveys understanding, influences mid-to-upper-level management, garners support for recommendations, drives business decisions, and influences business strategy.

Client: USAA, Plano, Texas Oct 2021 - Dec 2023

Role: Data Engineer

Built an automated data pipeline using Azure Data Factory to integrate data from SQL Server and MongoDB, streamlining reporting processes and reducing manual intervention by 50%. Optimized data warehouse performance by implementing indexing and partitioning strategies on Teradata, improving query execution time by 25%.

Responsibilities:

Collaborated with stakeholders and cross-functional teams to elicit, elaborate, and capture functional and non-functional solutions.

Translated business requirements into technical requirements and performed data modeling as per the technical requirements.

Performed data manipulation and cleansing using Power BI to present data in the form of reports and dashboards.

Performed analytical modeling, database design, data analysis, regression analysis, data integrity, and business analytics.

Created a database using MongoDB and wrote several queries to extract data from the database.

Worked on batch processing of data sources using Apache Spark and Elasticsearch.

Designed and deployed an Azure Data Lake for centralized data storage, enabling seamless access for analytics and machine learning teams.

Loaded the aggregated data into MongoDB for reporting on the dashboard.

Created Data Mapping between the source and the target system. Created documentation to map source and target tables columns and datatypes.

Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases.

Implemented Apache Airflow to orchestrate ETL workflows, scheduling and monitoring 50+ daily jobs to ensure data availability for analytics teams.
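
A minimal Airflow DAG sketch of the orchestration pattern described; the DAG id, schedule, and task callables are placeholders rather than the production jobs.

```python
# Minimal Airflow 2.x DAG: daily extract -> transform -> load ordering with retries.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(): ...      # placeholder callables
def transform(): ...
def load(): ...

default_args = {"retries": 2, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="daily_example_etl",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load   # task dependency chain
```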

Performed basic DML and DDL operations, including writing subqueries, window functions, and CTEs.

Wrote SQL Queries using insert, update, delete statements and exported data in the form of csv, xml, txt etc.

Also, wrote SQL queries that included joins like inner, outer, left, right, self-join in SQL Server.

Summarized data from a sample using statistics such as the mean and standard deviation, and performed linear regression.

Utilized AWS Glue for serverless ETL processing, integrating data from S3 and RDS into Redshift, reducing operational costs by 15%.
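
A sketch of what a Glue job script for this kind of S3-to-Redshift load could look like; the catalog database, table, connection name, and column mappings are assumptions for illustration, not the actual job.

```python
# AWS Glue (PySpark) job sketch: read a catalogued S3 table, map columns, write to Redshift.
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

source = glue_context.create_dynamic_frame.from_catalog(
    database="example_lake", table_name="orders_s3"          # placeholder catalog entries
)

mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("order_id", "string", "order_id", "string"),
              ("amount", "double", "amount", "double")],
)

glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="redshift-connection",                 # Glue connection to Redshift
    connection_options={"dbtable": "public.orders", "database": "analytics"},
    redshift_tmp_dir="s3://example-temp/redshift/",
)
job.commit()
```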

Worked on MongoDB schema/document modeling, querying, indexing, and tuning.

Compared different WFHM DB environments and determined, resolved, and documented discrepancies.

Involved in ETL development, creating required mappings for the data flow using SSIS.

Generated various capacity-planning reports (graphical) using Python packages such as NumPy, Matplotlib, and SciPy.
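
A small NumPy/Matplotlib sketch of a capacity-planning trend chart of the sort described; the usage figures below are synthetic and purely illustrative.

```python
# Plot observed storage usage and a fitted linear trend for capacity planning.
import numpy as np
import matplotlib.pyplot as plt

weeks = np.arange(1, 13)
usage_gb = np.array([120, 135, 150, 148, 170, 185, 190, 210, 225, 240, 255, 270])  # synthetic data

slope, intercept = np.polyfit(weeks, usage_gb, 1)     # linear trend fit

plt.plot(weeks, usage_gb, "o-", label="observed usage (GB)")
plt.plot(weeks, slope * weeks + intercept, "--", label="linear trend")
plt.xlabel("Week")
plt.ylabel("Storage used (GB)")
plt.title("Capacity planning trend")
plt.legend()
plt.savefig("capacity_trend.png", dpi=150)
```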

Designed ETL flows to implement using Informatica Power Center as per the mapping sheets provided.

Optimized data sources to improve report runtime of the route-distribution analytics dashboard in Power BI.

Involved in developing UML use-case diagrams and class diagrams using MS Visio.

Worked on predictive analytics use-cases using Python language.

Utilized Power BI to publish and share the reports with the business users.

Created action filters, parameters, and calculated sets for preparing dashboards and worksheets in PowerBI.

Managed departmental reporting systems, troubleshooting daily issues and integrating existing Access databases with numerous external data sources (SQL, Excel, and Access).

Designed and implemented multiple dashboards using PowerBI.

Designed and implemented end-to-end systems for data analytics and automation, integrating custom visualization tools using Power BI.

Utilized Power BI and custom SQL features to create dashboards and identify correlations.

Prepared Scripts in R and Shell for Automation of administration tasks.

Involved in performance tuning of mappings in Informatica.

Wrote several Teradata SQL Queries using SQL Assistant for Ad Hoc Data Pull request.

Extracted source data from Oracle tables, MS SQL Server, sequential files, and Excel sheets.

Created Data Quality Scripts using SQL to validate successful data load and the quality of the data. Created various types of data visualizations using R and PowerBI.

Performed data analysis and data profiling using complex SQL on various source systems.

Categorized and generated reports on multiple parameters using MS Excel and Power BI.

Created SQL scripts to find data quality issues and to identify keys, data anomalies, and data validation issues.

Worked on logical and physical modeling of various data marts as well as DW/BI architecture using Teradata.

Client: JME Insurance, Dallas, TX Jul 2019 - Sep 2021

Role: Data Engineer

Developed a real-time data streaming pipeline using Apache Kafka to process insurance claims data, enabling near-real-time analytics for fraud detection. Migrated legacy ETL workflows from SSIS to Informatica PowerCenter, improving data processing reliability and reducing maintenance costs by 20%.
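
A hedged sketch of a claims consumer using the kafka-python client (an assumption; the actual client library is not stated above); the topic, broker, and threshold rule are placeholders standing in for a real fraud-detection model.

```python
# kafka-python consumer sketch: read claims events and flag suspicious ones.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "insurance-claims",                               # placeholder topic name
    bootstrap_servers=["broker:9092"],
    group_id="fraud-scoring",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    claim = message.value
    # Simple placeholder rule; a real pipeline would score the claim with a fraud model here.
    if claim.get("amount", 0) > 50_000:
        print(f"flagging claim {claim.get('claim_id')} for review")
```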

Responsibilities:

Involved in requirements gathering, analysis, design, development, testing, and production of applications using the SDLC Agile/Scrum model.

Worked on the entire Data Analysis project life cycle and actively involved in all the phases including data cleaning, data extraction and data visualization with large data sets of structured and unstructured data, created ER diagrams and schema.

Also manipulated benefit-related data for lab work, clinic services, etc., by writing SQL queries that included joins such as inner, outer, left, right, and self-joins in SQL Server, and exported the data in the form of CSV, TXT, XML, etc.

Handled creation, configuration, and monitoring of shard sets, including analysis of the data to be sharded and choosing a shard key to distribute data evenly. Performed architecture and capacity planning for MongoDB clusters. Implemented scripts for MongoDB import, export, dump, and restore.
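
A brief PyMongo sketch of enabling sharding and picking a hashed shard key, as described above; the router address, database, collection, and key are illustrative only.

```python
# PyMongo sketch: enable sharding on a database and shard a collection on a hashed key.
from pymongo import MongoClient

client = MongoClient("mongodb://mongos-router:27017")   # placeholder mongos router

# Enable sharding on the database, then shard the collection on a hashed key
# so documents distribute evenly across shards.
client.admin.command("enableSharding", "claims")
client.admin.command(
    "shardCollection", "claims.policies", key={"policy_id": "hashed"}
)
```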

Summarized data from a sample using statistics such as the mean and standard deviation, and performed linear regression.

Worked on Data Warehousing principles like Fact Tables, Dimensional Tables, Dimensional Data Modelling - Star Schema and Snowflake Schema.

Created ad hoc reports for users in Tableau by connecting various data sources.

Generated Dashboards with Quick filters, Parameters and sets to handle views more efficiently.

Used Excel sheets, flat files, and CSV files to generate Tableau ad hoc reports.

Involved in defining the source-to-target data mappings, business rules, and business and data definitions.

Worked closely with stakeholders and subject matter experts to elicit and gather business data requirements.

Used Pandas, NumPy, Seaborn, SciPy, and Matplotlib in Python for developing various machine learning algorithms, and utilized algorithms such as linear regression and multivariate regression for data analysis.
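
A minimal scikit-learn sketch of a multivariate linear regression of the kind mentioned; the CSV file and feature names are placeholders.

```python
# Fit a multivariate linear regression and report held-out R^2.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

df = pd.read_csv("claims_sample.csv")                 # placeholder input file
X = df[["age", "premium", "prior_claims"]]            # multiple predictors
y = df["claim_amount"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))
```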

Worked with business analyst groups to ascertain their database reporting needs.

Wrote scripts in Python for extracting data from HTML files.

Worked on connecting PostgreSQL databases to Python.
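
A combined sketch of the two items above: parsing values out of an HTML file with Beautiful Soup and loading them into PostgreSQL with psycopg2; the file name, table, and connection details are assumed for illustration.

```python
# Scrape table cells from an HTML file and bulk-insert them into PostgreSQL.
import psycopg2
from bs4 import BeautifulSoup

with open("report.html", encoding="utf-8") as f:      # placeholder HTML file
    soup = BeautifulSoup(f, "html.parser")

rows = [
    (cells[0].get_text(strip=True), cells[1].get_text(strip=True))
    for tr in soup.select("table tr")
    if (cells := tr.find_all("td"))                   # skip header rows without <td>
]

conn = psycopg2.connect(host="localhost", dbname="analytics", user="etl", password="***")
with conn, conn.cursor() as cur:
    cur.executemany("INSERT INTO scraped_metrics (name, value) VALUES (%s, %s)", rows)
conn.close()
```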

Tracked Velocity, Capacity, Burn down Charts, and other metrics during iterations.

Using R, automated a process to extract data and various document types from a website, save the documents to a specified file path, and upload the documents into an Excel template. Performed data analysis and data profiling using SQL on various source systems including Oracle and Teradata.

Utilized the SSIS ETL toolset to analyze legacy data for data profiling.

Utilized Power BI reporting to create, test, and validate various visualizations, ad hoc reports, dashboards, and KPIs.

Designed and published visually rich and intuitively interactive Power BI/Excel workbooks and dashboards for executive decision-making.

Applied various data mining techniques: Linear Regression & Logistic Regression, classification, clustering.

Developed and streamlined a CRM database and built SQL queries for data analysis of 1 million+ records.

Generated New Market and Investment Banking reports by using SSRS and increased the efficiency by 50%.

Introduced Power BI, designed dashboards for time-based data and improved performance by 40%.

Built ETL workflows for automated reporting of Investment Banking data using SSIS, reducing the workload by 40%.

Company: ADP, Pasadena, CA Apr 2017 - Jun 2019

Role: Data Analyst

Responsibilities:

Generated energy consumption reports using SSRS, which showed the trend over day, month and year.

Performed ad-hoc analysis and data extraction to resolve 20% of the critical business issues.

Well versed in the Agile Scrum development methodology, used in day-to-day work on application development for Building Automation Systems (BAS).

Produced weekly, monthly, and quarterly insight reports on pricing trends and opportunities utilizing Excel, Tableau, and SQL databases.

Streamlined and automated Excel/Tableau dashboards for improved speed through Python- and SQL-based solutions.

Designed creative dashboards and storylines for a fashion store dataset using Tableau features.

Developed SSIS packages for extract/load/transformation of source data into a DW/BI architecture/OLAP cubes as per the functional/technical design and conforming to the data mapping/transformation rules.

Developed data cleaning strategies in Excel (multilayer fuzzy match) and SQL (automated typo detection and correction) to organize alternative datasets daily to produce consistent and high-quality reports.

Involved in Data Analysis and Data Validation by extracting records from multiple databases using SQL in Oracle SQL Developer tool.

Used MongoDB internal tools such as MongoDB Compass, MongoDB Atlas Manager & Ops Manager, Cloud Manager, etc. Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication, and schema design.

Created views to facilitate easy user interface implementation, and triggers on them to facilitate consistent data entry into the database.

Hands-on experience in developing Apache Spark applications using Spark tools such as RDD transformations, Spark Core, Spark MLlib, Spark Streaming, and Spark SQL.

Identified the data sources and defined them to build the data source views.

Involved in designing the ETL specification documents like the Mapping document (source to target).

Used ETL (SSIS) to develop jobs for extracting, cleaning, transforming and loading data into data warehouse.

Created Stored Procedures and executed the stored procedure manually before calling it in the SSIS package creation process.

Wrote SQL test scripts to validate data for different test cases and test scenarios.

Created SSIS Packages to export and import data from CSV files, Text files and Excel Spreadsheets.

Performed data manipulation: inserting, updating, and deleting data from data sets.

Developed various stored procedures for the data retrieval from the database and generating different types of reports using SQL reporting services (SSRS).


