Data Engineer Scientist

Location:

Alexandria, VA, 22314

Posted:

February 07, 2025

Resume:

Data Scientist

Nosheen Kashif

Email: ****************@*****.*** Phone: +1-571-***-****

Professional Summary:

Data Engineer with 5+ years of experience in Analysis, Design, Development, and Big Data environments, including Hadoop, Scala, PySpark, and HDFS, with additional expertise in Python.

Proficient in Python and cloud platforms (Azure, AWS), with a strong foundation in statistical analysis, data visualization, and NLP.

Experienced in building and optimizing machine learning models, automating data pipelines, and extracting actionable insights from complex datasets.

Hands-on experience in transforming complex datasets into actionable strategies that enhance decision-making and business outcomes.

Implemented monitoring and alerting solutions using Azure Monitor and other third-party tools, proactively detecting and resolving issues in the data processing pipeline.

Proficient in scripting languages including Python, Bash, and R, with experience in developing custom data processing solutions and automating data workflows.

Provided support to data analysts in running Hive queries and building ETL processes.

Defined user stories and drove the agile board in Jira during project execution, and participated in sprint demos and retrospectives.

Collaborated with cross-functional teams, including product managers, designers, and marketing teams, to define A/B testing objectives and success criteria.

Expertise in data visualization and reporting, creating dashboards and reports using Tableau and Power BI.

Technical Skills:

Programming Languages: Python, Java, R, Scala, PL/SQL

Big Data Technologies: Spark (PySpark, Spark applications), Snowflake, Apache Airflow

Database Tools: Snowflake, SQL Server, MySQL, PostgreSQL

Machine Learning: Regression, Classification, Clustering, Time Series

Data Analysis & Modeling: Pandas, NumPy, Scikit-learn, TensorFlow, Keras, PyTorch, Statistical Analysis, Time Series Forecasting, Regression, Classification, Clustering, Natural Language Processing (NLP)

Data Visualization Tools: Tableau, Power BI, Matplotlib, Seaborn, Plotly

Version Control Systems, CI/CD: Git, GitHub, GitLab, Jenkins, Docker, DevOps

Data Storage & Management: MySQL, PostgreSQL, MongoDB, Amazon S3

Collaboration: Jira

Professional Experience:

Client: Consult America, MD Feb 2023 – Present

Title: Data Engineer

Leveraged AWS Lambda for serverless computing, optimizing resource usage and enhancing the scalability of various applications, leading to cost savings and improved performance.

Wrote SQL scripts for data migration and successfully loaded historical data from Teradata SQL to Snowflake, ensuring seamless data transfer and continuity.

Developed custom large language models (LLMs) and employed open-source models for text summarization.

Developed and implemented NLP models and algorithms to extract relevant information from unstructured text documents, including text summarization, entity extraction, and sentiment analysis.

Developed reusable objects such as PL/SQL program units, database procedures, and functions, streamlining development processes and maintaining consistency in business rule implementations.

Developed Python-based Spark applications using PySpark API, leveraging Pandas and NumPy for data manipulation and analysis, enhancing data processing capabilities.

Created Tableau dashboards to visualize ETL performance metrics, providing insights for optimization.

Environment: TensorFlow, PyTorch, Scikit-learn, Python, Scala, PostgreSQL, MongoDB, Bayesian modeling, Salesforce, SQL, NoSQL, Tableau, ETL, Git

Client: Digitech, Pakistan Dec 2017 – Jun 2023

Title: Salesforce Administrator

Managed custom Salesforce applications using automation tools within the Salesforce Developer Console.

Developed and maintained Salesforce customizations, including Visualforce pages and Apex classes.

Developed and maintained Salesforce Lightning components.

Developed and maintained custom Apex triggers, classes, and Visualforce pages to automate key processes.

Developed and maintained Salesforce data architecture, including custom objects, fields, page layouts, and validation rules.

Designed and deployed Salesforce Flow automations for lead conversion, case assignment, and approval processes.

Collaborated with cross-functional teams to integrate third-party applications and external APIs (REST/SOAP).

Developed and implemented data analysis, data collection systems, and other strategies that optimize statistical efficiency and quality. Acquired data from primary and secondary data sources and maintained databases.

Applied advanced data modeling techniques in Power BI, ensuring accurate representation of data relationships, and performed data transformations for enhanced visualization.

Troubleshot and resolved issues related to Tableau dashboards, data connections, and performance bottlenecks.

Collaborated with data engineers and database administrators to design and optimize data models and data infrastructure to support Tableau reporting needs.

Conducted in-depth data analysis using Excel, leveraging functions like VLOOKUP, HLOOKUP, and pivot tables to derive meaningful insights from large datasets, resulting in an increase in data processing efficiency.

Environment: Salesforce, Workflows, Apex, SQL, Tableau, Power BI, Excel.

Client: Freelancer July 2016 – Jan 2019

Title: Junior Data Engineer

Implemented and refined prompt engineering techniques for large language models (LLMs) and AI/ML technologies, particularly in core ML algorithms like clustering and community detection.

Ensured high availability and fault tolerance by managing Elasticsearch cluster health and scaling.

Designed and implemented PostgreSQL database schemas and table structures based on normalized data models and relational database principles.

Created interactive and insightful dashboards and reports in Power BI, translating complex data sets into visually compelling insights for data-driven decision-making.

Used Python, including the pandas and NumPy packages, along with Power BI to create various data visualizations, while also performing data cleaning, feature scaling, and feature engineering tasks.

Developed machine learning models such as Logistic Regression, KNN, and Gradient Boosting with Pandas, NumPy, Seaborn, Matplotlib, and Scikit-learn in Python.

Designed and coordinated with the Data Science team in implementing advanced analytical models in Hadoop Cluster over large datasets, contributing to efficient data workflows.

Implemented a diverse array of machine learning models to support strategic planning and decision-making, contributing to a significant increase in annual revenue by optimizing pricing strategies, enhancing product offerings, and targeting high-value market segments.

Conducted comprehensive market analysis of internal and external data sources. Developed metrics to understand market dynamics and consumer behavior, guiding the development of pricing strategies aligned with business objectives and maximizing profitability.

Designed and implemented robust pipelines to streamline model development and automate data science projects at scale, significantly enhancing efficiency and scalability and facilitating rapid iteration and deployment of advanced analytical solutions.

Actively engaged in cross-functional teams to deploy data products at a production level, in partnership with data engineers, MLOps, and front-end specialists.

Environment: Azure, Azure Data Factory, PySpark, Apache Spark, Spark-SQL, Scikit-learn, Pandas, NumPy, PostgreSQL, MySQL, Python, Scala, Power BI, SQL.
