
Data Engineer with 5 Years of Big Data Expertise

Location:
Plano, TX
Posted:
November 17, 2025


Resume:

MOHAMMED OWAIS QURESHI

DATA ENGINEER

CONTACT DETAILS:

PHONE: 214-937-1092

EMAIL: ******************@*****.***

CAREER OBJECTIVE

To leverage nearly 5 years of experience as a Data Engineer to contribute effectively to a dynamic organization where I can apply my expertise in designing, developing, and maintaining scalable data infrastructure and solutions. I aim to collaborate with cross-functional teams to optimize data processes, enhance data quality, and deliver insights that enable informed decision-making, ultimately supporting business growth and innovation.

PROFILE SUMMARY

●About 5 years of IT experience in analysis, design, and development with Big Data technologies such as Spark, MapReduce, Hive, YARN, and HDFS, using programming languages including Python, Scala, and Java.

●Practical knowledge of using Sqoop to import and export data between relational database systems and HDFS.

●Worked with Spark to improve the efficiency of existing algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN. Built ETL pipelines into and out of the data warehouse using Python with Pandas and Spark (a minimal sketch follows this summary).

●Worked on production support, investigating logs and hot fixes, and used Splunk along with AWS CloudWatch for log monitoring.

●Extensive hands-on experience with distributed computing architectures such as AWS products (e.g., EC2, Redshift).

●Experience with story grooming, sprint planning, daily stand-ups, and Agile software practices.

●Designed and developed ETL processes in Azure Data Factory to migrate data from external sources such as S3 and text files into Azure Synapse. Experience with both streaming and batch data processing using multiple technologies.

●Expert in creating various Kafka producers and consumers for seamless data streaming with AWS services.

●Expert in designing parallel jobs using stages such as Join, Merge, Lookup, Remove Duplicates, Filter, Dataset, Lookup File Set, Complex Flat File, Modify, Aggregator, and XML. Strong expertise in developing complex Oracle queries and database architecture, using PL/SQL to build stored procedures, functions, and triggers.

●Developed and implemented automated deployment scripts for WebSphere applications, reducing deployment times and minimizing manual errors in production.

●Experienced in building Snowpipes and migrating Teradata objects into the Snowflake environment.

●In-depth knowledge of data sharing in Snowflake and experience with Snowflake database, schema, and table structures.

●Experience with Azure services across PaaS and IaaS, and worked with storage offerings such as Blob (page and block) and SQL Azure. Well versed in deployment, configuration management, and virtualization.

●Deployed Docker Engine on virtualized platforms to containerize multiple applications.

●Good experience with PyTest, PyMock, and Selenium WebDriver frameworks for testing front-end and back-end components. Proficient in building CI/CD pipelines in Jenkins using pipeline syntax and Groovy libraries.
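
A minimal sketch of the kind of PySpark/Pandas warehouse ETL step described above, written in Python; the paths, table names, and columns are illustrative assumptions, not details from an actual engagement.

# Illustrative PySpark ETL step: read raw data, clean it, enrich it with a small
# Pandas reference table, and write it back to the warehouse.
# All paths and table names below are placeholders for this sketch.
import pandas as pd
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

raw = spark.read.parquet("s3a://example-bucket/raw/orders/")          # hypothetical source path
clean = (raw
         .dropDuplicates(["order_id"])
         .withColumn("order_date", F.to_date("order_date"))
         .filter(F.col("amount") > 0))

# Small reference data is convenient to keep in Pandas and join back in.
rates = pd.DataFrame({"currency": ["USD", "EUR"], "rate": [1.0, 1.08]})
clean = clean.join(spark.createDataFrame(rates), on="currency", how="left")

clean.write.mode("overwrite").saveAsTable("warehouse.orders_clean")   # hypothetical target table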

EDUCATION

● Master of Science in Information Studies from Trine University, Detroit, Michigan.

TECHNICAL SKILLS

Cloud Services:

Azure Cloud Services, AWS Cloud Services, Azure VMs, Azure Data Factory, Azure Databricks, Azure SQL Database, Azure Functions, Logic Apps, App Services, Azure Key Vaults, Managed Identities, AWS EC2, S3, RDS, Crawlers, IAM, VPC, AWS Glue, Managed Apache Airflow

Data Warehouses:

Azure Synapse Analytics, AWS Redshift, Snowflake & Salesforce NPSP

Databases:

SQL Server, PostgreSQL, Azure SQL

CI/CD:

Jenkins, Azure DevOps

Source/ Version Control:

Git, GitHub

Project Management Tools:

JIRA, ServiceNow, Confluence

Build Tools:

Jenkins

Programming:

Python

Big Data Technologies:

Apache Spark, Hadoop, Hive

Containerization/ Orchestration:

Docker, Kubernetes

Streaming Data Technologies:

Apache Kafka, Apache Flink

Data Quality/ Governance:

Data Quality Frameworks, Data Governance Practices

WORK EXPERIENCE

Client: Salesforce, Dallas, USA (June 2024 - Present) Role: Azure Data Engineer

Description: Salesforce is an American cloud-based software company headquartered in San Francisco, California. It provides applications focused on sales, customer service, marketing automation, e-commerce, analytics, artificial intelligence, and application development.

Responsibilities:

●Used Django Evolution and manual SQL modifications to modify Django models while retaining all data, with the site running in production mode.

●Developed data pipelines and workflows in Azure Databricks to process and transform large volumes of data, using Python, Scala, and SQL.

●Used Python-based GUI components for front-end functionality such as selection criteria.

●Stored configuration data in the NoSQL database MongoDB and manipulated it using PyMongo.

●Used Azure Data Factory to ingest data from log files and custom business applications, processed the data on Databricks per day-to-day requirements, and loaded it into Azure Data Lake.

●Ingested real-time web logs into Spark Streaming using Kafka as the messaging system, performed data quality checks in Spark Streaming, and flagged records as bad or passable (a minimal sketch follows at the end of this section).

●Used a continuous delivery pipeline to deploy microservices, including provisioning Azure environments, and developed modules using Python and shell scripting.

●Developed JSON definitions for deploying pipelines in Azure Data Factory (ADF) that process data using the SQL activity, and created UNIX shell scripts for database connectivity and parallel query execution.

●Worked on Big Data integration and analytics based on Hadoop, Solr, PySpark, Kafka, Storm, and webMethods.

●Spearheaded HBase setup and utilized Spark and SparkSQL to develop faster data pipelines, resulting in a 60% reduction in processing time and improved data accuracy.

●Created and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications, working with tools such as Git, Terraform, and Ansible.

●Developed tools using Python, shell scripting, and XML to automate tasks.

●Designed and developed ETL pipelines for real-time data integration and transformation using Kubernetes and Docker.

●Integrated Fivetran HVA with cloud storage solution (Azure Blob Storage) for scalable data archiving and processing.

●Utilized ZooKeeper to manage synchronization, serialization, and coordination across the cluster after migrating from JMS Solace to Kinesis.

●Implemented data transformations and enrichment using Apache Spark Streaming to clean and structure the data for analysis.

●Utilized Elasticsearch and Kibana for indexing and visualizing the real-time analytics results, enabling stakeholders to gain actionable insights quickly.

●Leveraged Fivetran HVR for seamless data movement across hybrid and multi-cloud environments.

●Integrated DBT Cloud with Azure Blob Storage for efficient data processing workflows.

●Converted SAS code to Python for predictive models using Pandas, NumPy, and scikit-learn.

●Worked on creating MapReduce programs to parse data for claim report generation and ran the JARs in Hadoop; coordinated with the Java team on the MapReduce programs.

Environment: Databricks, Azure Synapse, Cosmos DB, ADF, SSRS, Power BI, Azure Data Lake, ARM, Azure HDInsight, Blob Storage, Apache Spark, Azure ADF V2, ADLS, Spark SQL, Python/Scala, Ansible scripts, Kubernetes, Docker, Jenkins, Azure SQL DW (Synapse), Azure SQL DB
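
A minimal sketch of the Kafka-to-Spark-Streaming quality check mentioned above, assuming a PySpark/Databricks environment; the broker address, topic name, schema, quality rule, and output paths are illustrative assumptions.

# Illustrative Structured Streaming job: read web logs from Kafka, flag each record
# as "bad" or "passable", and land the flagged stream in the data lake.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("weblog-quality").getOrCreate()

log_schema = StructType([
    StructField("url", StringType()),
    StructField("status", IntegerType()),
    StructField("user_id", StringType()),
])

logs = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")    # hypothetical broker
        .option("subscribe", "weblogs")                       # hypothetical topic
        .load()
        .select(F.from_json(F.col("value").cast("string"), log_schema).alias("log"))
        .select("log.*"))

# Simple quality rule for the sketch: a missing user_id or a 5xx status marks a record "bad".
flagged = logs.withColumn(
    "quality_flag",
    F.when(F.col("user_id").isNull() | (F.col("status") >= 500), "bad").otherwise("passable"))

query = (flagged.writeStream
         .format("delta")
         .option("checkpointLocation", "/mnt/lake/checkpoints/weblogs")   # hypothetical path
         .start("/mnt/lake/weblogs_flagged"))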

Client: Microsoft, Hyderabad, India (Aug 2021 - June 2023) Role: AWS Data Engineer

Description: The Microsoft India Development Center (MSIDC) is Microsoft's largest software development center outside of its headquarters in Redmond, Washington. The MSIDC teams focus on strategic and IP-sensitive software product development.

Responsibilities:

●Wrote AWS Lambda functions in Python that invoke scripts to read logs from different servers (a minimal sketch follows this section).

●Hands-on experience with data analytics services such as Athena and Glue.

●Utilized CI/CD and automated testing for continuous integration and delivery.

●Created batch and real-time pipelines using Spark as the main processing framework, integrating Kafka with Spark Streaming for real-time data processing. Worked closely with the business to translate business requirements into technical requirements.

●Integrated DBT Cloud with Amazon S3 for efficient data processing workflows.

●Leveraged Fivetran HVA to implement low-latency, real-time data pipelines between on-premise systems and cloud platforms like AWS.

●Hands-on experience working with AWS services such as EMR, S3, and Redshift. Participated in design reviews and daily project scrums. Successfully loaded files from Teradata to S3 and from S3 into Redshift.

●Designed and implemented robust data pipelines using Fivetran HVR to integrate on-premise and cloud data platforms (AWS).

●Created REST APIs for game history which provides users access to their previous game related information.

●Responsible for designing and developing the back end of applications using Python and Flask.

●Monitored and managed DBT Cloud transformation jobs using AWS CloudWatch.

●Participated in developing ETL jobs that extract data from multiple tables and load it into a data mart in Redshift.

●Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers, following Agile/Scrum methodologies.

Environment: Python, Flask, S3, Redshift, Athena, Glue, Lambda, EC2.
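
A minimal sketch of the kind of log-reading Lambda function mentioned above, assuming the logs land in an S3 bucket and the function is triggered by an S3 event; the bucket layout and the "ERROR" keyword rule are illustrative assumptions.

# Illustrative AWS Lambda handler: triggered by an S3 put event, it reads the
# uploaded log object and counts error lines per file.
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        error_count = sum(1 for line in body.splitlines() if "ERROR" in line)
        results.append({"key": key, "errors": error_count})
    return {"processed": results}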

Client: Aurobindo Pharma, Hyderabad, India (Jan 2020 - June 2021) Role: Data Engineer

Description: Aurobindo Pharma Limited is an Indian multinational pharmaceutical company. Raw data from bioanalytical experiments often require cleaning and transformation before they can be used for analysis. I developed scripts or pipelines to clean, pre-process, and transform data into a usable format, ensuring data quality and consistency.

Responsibilities:

●Developed scalable, reusable database processes and integrated them. Implemented automated data pipelines for data migration, ensuring a smooth and reliable transition to the cloud environment.

●Supported development of web portals, completed database modeling in PostgreSQL, and provided front-end support in HTML/CSS and jQuery. Extensively involved in all phases of data acquisition, collection, cleaning, model development, model validation, and visualization to meet the business needs of different teams.

●Working knowledge of Kubernetes to deploy, scale, load-balance, and manage Docker containers, and of OpenShift with multiple namespace versions. Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as MongoDB.

●Installed and automated applications using the configuration management tools Puppet and Chef.

●Responsible for estimating cluster size, monitoring, and troubleshooting the Spark Databricks cluster.

●Used Python, R, and SQL to create statistical algorithms involving multivariate regression, linear regression, and logistic regression. Worked on Kafka, publishing messages for downstream systems.

●Developed PySpark modules for data ingestion and analytics, loading from Parquet, Avro, and JSON data and from database tables (a minimal sketch follows this section). Scripted simulation hardware for testing using Simi's simulator.

●Developed and implemented data acquisition jobs in Scala using Sqoop, Hive, and Pig, optimizing MapReduce jobs to use HDFS efficiently through various compression mechanisms with the help of Oozie workflows.

Environment: Hadoop, HDFS, MapReduce, Hive, PySpark, Flume, ETL, AWS, Oozie, Sqoop, Oracle, Pig, Eclipse, MySQL
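
A minimal sketch of the PySpark ingestion modules mentioned above, loading Parquet, JSON, Avro, and a JDBC table into DataFrames; the paths, JDBC URL, credentials, and table names are illustrative assumptions, and the Avro read assumes the spark-avro package is on the classpath.

# Illustrative PySpark ingestion module: pull data from file formats and a database
# table into DataFrames for downstream analytics. All locations are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ingestion").getOrCreate()

parquet_df = spark.read.parquet("hdfs:///data/raw/events_parquet/")
json_df = spark.read.json("hdfs:///data/raw/events_json/")
avro_df = spark.read.format("avro").load("hdfs:///data/raw/events_avro/")

jdbc_df = (spark.read.format("jdbc")
           .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")    # hypothetical Oracle source
           .option("dbtable", "SALES.ORDERS")
           .option("user", "etl_user")
           .option("password", "***")
           .load())

# Combine the file-based feeds and persist them as a managed table.
combined = parquet_df.unionByName(json_df, allowMissingColumns=True)
combined.write.mode("append").saveAsTable("analytics.events")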


