
Data Engineer Big Data

Location:
Jersey City, NJ
Posted:
March 12, 2025

Resume:

Manasa Reddy

Data Engineer

Phone No: +1-551-***-****

Email Id: ***************@*****.***

Background Summary:

* years of experience in Big Data processing using technologies like Hadoop, Spark, Hive, and HBase.

Strong skills in data integration, mapping, cleansing, and governance.

Expertise in developing ETL workflows, data profiling, and Master Data Management (MDM).

Experience migrating databases to Hadoop and optimizing Hadoop jobs with MapReduce and HiveQL.

Proficient in Python (NumPy, Pandas, SciPy, Matplotlib, Scikit-learn, PySpark) and Shell scripting.

Developed Spark/Scala and Python scripts for data transformation and analysis.

Extensive use of complex SQL queries, PL/SQL, and T-SQL for database performance optimization.

Hands-on experience with AWS (S3, EMR, Airflow, Athena), Azure (Databricks, DevOps), and GCP.

Implemented CI/CD pipelines with Jenkins, Git, Docker, and Kubernetes for ML model deployment.

Worked with Delta tables in Delta Lake and transformed data in Databricks.

Developed Kafka producers and consumers for streaming high-volume events (a minimal sketch follows this summary).

Experience with NoSQL databases (MongoDB, Cassandra, HBase, CouchDB, Redis, DynamoDB).

Configured Zookeeper to support Kafka, Spark, and HDFS.

Tuned Hive, Phoenix/HBase, and Spark jobs for performance improvements.

Automated processes with Shell scripts and optimized Hadoop-related tasks.

Containerized workflows using Docker and managed version control with Azure DevOps/Git.
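
The Kafka work noted above (producers and consumers for high-volume event streams) can be illustrated with a short Python sketch. The broker address, topic name, and record fields below are placeholders assumed for the example, not details of any project listed in this resume; it uses the kafka-python package.

```python
# Minimal Kafka producer/consumer sketch using the kafka-python package.
# Broker endpoint and topic name ("events") are assumed placeholders.
import json
from kafka import KafkaProducer, KafkaConsumer

BROKER = "localhost:9092"   # assumed broker endpoint
TOPIC = "events"            # hypothetical topic name

def produce(records):
    """Serialize dicts as JSON and publish them to the topic."""
    producer = KafkaProducer(
        bootstrap_servers=BROKER,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for record in records:
        producer.send(TOPIC, value=record)
    producer.flush()
    producer.close()

def consume(limit=10):
    """Read up to `limit` messages from the beginning of the topic."""
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BROKER,
        auto_offset_reset="earliest",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        consumer_timeout_ms=5000,
    )
    for i, message in enumerate(consumer):
        print(message.value)
        if i + 1 >= limit:
            break
    consumer.close()

if __name__ == "__main__":
    produce([{"event_id": 1, "action": "click"}, {"event_id": 2, "action": "view"}])
    consume()
```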

Technical Skills:

Big Data/Hadoop Technologies: MapReduce, Spark, Spark SQL, Azure, Spark Streaming, Kafka, PySpark, Pig, Hive, Flume, YARN, Oozie, Zookeeper, Hue, Ambari Server

Languages: HTML5, DHTML, WSDL, CSS3, C, C++, XML, R/RStudio, SAS Enterprise Guide, SAS, R (Caret, Weka, ggplot), Perl, MATLAB, Mathematica, FORTRAN, DTD, Schemas, JSON, Ajax, Java, Scala, Python (NumPy, SciPy, Pandas, Gensim, Keras), JavaScript, Shell Scripting

Databases: Microsoft SQL Server, MySQL, Oracle, DB2, Teradata

NoSQL Databases: Cassandra, HBase, MongoDB

Web Design Tools: HTML, CSS, JavaScript, JSP, jQuery, XML

Development Tools: Microsoft SQL Studio, IntelliJ, Azure Databricks, Eclipse, NetBeans

Cloud Technologies: AWS (EC2, IAM, S3, Auto Scaling, CloudWatch, Route 53, EMR, Redshift, DynamoDB, Glue), GCP, Azure

Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall

Build Tools: SQL Loader, PostgreSQL, Aurora, Maven, Apache Ant, RTC, RSA, Control-M, Oozie, Hue, SOAP UI, Jenkins, Toad, Talend

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos

Operating Systems: All versions of Windows, UNIX, Linux, Macintosh HD, Sun Solaris

Education:

Master’s in Information Systems, Pace University, 2024

Bachelor’s in Electronics and Communication Engineering, Bhoj Reddy Engineering College for Women, 2019

Work Experience:

Client: Southwest Airlines, Dallas, Texas Feb 2024 – Present

Role: Data Engineering / ETL Developer

Responsibilities:

Developed ETL processes in AWS Glue using PySpark to migrate and transform data across systems like S3, Redshift, and Adobe (a representative job sketch follows this list).

Designed optimized ETL pipelines for large-scale data ingestion into Databricks Delta Lake.

Automated data capture, organization, and normalization for real-time analytics.

Enhanced eCommerce and real-time data pipelines using Spark, PySpark, and Scala.

Managed AWS cloud systems (EC2, S3, Redshift, Athena, Glue, Lambda, Airflow).

Automated infrastructure with AWS CloudFormation, improving scalability and security.

Migrated MySQL databases to Aurora PostgreSQL and implemented CI/CD pipelines.

Worked with Azure services like Data Factory, SQL DB, and Databricks, as well as GCP, for data solutions.

Developed PySpark and Spark SQL scripts for advanced data processing and analysis.

Optimized Hive schema designs and Spark Streaming for real-time data, improving query and processing efficiency.

Designed and implemented stream processing with Kafka, Flink, and Spark Streaming.

Built interactive dashboards in Tableau, Power BI, and SSRS for actionable insights.

Created real-time reporting pipelines using Kafka and PostgreSQL analytics databases.

Utilized tools like Experian Pandora and Aperture Data Studio for data profiling and quality assurance.

Managed NoSQL databases (MongoDB, Cassandra) for high-volume applications.

Leveraged Databricks Delta Lake and Snowflake for efficient querying and analytics.

Conducted advanced data analysis with SQL, Python, and Scala.

Integrated machine learning techniques using Spark MLlib for predictive insights.

Continuously improved technical expertise in PySpark, Databricks, and Snowflake.
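
As a rough illustration of the Glue/PySpark ETL work described in the first bullet of this list, the sketch below reads raw JSON from S3, applies a few cleanup transformations, and writes partitioned Parquet for downstream consumers. The bucket paths, column names, and job structure are assumptions for the example, not the actual Southwest Airlines pipelines.

```python
# Illustrative AWS Glue (PySpark) job: raw JSON from S3 -> cleaned, partitioned Parquet.
# All paths and column names are hypothetical placeholders.
import sys

from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

SOURCE_PATH = "s3://example-raw-bucket/bookings/"      # assumed input location
TARGET_PATH = "s3://example-curated-bucket/bookings/"  # assumed output location

# Read raw JSON, normalize types, and drop rows without a primary key.
raw_df = spark.read.json(SOURCE_PATH)
clean_df = (
    raw_df
    .withColumn("booking_date", F.to_date("booking_date"))
    .filter(F.col("booking_id").isNotNull())
    .dropDuplicates(["booking_id"])
)

# Write partitioned Parquet for Redshift Spectrum / Athena-style consumers.
clean_df.write.mode("overwrite").partitionBy("booking_date").parquet(TARGET_PATH)

job.commit()
```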

Environment: Hadoop, Spark, Hive, Sqoop, HBase, Oozie, Talend, Kafka, Azure (HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, SQL DB, SQL DWH, AKS), Scala, Python, Cosmos DB, MS SQL, MongoDB, Ambari, Power BI, Azure DevOps, Microservices, K-Means, KNN, Ranger, Git

Client: UHG, Addison, TX Mar 2023 – Feb 2024

Role: Cloud Data Engineer / BI Specialist

Responsibilities:

Migrated Oracle databases to BigQuery and utilized Power BI for advanced business intelligence reporting.

Built and optimized ETL pipelines using Airflow on GCP, deploying batch and streaming jobs with the Python SDK and Cloud Dataflow (a simplified DAG sketch follows this list).

Developed CI/CD workflows with Ansible to deploy across PostgreSQL clusters on AWS, ensuring high availability and fault tolerance.

Implemented ETL processes with tools like AWS Glue, Informatica, and PySpark on AWS EMR for data integration and transformation.

Designed predictive analytics solutions using Spark MLlib and TensorFlow on Databricks for large-scale datasets.

Created dashboards and reports in Tableau, Power BI, and Databricks notebooks for real-time KPIs and business insights.

Migrated on-premises databases (Oracle, Teradata) to Snowflake and optimized performance with clustering and materialized views.

Designed and managed data pipelines using Hadoop, Snowflake, and BigQuery while ensuring high availability and scalability.

Streamlined real-time data processing using Kafka, Spark Streaming, and Kubernetes for containerized workflows.

Conducted performance tuning of SQL queries and NoSQL databases like MongoDB and Cassandra for efficient data retrieval.

Collaborated with data science teams to deploy ML models in production for predictive and prescriptive analytics.

Enhanced data security and governance by implementing encryption and compliance protocols.
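
The Airflow-on-GCP pipeline work referenced earlier in this list can be sketched as a small DAG. The project, dataset, bucket, and schedule below are assumed for illustration and use the standard Google provider operator rather than anything specific to the UHG environment.

```python
# Illustrative Airflow DAG for a daily GCS-to-BigQuery load; project, bucket,
# and table names are placeholders, not taken from the actual environment.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="gcs_to_bigquery_daily",          # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    load_claims = GCSToBigQueryOperator(
        task_id="load_claims_to_bq",
        bucket="example-raw-bucket",          # assumed landing bucket
        source_objects=["claims/{{ ds }}/*.parquet"],
        source_format="PARQUET",
        destination_project_dataset_table="example-project.analytics.claims",
        write_disposition="WRITE_TRUNCATE",
    )
```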

Environment: Spark, Hive, Pig, Spark SQL, Apache Spark, HBase, SQOOP, Kafka, GCP, Cloudera, ETL, Scala, Linux Shell Scripting, HDFS, Python, Snowflake, GCS, QlikView, OpenShift, Hadoop, Teradata

Client: Birlasoft, India Dec 2020 – July 2022

Role: Data Engineer

Responsibilities:

Imported Legacy data from SQL Server and Teradata into Amazon S3.

Created consumption views on top of metrics to reduce the running time for complex queries.

Exported data into Snowflake by creating staging tables to load data from different files in Amazon S3 (see the load sketch after this list).

Compared data at the leaf level across source databases whenever data transformation or loading took place, analyzing and investigating data quality to detect any data loss or corruption.

As part of the data migration, wrote SQL scripts to identify data mismatches and loaded history data from Teradata into Snowflake.

Developed SQL scripts to upload, retrieve, manipulate, and handle sensitive data (National Provider Identifier data, i.e., name, address, SSN, phone number) in Teradata, SQL Server Management Studio, and Snowflake databases for the project.

Worked on retrieving data from FS to S3 using Spark commands.

Built S3 buckets, managed their policies, and used S3 and Glacier for storage and backup on AWS.

Created performance dashboards in Tableau, Excel, and PowerPoint for key stakeholders.

Incorporated predictive modeling (rule engine) to evaluate customer/seller health scores using Python scripts, performed computations, and integrated the results with Tableau visualizations.

Analyzed click events on the hybrid landing page, including bounce rate, conversion rate, jump-back rate, and list/gallery view, and provided insights for landing-page optimization.

Evaluated the traffic and performance of daily-deals PLA ads, compared those items with non-daily-deal items to assess the potential for increasing ROI, and suggested improvements and modified existing BI components (reports, stored procedures).

Understood business requirements in depth and developed a test strategy based on business rules.
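
The Snowflake staging-load bullet above can be illustrated with the Snowflake Python connector. The account, warehouse, stage, and table names below are placeholders, and the external stage is assumed to already point at the S3 bucket; this is a sketch, not the project's actual load code.

```python
# Illustrative load of S3 files into a Snowflake staging table via an external
# stage and COPY INTO. Account, stage, and table names are assumed placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",     # assumed account identifier
    user="etl_user",
    password="***",                # use a secrets manager in practice
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)

try:
    cur = conn.cursor()
    # Staging table mirroring the layout of the files landed in S3.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS STG_ORDERS (
            ORDER_ID NUMBER,
            CUSTOMER_ID NUMBER,
            ORDER_DATE DATE,
            AMOUNT NUMBER(12, 2)
        )
    """)
    # MY_S3_STAGE is assumed to be an external stage already pointing at the bucket.
    cur.execute("""
        COPY INTO STG_ORDERS
        FROM @MY_S3_STAGE/orders/
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
        ON_ERROR = 'ABORT_STATEMENT'
    """)
finally:
    conn.close()
```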

Environment: Snowflake, AWS S3, GitHub, Service Now, HP Service Manager, EMR, Nebula, Teradata, SQL Server, Map Reduce, Python, Pig, Hive, Apache Spark, Sqoop.

Client: Zensar Technologies, India Oct 2019 – Dec 2020

Role: Data Engineer/Data Analyst

Responsibilities:

Performed Data Analysis using SQL queries on source systems to identify data discrepancies and determine data quality.

Performed extensive data validation and verification against the data warehouse and debugged SQL statements and stored procedures for business scenarios.

Designed and developed Tableau dashboards using stacked bars, bar graphs, scatter plots, and Gantt charts.

Applied experience in DBMS table design, data loading, data modeling, and SQL.

Worked with ER/Studio for conceptual, logical, and physical data modeling and for generating DDL scripts.

Handled performance requirements for databases in OLTP and OLAP models.

Analyzed the data consuming the most resources and made changes to the back-end code using PL/SQL stored procedures and triggers.

Performed data completeness, correctness, transformation, and quality testing using SQL (a small completeness-check sketch follows this list).

Involved in designing Business Objects universes and creating reports.

Conducted design walkthrough sessions with the Business Intelligence team to ensure that reporting requirements were met.

Prepared complex T-SQL queries, views, and stored procedures to load data into the staging area.

Wrote UNIX shell scripts to invoke the stored procedures, parse the data, and load it into flat files.

Created reports analyzing large-scale databases using Microsoft Excel analytics within the legacy system.
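
As a rough sketch of the SQL-based completeness checks mentioned in this list, the script below compares row counts between source tables and their warehouse copies. The connection strings, ODBC driver, and table names are assumptions for the example, not details of the actual project.

```python
# Illustrative completeness check: compare row counts between a source table
# and its warehouse copy via pyodbc. DSNs and table names are placeholders.
import pyodbc

SOURCE_DSN = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=source-db;DATABASE=oltp;Trusted_Connection=yes;"
TARGET_DSN = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=dw-db;DATABASE=warehouse;Trusted_Connection=yes;"

def row_count(dsn: str, table: str) -> int:
    """Return COUNT(*) for a table reachable through the given connection string."""
    conn = pyodbc.connect(dsn)
    try:
        cur = conn.cursor()
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        return cur.fetchone()[0]
    finally:
        conn.close()

def check_completeness(pairs):
    """Compare source and warehouse counts for each (source_table, target_table) pair."""
    for src, tgt in pairs:
        src_count = row_count(SOURCE_DSN, src)
        tgt_count = row_count(TARGET_DSN, tgt)
        status = "OK" if src_count == tgt_count else "MISMATCH"
        print(f"{src} -> {tgt}: {src_count} vs {tgt_count} [{status}]")

if __name__ == "__main__":
    check_completeness([("dbo.Orders", "stg.Orders"), ("dbo.Customers", "stg.Customers")])
```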

Environment: SQL, PL/SQL, OLAP, OLTP, UNIX, MS Excel, T-SQL.


