Manasa Reddy
Data Engineer
Phone No: +1-551-***-****
Email Id: ***************@*****.***
Background Summary:
Experience in Big Data processing using technologies like Hadoop, Spark, Hive, and HBase.
Strong skills in data integration, mapping, cleansing, and governance.
Expertise in developing ETL workflows, data profiling, and Master Data Management (MDM).
Experience migrating databases to Hadoop and optimizing Hadoop jobs with MapReduce and HiveQL.
Proficient in Python (NumPy, Pandas, SciPy, Matplotlib, Scikit-learn, PySpark) and Shell scripting.
Developed Spark/Scala and Python scripts for data transformation and analysis.
Extensive use of complex SQL queries, PL/SQL, and T-SQL for database performance optimization.
Hands-on experience with AWS (S3, EMR, Airflow, Athena), Azure (Databricks, DevOps), and GCP.
Implemented CI/CD pipelines with Jenkins, Git, Docker, and Kubernetes for ML model deployment.
Worked with Delta tables in Delta Lake and transformed data in Databricks.
Developed Kafka producers/consumers for streaming high-volume events (see the producer sketch after this summary).
Experience with NoSQL databases (MongoDB, Cassandra, HBase, CouchDB, Redis, DynamoDB).
Configured Zookeeper to support Kafka, Spark, and HDFS.
Tuned Hive, Phoenix/HBase, and Spark jobs for performance improvements.
Automated processes with Shell scripts and optimized Hadoop-related tasks.
Containerized workflows using Docker and managed version control with Azure DevOps/Git.
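For reference, a minimal sketch of the kind of Kafka producer mentioned above, assuming the kafka-python package and a locally reachable broker; the topic name, broker address, and event fields are illustrative.

    import json
    from kafka import KafkaProducer  # kafka-python package

    # Producer tuned for high-volume JSON events: batched sends, full-replication acks.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",  # assumed broker address
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
        acks="all",       # wait until all replicas acknowledge the write
        linger_ms=50,     # small batching window to improve throughput
    )

    def publish_event(event: dict) -> None:
        # Key-less messages are distributed round-robin across partitions.
        producer.send("events", value=event)

    publish_event({"type": "page_view", "user_id": 123})
    producer.flush()  # block until buffered records are delivered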
Technical Skills:
Big Data/Hadoop Technologies: MapReduce, Spark, Spark SQL, Azure, Spark Streaming, Kafka, PySpark, Pig, Hive, Flume, YARN, Oozie, Zookeeper, Hue, Ambari Server
Languages: HTML5, DHTML, WSDL, CSS3, C, C++, XML, R/RStudio, SAS Enterprise Guide, SAS, R (Caret, Weka, ggplot), Perl, MATLAB, Mathematica, FORTRAN, DTD, Schemas, JSON, Ajax, Java, Scala, Python (NumPy, SciPy, Pandas, Gensim, Keras), JavaScript, Shell Scripting
Databases: Microsoft SQL Server, MySQL, Oracle, DB2, Teradata
NoSQL Databases: Cassandra, HBase, MongoDB
Web Design Tools: HTML, CSS, JavaScript, JSP, jQuery, XML
Development Tools: Microsoft SQL Studio, IntelliJ, Azure Databricks, Eclipse, NetBeans
Cloud Technologies: AWS (EC2, IAM, S3, Auto Scaling, CloudWatch, Route 53, EMR, Redshift, DynamoDB, Glue), GCP, Azure
Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall
Build Tools: SQL Loader, PostgreSQL, Aurora, Maven, Apache Ant, RTC, RSA, Control-M, Oozie, Hue, SOAP UI, Jenkins, Toad, Talend
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos
Operating Systems: All versions of Windows, UNIX, Linux, macOS, Sun Solaris
Education:
Master’s in Information Systems - Pace University - 2024
Bachelor’s in Electronics and Communication Engineering - Bhoj Reddy Engineering College for Women - 2019
Work Experience:
Client: Southwest Airlines, Dallas, Texas Feb 2024 - Present
Role: Data Engineering / ETL Developer
Responsibilities:
Developed ETL processes in AWS Glue using PySpark to migrate and transform data across systems like S3, Redshift, and Adobe (see the sketch below).
Designed optimized ETL pipelines for large-scale data ingestion into Databricks Delta Lake.
Automated data capture, organization, and normalization for real-time analytics.
Enhanced eCommerce and real-time data pipelines using Spark, PySpark, and Scala.
Managed AWS cloud systems (EC2, S3, Redshift, Athena, Glue, Lambda, Airflow).
Automated infrastructure with AWS CloudFormation, improving scalability and security.
Migrated MySQL databases to Aurora PostgreSQL and implemented CI/CD pipelines.
Worked with Azure services like Data Factory, SQL DB, Databricks, and GCP for data solutions.
Developed PySpark and Spark SQL scripts for advanced data processing and analysis.
Optimized Hive schema designs and Spark Streaming for real-time data, improving query and processing efficiency.
Designed and implemented stream processing with Kafka, Flink, and Spark Streaming.
Built interactive dashboards in Tableau, Power BI, and SSRS for actionable insights.
Created real-time reporting pipelines using Kafka and PostgreSQL analytics databases.
Utilized tools like Experian Pandora and Aperture Data Studio for data profiling and quality assurance.
Managed NoSQL databases (MongoDB, Cassandra) for high-volume applications.
Leveraged Databricks Delta Lake and Snowflake for efficient querying and analytics.
Conducted advanced data analysis with SQL, Python, and Scala.
Integrated machine learning techniques using Spark MLlib for predictive insights.
Continuously improved technical expertise in PySpark, Databricks, and Snowflake.
Environment: Hadoop, Spark, Hive, Sqoop, HBase, Oozie, Talend, Kafka, Azure (HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, SQL DB, SQL DWH, AKS), Scala, Python, Cosmos DB, MS SQL, MongoDB, Ambari, Power BI, Azure DevOps, Microservices, K-Means, KNN, Ranger, Git
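Illustrative sketch of the PySpark batch ETL pattern used in this role, assuming the job runs where the S3 paths are reachable (e.g., an AWS Glue or EMR job); bucket names, paths, and columns are hypothetical.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders-etl").getOrCreate()

    # Ingest raw order data landed in S3 as Parquet.
    raw = spark.read.parquet("s3://example-raw-bucket/orders/")

    # Basic cleansing and normalization before loading the curated zone.
    curated = (
        raw.dropDuplicates(["order_id"])
           .withColumn("order_date", F.to_date("order_ts"))
           .withColumn("amount_usd", F.col("amount").cast("double"))
           .filter(F.col("amount_usd") > 0)
    )

    # Write partitioned output; .format("delta") would target a Databricks
    # Delta Lake table instead if the Delta libraries are on the classpath.
    (curated.write
            .mode("overwrite")
            .partitionBy("order_date")
            .parquet("s3://example-curated-bucket/orders/"))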
Client: UHG, Addison, TX Mar 2023 – Feb 2024
Role: Cloud Data Engineer / BI Specialist
Responsibilities:
Migrated Oracle databases to BigQuery and utilized Power BI for advanced business intelligence reporting.
Built and optimized ETL pipelines using Airflow on GCP, deploying batch and streaming jobs with the Python SDK and Cloud Dataflow (see the DAG sketch below).
Developed CI/CD workflows with Ansible to deploy across PostgreSQL clusters on AWS, ensuring high availability and fault tolerance.
Implemented ETL processes with tools like AWS Glue, Informatica, and PySpark on AWS EMR for data integration and transformation.
Designed predictive analytics solutions using Spark MLlib and TensorFlow on Databricks for large-scale datasets.
Created dashboards and reports in Tableau, Power BI, and Databricks notebooks for real-time KPIs and business insights.
Migrated on-premises databases (Oracle, Teradata) to Snowflake and optimized performance with clustering and materialized views.
Designed and managed data pipelines using Hadoop, Snowflake, and BigQuery while ensuring high availability and scalability.
Streamlined real-time data processing using Kafka, Spark Streaming, and Kubernetes for containerized workflows.
Conducted performance tuning of SQL queries and NoSQL databases like MongoDB and Cassandra for efficient data retrieval.
Collaborated with data science teams to deploy ML models in production for predictive and prescriptive analytics.
Enhanced data security and governance by implementing encryption and compliance protocols.
Environment: Spark, Hive, Pig, Spark SQL, Apache Spark, HBase, SQOOP, Kafka, GCP, Cloudera, ETL, Scala, Linux Shell Scripting, HDFS, Python, Snowflake, GCS, QlikView, OpenShift, Hadoop, Teradata
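A minimal Airflow DAG sketch for the kind of batch pipeline noted above, assuming Airflow 2.x (e.g., Cloud Composer on GCP); the DAG id, task name, and callable body are placeholders.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_and_load(**context):
        # Placeholder for the actual extract/transform/load logic,
        # e.g., pulling from Oracle and loading BigQuery staging tables.
        print("running ETL for", context["ds"])

    with DAG(
        dag_id="daily_oracle_to_bigquery",
        start_date=datetime(2023, 1, 1),
        schedule="@daily",   # one run per logical day (schedule_interval on older 2.x releases)
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="extract_and_load",
            python_callable=extract_and_load,
        )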
Client: Birlasoft, India Dec 2020 - Jul 2022
Role: Data Engineer
Responsibilities:
Imported Legacy data from SQL Server and Teradata into Amazon S3.
Created consumption views on top of metrics to reduce the running time for complex queries.
Loaded data into Snowflake by creating staging tables for the different files landed in Amazon S3 (see the load sketch below).
Compared data at the leaf level across source databases whenever data transformation or loading took place, analyzing data quality after each load to detect data loss or corruption.
As part of the data migration, wrote SQL scripts to identify data mismatches and loaded historical data from Teradata SQL into Snowflake.
Developed SQL scripts to upload, retrieve, manipulate, and handle sensitive data (National Provider Identifier data, i.e., name, address, SSN, phone number) in Teradata, SQL Server Management Studio, and Snowflake databases for the project.
Retrieved data from FS to S3 using Spark commands.
Built S3 buckets, managed their bucket policies, and used S3 and Glacier for storage and backup on AWS.
Created performance dashboards in Tableau, Excel, and PowerPoint for key stakeholders.
Incorporated predictive modeling (a rule engine) to evaluate the customer/seller health score using Python scripts, performed computations, and integrated the results with the Tableau visualization.
Analyzed click events on the Hybrid landing page, including bounce rate, conversion rate, jump-back rate, and list/gallery view, and provided valuable input for landing page optimization.
Evaluated the traffic and performance of Daily Deals PLA ads and compared those items with non-daily-deal items to assess the potential for increasing ROI; suggested improvements and modified existing BI components (reports, stored procedures).
Understood business requirements in depth and defined the test strategy based on business rules.
Environment: Snowflake, AWS S3, GitHub, Service Now, HP Service Manager, EMR, Nebula, Teradata, SQL Server, Map Reduce, Python, Pig, Hive, Apache Spark, Sqoop.
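A sketch of the S3-to-Snowflake staging load referenced above, assuming the snowflake-connector-python package and a pre-created external stage; the account, credentials, stage, and table names are placeholders.

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="example_account",
        user="etl_user",
        password="***",
        warehouse="LOAD_WH",
        database="ANALYTICS",
        schema="STAGING",
    )

    # COPY files from the external S3 stage into the staging table.
    copy_sql = """
        COPY INTO STAGING.PROVIDER_RAW
        FROM @S3_LANDING_STAGE/provider/
        FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
        ON_ERROR = 'ABORT_STATEMENT'
    """

    cur = conn.cursor()
    try:
        cur.execute(copy_sql)
        print(cur.fetchall())  # one result row per file with its load status
    finally:
        cur.close()
        conn.close()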
Client: Zensar Technologies, India Oct 2019 - Dec 2020
Role: Data Engineer/Data Analyst
Responsibilities:
Performed Data Analysis using SQL queries on source systems to identify data discrepancies and determine data quality.
Performed extensive data validation and verification against the data warehouse and debugged SQL statements and stored procedures for business scenarios.
Designed and developed Tableau dashboards using stacked bars, bar graphs, scatter plots, and Gantt charts.
Applied experience in DBMS table design, data loading, data modeling, and SQL.
Worked in ER/Studio for conceptual, logical, and physical data modeling and for generating DDL scripts.
Handled performance requirements for databases in OLTP and OLAP models.
Analyzed the data consuming the most resources and modified the back-end code using PL/SQL stored procedures and triggers.
Performed data completeness, correctness, data transformation, and data quality testing using SQL (see the reconciliation sketch below).
Involved in designing Business Objects universes and creating reports.
Conducted design walkthrough sessions with the Business Intelligence team to ensure that reporting requirements were met for the business.
Prepared complex T-SQL queries, views, and stored procedures to load data into staging area.
Wrote UNIX shell scripts to invoke the stored procedures, parse the data, and load it into flat files.
Created reports analyzing a large-scale database using Microsoft Excel analytics within the legacy system.
Environment: SQL, PL/SQL, OLAP, OLTP, UNIX, MS Excel, T-SQL.
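A small reconciliation sketch of the completeness checks mentioned above, assuming the pyodbc package and pre-configured ODBC DSNs; the DSNs and table names are placeholders.

    import pyodbc

    SRC_DSN = "DSN=source_oltp;UID=etl;PWD=***"
    DWH_DSN = "DSN=warehouse;UID=etl;PWD=***"

    def row_count(dsn: str, table: str) -> int:
        # Open a connection, run a COUNT(*), and return the single value.
        with pyodbc.connect(dsn) as conn:
            return conn.cursor().execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

    src = row_count(SRC_DSN, "dbo.customer")
    stg = row_count(DWH_DSN, "stg.customer")

    if src != stg:
        raise RuntimeError(f"Completeness check failed: source={src}, staging={stg}")
    print(f"Row counts match: {src}")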