Machine Learning Image Data

Location: Overland Park, KS
Posted: October 10, 2023

Greeshma Reddy

Email: ad0ajg@r.postjobfree.com

Phone: 913-***-****

Sr Data Engineer

Professional Summary

Overall 9 years of IT experience across various industries working on Big Data technologies, including the Cloudera and Hortonworks distributions. The Hadoop working environment includes Hadoop, Spark, MapReduce, Kafka, Hive, Ambari, Sqoop, HBase, and Impala.

Experience in programming with Scala, Java, Python, SQL, T-SQL, and R.

Experience in handling image data using Python libraries such as OpenCV and scikit-image.
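
A minimal sketch of this kind of image preprocessing with OpenCV and scikit-image, assuming an illustrative input file sample.jpg:

import cv2
from skimage import filters

image = cv2.imread("sample.jpg")                   # load image from disk (BGR)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)     # convert to grayscale
resized = cv2.resize(gray, (224, 224))             # resize for a downstream model
edges = filters.sobel(resized)                     # edge map via scikit-image
cv2.imwrite("edges.png", (edges * 255).astype("uint8"))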

Deep understanding of common Machine Learning techniques related to time series, regression, and PCA (principal component analysis).

Experience implementing real-time and batch data pipelines using AWS services such as Lambda, S3, DynamoDB, Kinesis, Redshift, and EMR.

Experience in AWS services such as S3, EC2, Glue, AWS Lambda, Athena, AWS Step Function, and Redshift.

Hands-on experience in developing and deploying enterprise-based applications using major Hadoop ecosystem components like MapReduce, YARN, Hive, HBase, Flume, Sqoop, Spark MLlib, GraphX, Spark SQL, Kafka.

Adept at configuring and installing Hadoop/Spark Ecosystem Components.

Proficient with Spark Core, Spark SQL, Spark MLlib, Spark GraphX, and Spark Streaming for processing and transforming complex data using in-memory computing capabilities in Scala. Worked with Spark to improve the efficiency of existing algorithms using SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.

Experience loading data from various sources, such as Oracle SE2, SQL Server, flat files, and unstructured files, into a data warehouse.

Able to use Sqoop to migrate data between RDBMS, NoSQL databases, and HDFS.

Experience in Extraction, Transformation, and Loading (ETL) of data from various sources into Data Warehouses, as well as data processing such as collecting, aggregating, and moving data from multiple sources using Apache Flume, Kafka, PowerBI, and Microsoft SSIS.

Experience in web scraping, an automated method of obtaining large amounts of data from websites.

Hands-on experience with Hadoop architecture and various components such as the Hadoop Distributed File System (HDFS), JobTracker, TaskTracker, NameNode, DataNode, and Hadoop MapReduce programming.

Comprehensive experience developing simple to complex MapReduce and streaming jobs using Scala and Java for data cleansing, filtering, and aggregation; also possess detailed knowledge of the MapReduce framework.

Used IDEs such as Eclipse, IntelliJ IDEA, PyCharm, Notepad++, and Visual Studio for development.

Seasoned practitioner of Machine Learning algorithms and Predictive Modeling techniques such as Linear Regression, Logistic Regression, Naïve Bayes, Decision Tree, Random Forest, KNN, Neural Networks, and K-means Clustering.

Ample knowledge of data architecture, including data ingestion pipeline design, Hadoop/Spark architecture, data modeling, data mining, machine learning, and advanced data processing.

Experience working with NoSQL databases like Cassandra and HBase and developing real-time read/write access to large datasets via HBase.

Developed Spark Applications that can handle data from various RDBMS (MySQL, Oracle Database) and Streaming sources.

Experience analyzing data using HiveQL, Pig, HBase, and custom MapReduce programs in Java 8.

Experience working with GitHub/Git 2.12 source and version control systems.

Experience in privacy and data security laws and regulations, including GDPR, COPPA, and VPPA.

Strong in core Java concepts, including Object-Oriented Design (OOD) and Java components like Collections Framework, Exception Handling, and I/O system.

Knowledge of best practices prevalent in data governance, data quality, and data privacy. Familiarity with data management principles and practices.

Deep understanding of the Oracle GoldenGate mechanism and features such as integrated capture, native DDL capture, and multi-thread replication.

EDUCATION:

Bachelor’s degree in Computer Science from JNTU, 2014.

PROFESSIONAL EXPERIENCE

Amway Corp, Ada, MI February 2022 to Present

Role: Sr Data Engineer

Responsibilities:

Built and architected multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in AWS.

Used AWS API Gateway and AWS Lambda to retrieve AWS cluster inventory via the AWS Python API (boto3).
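
A minimal sketch of such an inventory handler of the kind that would sit behind API Gateway, assuming EMR clusters and illustrative field names:

import json
import boto3

emr = boto3.client("emr")

def lambda_handler(event, context):
    # List active EMR clusters and return a compact inventory as JSON.
    response = emr.list_clusters(ClusterStates=["RUNNING", "WAITING"])
    inventory = [
        {"id": c["Id"], "name": c["Name"], "state": c["Status"]["State"]}
        for c in response["Clusters"]
    ]
    return {"statusCode": 200, "body": json.dumps(inventory)}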

Implemented a Continuous Delivery pipeline with Docker and GitHub

Worked with Lambda functions to load data into Redshift on the arrival of CSV files in an S3 bucket.
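
A minimal sketch of this trigger, assuming the Redshift Data API and placeholder cluster, schema, table, and IAM role names:

import boto3

redshift_data = boto3.client("redshift-data")

def lambda_handler(event, context):
    # Read the bucket/key of the newly arrived CSV from the S3 event.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    copy_sql = (
        f"COPY staging.sales FROM 's3://{bucket}/{key}' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role' "
        "FORMAT AS CSV IGNOREHEADER 1;"
    )
    # Issue the COPY against the cluster (placeholder identifiers).
    redshift_data.execute_statement(
        ClusterIdentifier="analytics-cluster",
        Database="analytics",
        DbUser="etl_user",
        Sql=copy_sql,
    )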

Devised simple and complex SQL scripts to check and validate Data Flow in various applications.

Performed Data Analysis, Migration, Cleansing, Transformation, Integration, Data Import, and Data Export through Python.

Worked on JSON schema to define tables and column mapping from AWS S3 data to AWS Redshift and used AWS Data Pipeline for configuring data loads from AWS S3 to AWS Redshift

Performed data engineering functions: data extraction, transformation, loading, and integration in support of enterprise data infrastructures (data warehouse, operational data stores, and master data management).

Responsible for data services and data movement infrastructures. Good experience with ETL concepts, building ETL solutions, and Data modelling.

Architected several DAGs (Directed Acyclic Graph) for automating ETL pipelines
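
A minimal sketch of one such DAG, with placeholder task bodies and schedule:

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull data from the source system")

def transform():
    print("clean and reshape the extracted data")

def load():
    print("write the transformed data to the warehouse")

with DAG(
    dag_id="example_etl_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> transform_task >> load_task   # extract -> transform -> load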

Hands-on experience in architecting the ETL transformation layers and writing Spark jobs to do the processing.

Gather and process raw data at scale (including writing scripts, web scraping, calling APIs, writing SQL queries, and writing applications)

Imported data from AWS S3 into Spark RDD and performed actions/transformations on them.

Created Partitions, Bucketing, and Indexing for optimization as part of Hive data modelling.
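
A minimal sketch of such Hive data modelling issued through Spark SQL, with illustrative table and column names:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Partition by date and bucket by customer to speed up common query patterns.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_curated (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    CLUSTERED BY (customer_id) INTO 16 BUCKETS
    STORED AS ORC
""")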

Involved in developing Hive DDLs to create, alter, and drop Hive tables.

Worked on different RDDs to transform data coming from various sources into the required formats.

Hands-on experience in web scraping, using a crawler to traverse sites and a scraper to extract the specified data from them.
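
A minimal sketch of such a scraper, assuming an illustrative URL and simple table markup:

import requests
from bs4 import BeautifulSoup

# Crawl a single page and scrape the rows of a product table.
response = requests.get("https://example.com/products", timeout=30)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")

rows = []
for row in soup.select("table tr")[1:]:                  # skip the header row
    cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
    if cells:
        rows.append(cells)
print(f"scraped {len(rows)} rows")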

Created DataFrames in Spark SQL from data in HDFS, performed transformations, analyzed the data, and stored the results in HDFS.
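
A minimal sketch of this flow, with illustrative HDFS paths and column names:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hdfs-transform").getOrCreate()

# Build a DataFrame from raw files in HDFS, aggregate, and store the result.
orders = spark.read.option("header", True).csv("hdfs:///data/raw/orders")
daily_totals = (
    orders.withColumn("amount", F.col("amount").cast("double"))
          .groupBy("order_date")
          .agg(F.sum("amount").alias("daily_total"))
)
daily_totals.write.mode("overwrite").parquet("hdfs:///data/curated/daily_totals")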

Deployed various microservices such as Cassandra, Spark, and MongoDB in Kubernetes and Hadoop clusters using Docker.

Worked with Spark Core, Spark Streaming, and Spark SQL modules of Spark for faster processing of data.

Developed Spark code and SQL for faster testing and processing of real-time data.

Worked on Talend ETL to load data from various sources to Data Lake.

Used Amazon DynamoDB to gather and track the event-based metrics.
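
A minimal sketch of recording such an event-based metric with boto3, assuming a table named event_metrics with the keys shown:

import time
import boto3

dynamodb = boto3.resource("dynamodb")
metrics_table = dynamodb.Table("event_metrics")    # assumed existing table

# Record one occurrence of an event with its timestamp.
metrics_table.put_item(
    Item={
        "event_type": "order_placed",              # assumed partition key
        "event_ts": int(time.time()),              # assumed sort key
        "count": 1,
    }
)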

Used Amazon Elastic Beanstalk with Amazon EC2 to deploy the project in AWS.

Used Spark for interactive queries, processing of streaming data, and integrating with popular NoSQL databases for a massive volume of data.

Consumed data from Kafka topics using Spark.
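
A minimal sketch of such a consumer using Spark Structured Streaming, with placeholder broker, topic, and output paths:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-consumer").getOrCreate()

# Subscribe to a Kafka topic and persist the messages as Parquet files.
stream = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")
         .option("subscribe", "events")
         .load()
)
messages = stream.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
query = (
    messages.writeStream
            .format("parquet")
            .option("path", "s3a://example-bucket/events/")
            .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
            .start()
)
query.awaitTermination()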

Involved in regular standup meetings, status calls, and Business owner meetings with stakeholders.

Environment: Spark, Python, AWS, S3, Glue, Redshift, DynamoDB, Hive, Spark SQL, Docker, Kubernetes, Airflow, ETL workflows.

Chewy, Dania Beach, FL September 2019 to January 2022

Role: Data Engineer

Responsibilities:

Implemented Apache Airflow for authoring, scheduling, and monitoring Data Pipelines

Designed several DAGs (Directed Acyclic Graph) for automating ETL pipelines

Performed data extraction, transformation, loading, and integration in data warehouse, operational data stores, and master data management

Implemented Copy Activity and custom Azure Data Factory pipeline activities.

Primarily involved in data migration using SQL, Azure SQL, Azure Storage, Azure Data Factory, SSIS, and PowerShell.

Architect & implement medium to large-scale BI solutions on Azure using Azure Data Platform services(Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).

Migrated on-premises data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF V1/V2) and the Azure self-hosted integration runtime.

Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).

Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.

Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight clusters.

Responsible for data services and data movement infrastructures

Experienced in ETL concepts, building ETL solutions, and Data modeling

Worked on architecting the ETL transformation layers and writing Spark jobs to do the processing.

Aggregated daily sales team updates to send executives a report and organize jobs running on Spark clusters.

Developed a detailed project plan and helped manage the data conversion migration from the legacy system to Snowflake.

Loaded application analytics data into a data warehouse at regular intervals of time

Designed & build infrastructure for the Google Cloud environment from scratch

Implemented dimensional modeling (star schema, snowflake schema), transactional modeling, and SCD (slowly changing dimensions).

Leveraged cloud and GPU computing technologies for automated machine learning and analytics pipelines

Worked on confluence and Jira

Designed and implemented a configurable data delivery pipeline for scheduled updates to customer-facing data stores built with Python

Compiled data from various sources to perform complex analysis for actionable results.

Measured the efficiency of the Hadoop/Hive environment, ensuring SLAs were met.

Optimized the TensorFlow Model for efficiency

Analyzed the system for new enhancements/functionalities and performed Impact analysis of the application for implementing ETL changes

Implemented a Continuous Delivery pipeline with Docker and GitHub.

Built performant, scalable ETL processes to load, cleanse and validate data

Participated in the entire software development lifecycle with requirements, solution design, development, QA implementation, and product support using Scrum and other Agile methodologies

Collaborate with team members and stakeholders in the design and development of the data environment

Prepared associated documentation for specifications, requirements, and testing.

Environment: Azure, Azure Data Factory, Lambda Architecture, Stream Analytics, Snowflake, MySQL, SQL Server, Python, Scala, Spark, Hive, Spark SQL, Pandas, NumPy

Empower Retirement, Greenwood Village, CO May 2017 to August 2019

Role: Data Engineer

Responsibilities:

Worked extensively on performance tuning of Spark jobs and Hive queries.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.

Created Hive tables as per the requirement as managed or external tables, intended for efficiency.

Proactively and continuously drive system-wide quality improvements by undertaking thorough root cause analysis for significant incidents with component engineering teams.

Implemented schema extraction for Parquet and Avro file formats in Hive.

Used Spark SQL to load JSON data, create schema RDDs, load them into Hive tables, and handle structured data.
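
A minimal sketch of this pattern, with illustrative paths, view, and table names:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Load JSON (schema inferred), query it with Spark SQL, and save to Hive.
events = spark.read.json("hdfs:///data/raw/events.json")
events.createOrReplaceTempView("events_raw")
cleaned = spark.sql(
    "SELECT user_id, event_type, event_ts FROM events_raw WHERE user_id IS NOT NULL"
)
cleaned.write.mode("overwrite").saveAsTable("analytics.events")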

Hands-on experience with Google Cloud Platform (GCP): Google BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, the gsutil and bq command-line utilities, Dataproc, and Stackdriver.

Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators.

Used the shell SDK in GCP to configure services such as Dataproc, BigQuery, and Cloud Storage.

Implemented Hive UDFs for data evaluation, filtering, loading, and storing.

Utilized Spark SQL API in PySpark to extract and load data and perform SQL queries.

Developed a PySpark script to encrypt raw data by applying hashing algorithms to client-specified columns.
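
A minimal sketch of this masking step, with illustrative input/output paths and column names:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("hdfs:///data/raw/customers")       # assumed input
sensitive_columns = ["ssn", "email", "phone"]                # client-specified columns
for column in sensitive_columns:
    # Replace each sensitive value with its SHA-256 digest.
    df = df.withColumn(column, F.sha2(F.col(column).cast("string"), 256))
df.write.mode("overwrite").parquet("hdfs:///data/masked/customers")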

Responsible for database design, development, and testing; developed stored procedures, views, and triggers.

Developed Python-based API (RESTful Web Service) to track revenue and perform revenue analysis.
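
A minimal sketch of such a service using Flask, with an in-memory stand-in for the warehouse query and an illustrative route:

from flask import Flask, jsonify

app = Flask(__name__)

# Stand-in for a revenue query against the warehouse.
REVENUE_BY_REGION = {"east": 125000.0, "west": 98000.0}

@app.route("/revenue/<region>", methods=["GET"])
def get_revenue(region):
    if region not in REVENUE_BY_REGION:
        return jsonify({"error": "unknown region"}), 404
    return jsonify({"region": region, "revenue": REVENUE_BY_REGION[region]})

if __name__ == "__main__":
    app.run(port=8080)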

Compiled and validated data from all departments and presented it to the Director of Operations.

Created a KPI calculator sheet and maintained it within SharePoint.

Created reports with complex calculations and worked on Ad-hoc reporting using PowerBI.

Created a data model that correlates all the metrics and produces valuable output.

Worked on tuning SQL queries to reduce run time by working on indexes and execution plans.

Performed ETL testing activities such as running the jobs, extracting data from the database with the necessary queries, transforming it, and loading it into the data warehouse servers.

Designed, developed, and tested dimensional data models using the Kimball methodology's star and snowflake schemas.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.

Ensured deliverables (daily, weekly, and monthly MIS reports) were prepared to satisfy project requirements, cost, and schedule.

Worked with DirectQuery in Power BI to compare legacy data with current data, and generated reports and dashboards.

Designed SSIS packages to extract, transform, and load (ETL) existing data into SQL Server from different environments for the SSAS (OLAP) cubes.

Created and formatted cross-tab, conditional, drill-down, Top N, summary, form, OLAP, sub-report, ad-hoc, parameterized, interactive, and custom reports in SQL Server Reporting Services (SSRS).

Created action filters, parameters, and calculated sets for preparing dashboards and worksheets using PowerBI.

Used ETL to implement Slowly Changing Dimension transformations to maintain historical data in the data warehouse.

Created dashboards for analyzing POS data using Power BI.

Environment: Spark, Python, ETL, Power BI, Hive, GCP, BigQuery, Dataproc, Data Pipeline, IBM Cognos 10.1, DataStage, Cognos Report Studio 10.1, Cognos 8 & 10 BI, Cognos Connection, Cognos Office Connection, Cognos 8.2/3/4, DataStage and QualityStage 7.5, MS SQL Server 2016, T-SQL, SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), SQL Server Analysis Services (SSAS), Management Studio (SSMS), Advanced Excel

Dhruvsoft Services Private Limited, Hyderabad, India November 2015 to February 2017

Role: Data Analyst

Responsibilities:

Handled importing of data from various sources; loaded data into HDFS and imported data from MySQL into HDFS using Sqoop.

Involved in loading data from the edge node to HDFS using shell scripting.

Hands-on experience in architecting the ETL transformation layers and writing Spark jobs to do the processing.

Gather and process raw data at scale (including writing scripts, web scraping, calling APIs, writing SQL queries, and writing applications)

Imported data from AWS S3 into Spark RDD and performed actions/transformations on them.

Created Partitions, Bucketing, and Indexing for optimization as part of Hive data modeling.

Involved in developing Hive DDLs to create, alter, and drop Hive tables.

Worked on different RDDs to transform data coming from various sources into the required formats.

Involved in analyzing system failures, identifying root causes, and recommending a course of action.

Managing and Scheduling jobs on Hadoop Cluster using Oozie workflows.

Used Spark stream processing to bring data into memory and applied RDD transformations and actions to process it in units.

Created and worked with Sqoop jobs with full loads to populate Hive external tables.

Worked extensively on Hive to create, alter, and drop tables and was involved in writing Hive queries.

Created and altered HBase tables on top of data in the data lake.

Wrote Shell scripts to automate Hive or Sqoop jobs to fetch, transform or query the data.

Worked on performance tuning for Hive queries.

Collected the log data from web servers and integrated it into HDFS using Flume.

Environment: Hive, Sqoop, Spark, HDFS, MySQL, Oozie, Flume, HBase, Python, YARN, Cloudera, RDDs, AWS.

Yana Software Private Limited, Hyderabad, India June 2014 to October 2015

Role: Data Analyst

Responsibilities:

Devised simple and complex SQL scripts to check and validate Dataflow in various applications.

Performed Data Analysis, Migration, Cleansing, Transformation, Integration, Data Import, and Data Export through Python.

Devised PL/SQL Stored Procedures, Functions, Triggers, Views, and packages. Made use of Indexing, Aggregation, and Materialized views to optimize query performance.

Developed logistic regression models (using R programming and Python) to predict subscription response rate based on customer variables like past transactions, response to initial mailings, promotions, demographics, interests, hobbies, etc.
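
A minimal sketch of such a response model using scikit-learn, with synthetic data standing in for the customer variables:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic features standing in for transactions, demographics, interests, etc.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression().fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))
print("response probabilities:", model.predict_proba(X_test[:3])[:, 1])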

Created Tableau dashboards/reports for data visualization, Reporting, and Analysis and presented them to Business.

Redefined many attributes and relationships in the reverse engineered model and cleansed unwanted tables/columns as part of data analysis responsibilities.

Interacted with the database administrators and business analysts for data type and class words.

Conducted design sessions with business analysts and ETL developers to come up with a design that satisfies the organization's requirements.

Created data connections and published them on Tableau Server for use with operational and monitoring dashboards.

Knowledge of Tableau administration tools for configuration, adding users, managing licenses and data connections, scheduling tasks, and embedding views by integrating with other platforms.

Worked with senior management to plan, define and clarify dashboard goals, objectives, and requirements.

Responsible for daily communications to management and internal organizations regarding the status of all assigned projects and tasks.

Environment: SQL, Tableau, R, Python, Excel, Lookups, Dataflow, ETL.


