AWS, Azure, GCP, SQL, Hadoop, Spark, PySpark, Hive, Python, Java, Scala

Location:
Chicago, IL
Salary:
$60
Posted:
May 03, 2023

Resume:

Vijaya Lakshmi Patti

*********************@*****.***

847-***-****

linkedin.com/in/vijaya-lakshmi-patti-502bb7a2

Data Engineer

PROFESSIONAL SUMMARY

8+ years of professional experience, including experience with AWS and Azure big data technologies.

Extensive experience designing, developing, and implementing robust, highly scalable, fault-tolerant, and automated data/workflow pipelines for processing copious amounts of data.

Strong understanding of big data concepts and the internal structure of YARN.

Extensive experience in optimizing Spark and Snowflake operations.

Extensive experience using AWS Big Data services to build resource-efficient and cost-effective applications that meet business requirements.

Strong experience working with AWS CloudFormation templates, Terraform, boto3, and the Serverless Framework to build infrastructure as code for AWS EC2, S3, Redshift, EMR, Glue, Data Pipeline, RDS, Athena, Databricks, Snowflake, DynamoDB, API Gateway, Kinesis, Airflow, and Elasticsearch (a brief sketch appears at the end of this summary).

Strong experience using PySpark, SQL, Scala, Python, and advanced SQL to automate simple to highly complex ETL processes using various triggers.

Experienced in T-SQL; created stored procedures, functions, views, triggers, and dynamic SQL.

Experienced in database build, Git, Azure DevOps, multi-DB update, database backup and restore by Commvault and PowerShell.

Experienced in upgrading SQL Server 2005, 2008 R2, 2012 R2, 2014, and 2016.

Strong experience in working with various data formats (Avro, ORC, JSON, and Parquet) and bucketing/partitioning of data as required.

Software engineer experienced in Hadoop and other big data platforms.

Good knowledge of the Spark framework for both batch and real-time data processing. Provides clear and effective testing procedures and systems to ensure an efficient and effective process.

Proven experience demonstrating the technical and operational feasibility of Hadoop developer solutions. Designed and built big data architectures for unique projects, ensuring development and delivery of the highest quality, on time and on budget.

Significant contribution to the development of big data roadmaps.

Able to drive architectural improvement and standardization of the environments.

Experience using Storm for reliable real-time data processing for enterprise Hadoop. Extended the core functionality of Hive and Pig with user-defined functions (UDFs), user-defined table-generating functions (UDTFs), and user-defined aggregate functions (UDAFs).

Create and maintain environment configuration documents for all pre-production environments.

Experience deploying Hadoop clusters using Puppet tools.
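
A minimal sketch of the infrastructure-as-code pattern referenced in the CloudFormation/boto3 bullet above, assuming a local CloudFormation template; the stack name, template file, and IAM capability setting are hypothetical placeholders rather than details of any specific engagement.

    import boto3

    def deploy_stack(stack_name: str, template_path: str) -> str:
        """Create the CloudFormation stack if it does not exist, otherwise update it."""
        cf = boto3.client("cloudformation")
        with open(template_path) as f:
            template_body = f.read()
        try:
            response = cf.create_stack(
                StackName=stack_name,
                TemplateBody=template_body,
                Capabilities=["CAPABILITY_NAMED_IAM"],  # assumes the template creates IAM roles
            )
        except cf.exceptions.AlreadyExistsException:
            response = cf.update_stack(
                StackName=stack_name,
                TemplateBody=template_body,
                Capabilities=["CAPABILITY_NAMED_IAM"],
            )
        return response["StackId"]

    if __name__ == "__main__":
        # Hypothetical stack and template names, for illustration only.
        print(deploy_stack("demo-data-lake", "data_lake.yaml"))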

TECHNICAL SKILLS:

Programming Languages

Python, PySpark, SQL, Java, Scala, C

Database Tools

HBase, SQL Server, MySQL, PostgreSQL, MongoDB, Cassandra

Web Technologies

HTML, XML, CSS

Distributed Computing

HDFS, Hue, Map Reduce, HBase, Hive, Sqoop, Spark, Impala, Cassandra, Oozie, YARN, Flume, Kafka, Zookeeper

Operating Systems

Redhat Linux, Unix, Windows

Cloud Technologies

AWS, Azure, GCP

Search Tools

Apache Lucene, Elasticsearch, Kibana, Apache Solr, Cloudera

PROFESSIONAL EXPERIENCE

Client: Expedia, Austin, TX 04/2022 - Current

Big Data Developer

Responsibilities:

•Responsible for building a data lake using various AWS cloud services such as S3, EMR, and Redshift.

•Worked closely with Business Analysts to gather requirements and design a reliable and scalable data pipeline using AWS EMR.

•Developed Spark applications using Scala utilizing Data frames and Spark SQL API for faster processing of data.

•Experience with implementing DevOps practices and tools such as AWS CodePipeline, CodeDeploy, and CodeCommit for automated application deployment and management.

•Hands-on experience in configuring and managing AWS infrastructure using AWS CLI and AWS SDKs in different programming languages.

•Developed highly optimized Spark applications to perform various data cleansing, validation, transformation, and summarization activities according to the requirement.

•Built a data pipeline consisting of Spark, Hive, Sqoop, and custom-built input adapters to ingest, transform, and analyze operational data.

•Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.

•Experience working in multi-cloud architectures, designing and building multiple end-to-end ETL data pipelines covering data ingestion and transformation in GCP and AWS.

•Implemented data processing in BigQuery on GCP Pub/Sub topics, using Python Cloud Dataflow pipelines and the Python REST API to load data into BigQuery from other systems.

•Engaged in Cloud Dataflow execution and integrated BigQuery queries to validate data between raw source files and BigQuery tables, monitoring Dataproc using Stackdriver across environments.

•Redesigned Snowflake views to improve performance.

•Performed module-level testing of data between Redshift and Snowflake.

•Developed data warehouse models in Snowflake for over one hundred datasets using WhereScape, with reporting in Looker based on a Snowflake connection.

•Developed Spark jobs and Hive jobs for summarizing and transforming data.

•Used Spark for interactive querying, streaming data processing, and integration with the DynamoDB NoSQL database.

•Created multiple Spark Databricks jobs to perform cross-table operations using PySpark.

•Developed PySpark and Spark SQL code to process data with Apache Spark on Amazon EMR and perform the necessary transformations based on the developed source-to-target mapping (STM); see the sketch at the end of this section.

•Contributed to converting Hive queries to Spark transformations using Spark DataFrames in Scala.

•Developed Kafka producer and Spark streaming applications to build real-time data pipelines.

•Managed importing data from relational databases into S3 using Sqoop and performing transformations using Hive and Spark.

•Helped migrate all Microservices builds to a Docker registry using Jenkins and Pipelines and then deploy to Kubernetes.

•Managed Docker orchestration and containerization using Kubernetes.

•Deploy, scale, and manage Docker containers using Kubernetes.

•Automated builds using Maven and scheduled automatic nightly builds using Jenkins. Created a Jenkins pipeline that pushes all microservice builds to a Docker registry and then deploys them to Kubernetes.

•Worked extensively with Jenkins/Hudson for continuous integration and end-to-end automation of all builds and deployments.

•Used Jenkins pipelines to push all microservice builds to the Docker registry and deploy them to Kubernetes; created and managed pods using Kubernetes.

•Exported processed data to Redshift using the Redshift upload utility for additional visualization and reporting by business intelligence teams.

•Used Hive to partition and analyze data and to calculate various metrics for reports.

•Developed Hive scripts in HiveQL to denormalize and aggregate data.

•Planned and completed a workflow using AWS Simple Workflow (SWF).

Tools Used: Spark, Hive, DynamoDB, Redshift, Snowflake, Scala, Shell Scripting, Amazon EMR, S3, Athena, AWS SWF, BigQuery, Dataproc, PySpark, Docker, Jenkins, Kubernetes.
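
A minimal sketch of the kind of PySpark job run on Amazon EMR in this engagement, as referenced in the source-to-target mapping bullet above; the S3 buckets, column names, and transformation rules are illustrative assumptions, not the actual mappings.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("stm-transform").getOrCreate()

    # Read raw operational data from S3 (Parquet assumed; bucket is hypothetical).
    raw = spark.read.parquet("s3://example-raw-bucket/orders/")

    # Simple cleansing, validation, and summarization steps.
    cleaned = (raw.dropDuplicates(["order_id"])
                  .filter(F.col("order_amount") > 0)
                  .withColumn("order_date", F.to_date("order_ts")))

    daily_summary = (cleaned.groupBy("order_date", "region")
                            .agg(F.count("*").alias("order_count"),
                                 F.sum("order_amount").alias("total_amount")))

    # Write the curated output back to S3, partitioned by date.
    (daily_summary.write.mode("overwrite")
                  .partitionBy("order_date")
                  .parquet("s3://example-curated-bucket/daily_order_summary/"))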

Client: Fannie Mae, Reston, VA 02/2021 – 03/2022

Azure Data Engineer

Responsibilities:

•Engaged in requirements gathering, design, implementation, deployment, testing, and maintenance of applications to meet organizational needs using the Scrum methodology.

•Participated in Scrum meetings and worked with business analysts to understand business requirements and integrate them into the functional design. Used Azure Data Factory extensively to ingest data from disparate source systems.

•Integrated data from upstream systems to downstream systems using Azure Data Factory as an orchestration tool.

•Automated tasks using various triggers (events, schedules, and rollovers) in ADF.

•Cosmos DB was used to store catalog data and retrieve events from the order processing pipeline.

•Designed and developed user-defined functions, stored procedures, and triggers for Cosmos DB.

•Analyzed data flows from various sources to provide an appropriate design architecture in an Azure environment.

•Demonstrated initiative and responsibility for delivering business solutions in a timely manner.

•Created high-level technical design documents and application design documents as required and provided clear, well-defined, and complete design documents.

•Created the DA spec and data flow mapping and provided details to developers along with the HLD.

•Created Build definition and Release definition for Continuous Integration and Continuous Deployment.

•Created Application Interface Document for the downstream to create a new interface to transfer and receive the files through Azure Data Share.

•Ingested data in mini-batches and performed RDD transformations on those mini-batches using Spark Streaming to perform streaming analytics in Databricks (see the sketch at the end of this section). Created and provisioned the different Databricks clusters needed for batch and continuous streaming data processing and installed the required libraries for the clusters.

•Integrated Azure Active Directory authentication to every Cosmos DB request sent and demoed feature to Stakeholders.

•Improved performance by optimizing computing time to process the streaming data and saved cost to the company by optimizing the cluster run time.

•Continuously monitored, automated, and refined data processing solutions to provision complex SQL views, Azure SQL DW, and stored procedures at hyperscale.

•Designed and developed a new NRT data processing solution using Azure Streaming Analytics, Azure Event Hub, and Service Bus Queue.

•Created a linked service to transfer data from an SFTP location to Azure Data Lake.

•Created multiple pipelines in Azure Data Factory v2 to retrieve data from different source systems, each using different activities such as Move & Transform, Copy, and Filter against Databricks, including multi-table to table operations.

•Used popular SQL Server data import and export tools.

•Created database users and logins and configured access rights.

•Wrote complex SQL, stored procedures, triggers, and batch operations on large databases across disparate servers.

•Assisted team members with technical issues and problem resolution, identified project risks and issues, resolved resource management issues, and held individual monthly meetings and weekly team meetings.

•Involved in the complete project life cycle starting from design discussion to production deployment.

Tools Used: Azure Cloud, Azure Data Factory, Azure Function Apps, Azure Data Lake, Blob Storage, SQL Server, Teradata utilities, Windows Remote Desktop, UNIX shell scripting, Azure PowerShell, Databricks, Python, Erwin data modeling tool, Azure Cosmos DB, Azure Stream Analytics, Azure Event Hub.
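
A minimal sketch of the mini-batch streaming pattern referenced above, using Spark Structured Streaming with foreachBatch as it might run on Databricks; the landing path, schema, checkpoint location, and output table name are illustrative assumptions.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("nrt-analytics").getOrCreate()

    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_ts", TimestampType()),
    ])

    # Stream JSON events landing in a storage folder (a stand-in for the real feed).
    events = (spark.readStream
                   .schema(event_schema)
                   .json("/mnt/landing/events/"))  # hypothetical mount point

    def process_batch(batch_df, batch_id):
        """Transform each mini-batch and append it to a curated table (Delta on Databricks)."""
        (batch_df.withColumn("event_date", F.to_date("event_ts"))
                 .write.format("delta").mode("append")
                 .saveAsTable("curated_events"))

    query = (events.writeStream
                   .foreachBatch(process_batch)
                   .option("checkpointLocation", "/mnt/checkpoints/events")  # hypothetical
                   .start())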

Client: 7-Eleven, Dallas, TX 12/2019 – 01/2021

Data Engineer

Responsibilities:

•Used Spark to improve performance and optimize existing algorithms in Hadoop MapReduce, using the Spark context, Spark SQL, DataFrames, paired RDDs, and Spark on YARN (a sketch appears at the end of this section).

•Deployed MapReduce and Spark jobs on Amazon Elastic MapReduce using datasets stored in S3.

•Active participation in weekly meetings with technical staff and code review sessions with senior and junior developers.

•Expertise in designing, deploying, and managing highly scalable, fault-tolerant, and secure AWS solutions.

•Proficient in utilizing various AWS services such as EC2, S3, RDS, Lambda, ECS, CloudFront, CloudFormation, and many others.

•Strong knowledge of core AWS services such as IAM, VPC, Route 53, CloudWatch, and CloudTrail for managing and monitoring AWS resources.

•Participated in converting Hive/SQL queries to Spark transformations using Spark RDDs and Scala.

•Developed Spark code using Scala and Spark-SQL/Streaming for faster data processing.

•Designed Kafka producers and consumers for message processing.

•Transferring data to and from an Amazon S3 bucket using the Amazon CLI.

•Ran Hadoop/Spark jobs on AWS EMR against data stored in S3 buckets.

•Knowledge of designing and deploying Hadoop clusters and various big data analytics tools including Hive, HBase, Oozie, Sqoop, Flume, Spark, Impala, and Cassandra.

•Real-time data streaming to Spark with Kafka.

•Prepared Spark builds from MapReduce source code for better performance.

•Regularly imported data from MySQL into HDFS using Sqoop.

•Developed scripts and batch jobs to schedule various Hadoop programs.

•Data analysis was performed in Hive using Spark API, Hadoop YARN on top of Hortonworks.

•Created Hive queries to analyze data based on business requirements.

•Integrated user data from Cassandra with data from HDFS.

•Participated in real-time data import into Hadoop using Kafka and implemented Oozie jobs for daily imports.

•Automated data retrieval from repositories and web logs by developing workflows and coordinator actions in Oozie.

•Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from network logs and store it in HDFS.

•Experience with batch processing of data sources using Apache Spark and Elasticsearch.

•Experience implementing Spark RDD transformations and business intelligence implementation activities.

•Implemented solutions in Spark using Scala and Spark SQL for faster testing and data processing.

•Created Hive tables, contributed to data loading, and wrote Hive UDFs.

•Loaded data into the cluster from dynamically generated files using Flume and relational database management systems using Sqoop.

Tools Used: Hadoop, MapReduce, HDFS, Hive, Java, SQL, CDH5, Pig, Sqoop, Zookeeper, Teradata, PL/SQL, MySQL, HBase, Eclipse.
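
A minimal sketch of rewriting a MapReduce-style aggregation with Spark paired RDDs and the equivalent DataFrame/Spark SQL form, as referenced in the first bullet above; the HDFS input path is an illustrative assumption.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("paired-rdd-example").getOrCreate()
    sc = spark.sparkContext

    # Paired-RDD version: map each word to (word, 1) and aggregate with reduceByKey.
    lines = sc.textFile("hdfs:///data/logs/sample.txt")  # hypothetical path
    pair_counts = (lines.flatMap(lambda line: line.split())
                        .map(lambda word: (word, 1))
                        .reduceByKey(lambda a, b: a + b))
    print(pair_counts.takeOrdered(10, key=lambda kv: -kv[1]))

    # Equivalent DataFrame/Spark SQL version of the same aggregation.
    words = (spark.read.text("hdfs:///data/logs/sample.txt")
                  .select(F.explode(F.split("value", r"\s+")).alias("word")))
    words.groupBy("word").count().orderBy(F.desc("count")).show(10)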

IMG Solutions Hadoop Developer 01/2017 – 10/2019

Location: India

Responsibilities:

•Actively Participated in all phases of the Software Development Life Cycle (SDLC) from implementation to deployment.

•Responsible for building scalable distributed data solutions using Hadoop.

•Responsible for Cluster maintenance, adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Managing, and reviewing data backups & log files.

•Responsible for managing the test data coming from various sources.

•Analyzed data using Hadoop components Hive and Pig.

•Load and transform large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.

•Involved in importing and exporting the data from RDBMS to HDFS and vice versa using Sqoop.

•Involved in loading data from UNIX file system to HDFS.

•Responsible for creating Hive tables, loading data, and writing hive queries.

•Created Hive external tables, loaded data into them, and queried the data using HQL (a sketch appears at the end of this section).

•Managed importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.

•Created and maintained technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts.

•Extracted the data from Teradata into HDFS using Sqoop.

•Exported the patterns analyzed back to Teradata using Sqoop.

•Experience monitoring system metrics and logs for problems when adding, removing, or updating Hadoop cluster nodes.

•Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs, and used Oozie workflows for batch processing and scheduling workflows dynamically.

•Involved in requirement analysis, design, coding, and implementation phases of the project.

Environment: Hadoop, Spark, Scala, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Pig, Sqoop, Oozie, Zookeeper
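
A minimal sketch of the Hive external-table workflow referenced above, expressed through PySpark with Hive support enabled; the table name, columns, and HDFS location are illustrative assumptions.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-external-table")
             .enableHiveSupport()
             .getOrCreate())

    # Define an external table over files already landed in HDFS (hypothetical path/schema).
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
            log_ts  STRING,
            user_id STRING,
            url     STRING
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
        STORED AS TEXTFILE
        LOCATION 'hdfs:///data/raw/web_logs/'
    """)

    # Query the external table with HQL: hits per user.
    spark.sql("""
        SELECT user_id, COUNT(*) AS hit_count
        FROM web_logs
        GROUP BY user_id
        ORDER BY hit_count DESC
        LIMIT 20
    """).show()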

Kravy Financials Associate Application Developer 01/2015 – 12/2016

Hyderabad, India

Responsibilities:

•Involved in Requirement Analysis, Design, Development and Testing of the risk workflow system.

•Involved in the implementation of design using vital phases of the Software development life cycle (SDLC) that includes Development, Testing, Implementation and Maintenance Support.

•Applied OOAD principle for the analysis and design of the system.

•Implemented XML Schema as part of XQuery query language.

•Applied J2EE design patterns like Singleton, Business Delegate, Service Locator, Data Transfer Object (DTO), Data Access Objects (DAO) and Adapter during the development of components.

•Used RAD for the Development, Testing and Debugging of the application.

•Used WebSphere Application Server to deploy the build.

•Developed front-end screens using Struts, JSP, HTML, AJAX, jQuery, JavaScript, JSON, and CSS.

•Used J2EE for the development of business layer services.

•Developed Struts Action Forms, Action classes and performed action mapping using Struts.

•Performed data validation in Struts Form beans and Action Classes.

•Developed a POJO-based programming model using the Spring framework.

•Used IOC (Inversion of Control) Pattern and Dependency Injection of Spring framework for wiring and managing business objects.

•Used Web Services to connect to mainframe for the validation of the data.

•SOAP has been used as a protocol to send requests and responses in the form of XML messages.

•JDBC framework has been used to connect the application with the Database.

•Used Eclipse for the Development, Testing and Debugging of the application.

•Log4j framework has been used for logging debug, info & error data.

•Used Hibernate framework for Entity Relational Mapping.

•Used Oracle 10g database for data persistence and SQL Developer was used as a database client.

•Extensively worked on Windows and UNIX operating systems.

•Used SecureCRT to transfer files from local system to UNIX system.

•Performed Test Driven Development (TDD) using JUnit.

•Used Ant script for build automation.

•PVCS version control system has been used to check-in and checkout the developed artifacts. The version control system has been integrated with Eclipse IDE.

•Used Rational ClearQuest for defect logging and issue tracking.

Tools Used: Core Java, J2EE, Struts, Spring, Hibernate, Web Services, Design Patterns, WebSphere, Ant, (Servlet, JSP), HTML, AJAX, JavaScript, CSS, jQuery, JSON.


