Data Engineer Onsite Customer

Location:
Herndon, VA
Salary:
140000
Posted:
April 04, 2023

Resume:

Sampath Beesa Sr AWS Data Engineer

SUMMARY

• 12+ years of experience in analysis, design, development, maintenance, and user training of enterprise applications, working with distributed technologies such as Spark, Hadoop, and Hive and orchestration tools such as Apache Airflow.

• Hands-on experience with Avro, Parquet, and ORC files, dynamic partitions, and bucketing for best practices and performance improvement; worked with different compression codecs (GZIP, Snappy, BZIP).
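
The bullet above covers partitioned, compressed columnar storage; the PySpark sketch below illustrates that pattern under stated assumptions (the bucket paths, dataset, and event_date column are hypothetical examples, not project code):

from pyspark.sql import SparkSession

# Illustrative only: write Snappy-compressed Parquet partitioned by a date column.
spark = (
    SparkSession.builder
    .appName("partitioned-parquet-sketch")
    .config("spark.sql.parquet.compression.codec", "snappy")
    .getOrCreate()
)

# Hypothetical input; header-based CSV read kept simple for brevity.
events = spark.read.option("header", True).csv("s3://example-bucket/raw/events/")

# Partitioning by event_date lets downstream queries prune partitions.
(
    events.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-bucket/curated/events/")
)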

• Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.

• Experience running Apache Hadoop, CDH, and MapR distributions on Amazon Elastic MapReduce (EMR) over EC2.

• Extensive experience importing and exporting data using stream processing platforms such as Flume and Apache/Confluent Kafka.

• Experience with the workflow schedulers Zookeeper and Control-M to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.

• Experience handling various file formats such as Avro, Sequence files, and Parquet.

• Implemented a serverless architecture using API Gateway, Lambda, and DynamoDB and deployed AWS Lambda code from Amazon S3 buckets; created a Lambda deployment function and configured it to receive events from an S3 bucket.
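
As an illustration of that serverless pattern, the sketch below shows a Lambda handler that receives S3 object-created events and writes object metadata to DynamoDB; the bucket, table name, and item fields are hypothetical placeholders:

import json
import boto3

# Hypothetical DynamoDB table used only for this sketch.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("example-ingest-metadata")

def lambda_handler(event, context):
    # An S3 notification can carry several records per invocation.
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        size = record["s3"]["object"].get("size", 0)
        # A real function would validate or transform the object before loading it downstream.
        table.put_item(Item={"object_key": key, "bucket": bucket, "size_bytes": size})
    return {"statusCode": 200, "body": json.dumps({"processed": len(records)})}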

• Experience with the AWS platform and its features, including IAM, EC2, EBS, VPC, RDS, CloudWatch, CloudTrail, CloudFormation, AWS Config, Auto Scaling, CloudFront, S3, SQS, SNS, Lambda, and Route 53.

• Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.

• Expertise in Creating, Debugging, Scheduling and Monitoring jobs using Airflow.

• Good understanding of cloud configuration in Amazon web services (AWS).

• Experience in data processing, such as collecting, aggregating, and moving data from various sources using Spark (PySpark).

• Experience with NoSQL column-oriented databases and their integration with Hadoop clusters.

• Hands-on, strong development skills in SQL, UNIX shell scripting, Linux, Oracle, SQL Server, Perl, and Python scripting; in-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, and RDDs for PySpark.

• Hands-on experience setting up workflows using the Apache Airflow engine for managing and scheduling Hadoop jobs.
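
A minimal Airflow DAG sketch of that scheduling setup; the DAG id, schedule, and task commands are placeholders rather than the actual production workflow:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_daily_spark_pipeline",   # placeholder DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_spark_job = BashOperator(
        task_id="run_spark_job",
        bash_command="spark-submit --master yarn /opt/jobs/transform_events.py",
    )
    refresh_hive_partitions = BashOperator(
        task_id="refresh_hive_partitions",
        bash_command="hive -e 'MSCK REPAIR TABLE example_db.events'",
    )
    # The Hive refresh runs only after the Spark job succeeds.
    run_spark_job >> refresh_hive_partitions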

• Hands-on experience across all stages of Software Development Life Cycle (SDLC) including business requirement analysis, data mapping, build, unit testing, systems integration, UAT and Prod.

• Worked in Agile methodology, adhering to standards and techniques; prepared code documentation and documented problems and their solutions.

• Experience with Level 3 problems such as configuration, file shares, and troubleshooting.

• Very good communication, interpersonal, and problem-solving skills; able to explore and adapt to new technologies with ease; a good team member.

• Motivated to take independent responsibility, with a strong work ethic and the desire to succeed and make significant contributions to the organization.

SKILLS:

Big Data Tools: Hadoop Ecosystem, Apache Spark, MapReduce, PySpark, Hive, YARN, Kafka, Airflow, Zookeeper, HBase

Languages: Python, SQL, PL/SQL, Shell scripts, Java/J2EE, Scala

Cloud Tools: AWS Glue, S3, Redshift Spectrum, Kinesis, EC2, EMR, DynamoDB, Data Lake, Athena, AWS Data Pipeline, AWS Lambda, CloudWatch, SNS, SQS, Databricks

Frameworks: Spring, Spring Boot, Hibernate

Version Control: SVN, Bitbucket

Databases and Tools: Oracle 11g/10g, MySQL, SQL

Modelling Language: UML, Design Patterns

Testing Tools: JUnit, EasyMock, Cucumber

Build and Deploy: Maven, Jenkins

Professional Experience:

Solution Architect (AWS) Oct 2019 to Current

Ericsson, Remote

Responsibilities:

• Strong grasp of the SDLC (Software Development Life Cycle); experience working in Scrum methodology.

• Installed and configured PostgreSQL database software, associated database tools, and performance monitoring software.

• Performed PITR (Point-in-Time Recovery), replication, and performance parameter tuning.

• Performed database/infrastructure physical design, database upgrades.

• Performed acceptance testing to verify that database changes perform without adverse consequences.

• Development experience in Databricks on the AWS cloud platform.

• Estimated disk storage requirements for database software, database files and database administration needs.

• Wrote PySpark scripts for unit testing, analyzed SQL scripts using Snowflake, and optimized PySpark jobs.
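
A pytest-style sketch of that kind of PySpark unit test; the dedupe_latest transformation and the sample rows are hypothetical stand-ins for the real job logic:

import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window


def dedupe_latest(df):
    # Keep only the most recent row per id, ordered by updated_at.
    w = Window.partitionBy("id").orderBy(F.col("updated_at").desc())
    return df.withColumn("rn", F.row_number().over(w)).filter("rn = 1").drop("rn")


@pytest.fixture(scope="module")
def spark():
    session = SparkSession.builder.master("local[1]").appName("unit-tests").getOrCreate()
    yield session
    session.stop()


def test_dedupe_latest_keeps_newest_row(spark):
    df = spark.createDataFrame(
        [(1, "2023-01-01"), (1, "2023-02-01"), (2, "2023-01-15")],
        ["id", "updated_at"],
    )
    rows = {(r["id"], r["updated_at"]) for r in dedupe_latest(df).collect()}
    assert rows == {(1, "2023-02-01"), (2, "2023-01-15")}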

• Performed distributed data processing for big data batch and streaming pipelines.

• Worked with AWS services such as EMR and EC2, which provide fast and efficient processing of big data.

• Ensured quality control of data through the development and maintenance of Databricks rules and quality control procedures.

• Programming experience with SQL, stored procedures, and Spark/Scala.

• Supported and managed the Aurora PostgreSQL database environments.

• Worked as an RDS/legacy/traditional DBA on Oracle and PostgreSQL servers.

• Created and maintained various DevOps-related tools for the team, such as provisioning scripts, deployment tools, and development and staging environments on AWS and Rackspace Cloud.

• Used AWS SageMaker to quickly build, train, and deploy machine learning models.

• Used AWS Lambda to perform data validation, filtering, sorting, and other transformations for every data change in an HBase table and to load the transformed data into another data store.

• Created Airflow Scheduling scripts in Python.

• Developed UNIX scripts to extract data from data files and load it into HDFS.

• Hands-on experience with programming languages such as Python and PL/SQL.

• Experience using different Hadoop ecosystem components such as HDFS, YARN, MapReduce, and Spark.

• Prior experience in a support role on the administration side of Databricks on a cloud platform (Azure/AWS/Google Cloud Platform).

Environment: Hadoop, Spark, Scala, MapReduce, Hive, Sqoop, AWS Data Lake, AWS Databricks, PySpark, YARN, Unix, SQL.

Infosys, India Feb 2017 to Oct 2019

Data Engineer

Responsibilities:

• Demonstrated a strong understanding of project scope, data extraction methodology, design of dependent and profile variables, the logic and design of data cleaning, exploratory data analysis, and statistical methods.

• Developed a real-time data pipeline using Spark to ingest customer event/activity data from Kafka into Hive and Cassandra.
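
A hedged sketch of that Kafka-to-Hive ingestion using Structured Streaming (the original pipeline may have used DStreams); the broker, topic, checkpoint path, and table name are placeholders, the Kafka source needs the spark-sql-kafka package on the classpath, and a Cassandra sink would additionally use the spark-cassandra-connector:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("kafka-events-to-hive-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
    .option("subscribe", "customer-events")             # placeholder topic
    .option("startingOffsets", "latest")
    .load()
    .select(F.col("key").cast("string"), F.col("value").cast("string"), "timestamp")
)


def write_batch(batch_df, batch_id):
    # Append each micro-batch to a Hive-managed table.
    batch_df.write.mode("append").saveAsTable("analytics.customer_events")


query = (
    events.writeStream
    .foreachBatch(write_batch)
    .option("checkpointLocation", "/tmp/checkpoints/customer_events")
    .start()
)
query.awaitTermination()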

• Monitored, maintained, and troubleshot traditional DB servers and EC2/RDS/PostgreSQL instances.

• Migrated an existing on-premises application to AWS; used AWS services such as EC2 and S3 for small data set processing and storage, and maintained the Hadoop cluster on AWS EMR.

• Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.

• Performed Spark job optimization and performance tuning to improve running time and resource usage.

• Worked with AWS services such as EMR and EC2, which provide fast and efficient processing of big data.

• Experienced with stream processing systems using PySpark.

• Developed complex, multi-step data pipelines using Spark.

• Developed multiple programs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.

• Installed and configured Apache Airflow for workflow management and created workflows in Python.

• Used IAM to create new accounts, roles, groups, and policies, and developed critical modules such as generating Amazon Resource Names (ARNs) and integration points with S3, DynamoDB, RDS, Lambda, and SQS queues.
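
A boto3 sketch of that IAM setup: creating a role that Lambda can assume and attaching an inline policy for S3, DynamoDB, and SQS access. The role, policy, bucket, table, and queue names (and the account id in the ARNs) are placeholders:

import json
import boto3

iam = boto3.client("iam")

# Trust policy letting AWS Lambda assume the role.
assume_role_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="example-data-pipeline-role",
    AssumeRolePolicyDocument=json.dumps(assume_role_policy),
)

# Inline policy granting the integrations mentioned above (placeholder ARNs).
access_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject", "dynamodb:PutItem", "sqs:SendMessage"],
        "Resource": [
            "arn:aws:s3:::example-bucket/*",
            "arn:aws:dynamodb:us-east-1:123456789012:table/example-table",
            "arn:aws:sqs:us-east-1:123456789012:example-queue",
        ],
    }],
}

iam.put_role_policy(
    RoleName="example-data-pipeline-role",
    PolicyName="example-pipeline-access",
    PolicyDocument=json.dumps(access_policy),
)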

• Strong familiarity and hands-on experience with Databricks, data factories, and streams.

• Analyzed the SQL scripts and redesigned them using PySpark SQL for faster performance.

• Worked on reading and writing multiple data formats such as JSON, ORC, and Parquet on HDFS using PySpark.

• Developed Spark applications in Python (PySpark) on a distributed environment to load a large number of CSV files with different schemas into Hive ORC tables.
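
An illustrative PySpark sketch of that CSV-to-Hive ORC load; the paths and table name are placeholders, and unionByName with allowMissingColumns (used here to align differing schemas) requires Spark 3.1+:

from functools import reduce

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("csv-to-hive-orc-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Placeholder input feeds whose CSV schemas differ slightly.
paths = ["/data/incoming/feed_a/", "/data/incoming/feed_b/"]
frames = [spark.read.option("header", True).csv(p) for p in paths]

# Align schemas by column name, filling columns missing from a feed with nulls.
combined = reduce(lambda a, b: a.unionByName(b, allowMissingColumns=True), frames)

# Append into a Hive table stored as ORC.
combined.write.format("orc").mode("append").saveAsTable("staging.customer_feed")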

• Knowledge of installing, configuring, supporting, and managing Hadoop clusters on Amazon Web Services (AWS).

• Developed Spark applications using Scala and implemented an Apache Spark data processing project to handle data from various RDBMS sources.

• Installed and configured Apache Airflow for S3 buckets and the Snowflake data warehouse and created DAGs to run in Airflow.

• Implemented Spark DStream APIs, performed requirement-specific transformations and actions in real time, and persisted the data into Hive.

• Implemented an ETL framework using Spark with Python and loaded standardized data into Hive and HBase tables.

• Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory tuning.
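
A short sketch of the tuning knobs that bullet refers to, with example (not production) values: shuffle parallelism, executor memory, and the micro-batch interval used by a DStreams job:

from pyspark import SparkConf
from pyspark.sql import SparkSession
from pyspark.streaming import StreamingContext

conf = (
    SparkConf()
    .setAppName("tuning-sketch")
    .set("spark.sql.shuffle.partitions", "200")  # level of parallelism for shuffles
    .set("spark.executor.memory", "4g")          # executor heap size
    .set("spark.executor.cores", "2")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
)

spark = SparkSession.builder.config(conf=conf).getOrCreate()

# For a DStreams application, the batch interval is fixed when the
# StreamingContext is created (10-second micro-batches here).
ssc = StreamingContext(spark.sparkContext, batchDuration=10)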

• Optimized existing models in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.

Environment: Apache Hadoop, Amazon EC2, Amazon S3, Amazon EMR, AWS Lambda, Data Lake, HDFS, Hive, Java, Scala, Spark, Cloudera CDH5, Oracle, MySQL, PySpark, Tableau, SFTP.

Maxis, Malaysia Sep 2016 to Feb 2017

Consultant

• Worked on incident and service tickets for the IME Mediation product.

• Performed daily health checks of Mediation nodes and system communication with other nodes.

• Clearing the error records by adding the necessary business configurations and reprocessing.

• Solving tickets within the given period.

• Liaising with the L3 team on the deployment of new business logic.

• Generating the Mediation Reports.

• Writing shell scripts to fetch usage details at every service level for distribution to downstream systems.

• Identifying areas in which tickets have breached the SLA and providing the necessary KT to the team to improve delivery quality and speed up ticket closure.

• Participated in client weekly conference calls to update status on outstanding issues and requirements.

• Prepared event tracking sheet on weekly basis for the tickets assigned to Mediation.

• System Monitoring

• Providing scripts to generate reports and fixes for repetitive issues.

• Support in live issue analysis.

• Testing the changes and deploying them on the production servers.

Ericsson India Global Service Pvt Ltd, India Feb 2014 to Sep 2016

Senior Solution Integrator

• Liaise with customers to understand their requirements and convert them into use cases to be developed or customized by the team.

• Devise solutions based on the client's requirements and provide suggestions to the client based on domain knowledge.

• Write code based on functional specifications and technical design; adhere to development techniques and standards (for development during the customization phase, when additional adapters are required for a client-specific solution).

• Adhere to quality control processes and standards to achieve better quality deliverables and help the team to understand them as well.

• Preparation of RCA (root cause analysis) documents for issues fixed in onsite customer products.

• Adhere to SLA (Service level agreement) timelines for every customer while fixing issues raised by customers.

• Preparation of POC (Proof of Concept), High Level Design (HLD) documents for the Provisioning products.

• Planning application license capacities against the customer base and placing orders for the same.

• Customizing Provisioning products for new implementations as per client’s requirements.

• Analyzing MML command specification documents received from Network elements vendors for product customization.

• Developed the GMD adapters to send provisioning commands from the BSCS Billing Module.

• Performed load balancing, performance testing, fine-tuning, and product benchmarking for the devised solutions to meet customer requirements.

Tecnotree Corporation Ltd, India Jul 2010 to Feb 2014

Software Engineer

• Worked on Mediation, Provisioning and CDR Store product requirements gathering.

• Worked on Mediation, Provisioning and CDR Store product customization and implementation.

• Trained Graduate Engineer Trainees on team’s products and company’s processes.

• Preparation of Impact analysis, Low Level Design (LLD) and solution description documents for the change requests raised for existing customer’s products.

• Presentation of Impact analysis and solution description documents in the change and quality control boards within the organization for approvals.

• Conducted knowledge-sharing sessions on onsite experience and client implementation awareness for internal team members.

• Designed and developed pre-parsers for handling unwanted record types and tag values coming in raw CDRs from network elements.

• Designed and developed adapters for Provisioning system for handling telnet interfaces to Network elements.

• Worked on incident and service tickets for Mediation and Provisioning products for all the clients.

• Participated in client weekly conference calls to update status on outstanding issues and requirements.

• Prepared event tracking sheet on weekly basis for the tickets assigned to Mediation and Service provisioning products.


