NAME: Krishna Chaitanya Pulipati
EMAIL: ***************@*****.***
PHONE: +1-470-***-****
LinkedIn: www.linkedin.com/in/krishna-chaitanya-3a3b3c
PROFESSIONAL SUMMARY
SENIOR DATA ENGINEER
§ Seasoned Senior Data Engineer with over 8 years of experience in Big Data, data warehousing, and ETL development, with a passion for solving complex business challenges and unveiling insights from large datasets.
§ Extensive experience with Hadoop ecosystem components including HDFS, MapReduce, Yarn/MRv2, Hive, Pig, Sqoop, HBase, and Oozie, and with Hadoop distributions such as Cloudera, Hortonworks, and Amazon EMR.
§ Deep expertise in Spark, including RDD, Dataframe, Spark SQL, and Spark Streaming, for extracting, transforming, and aggregating structured and unstructured data.
§ Experienced in building real-time streaming applications with Kafka, setting up producers and consumers to process millions of events per second and integrating Kafka with Spark Streaming.
§ Proficient in HIVE data warehouse design, including creating and fine-tuning HQL queries, Partitioning, Bucketing, and custom UDFs for optimal performance.
§ Skilled in analyzing large datasets with Pig Latin scripts and Piggybank UDFs.
§ Developed Sqoop jobs for seamless data import/export between HDFS and Relational Database Systems (RDBMS).
§ Extensive experience with AWS infrastructure, including EMR, EC2, S3, Redshift, Glue, and CloudFormation, for deploying and managing enterprise-level data pipelines.
§ Proficient in Azure Data Platform services such as Azure Data Factory, Azure Databricks, Azure Data Lake Store, and Azure SQL Database, and in migrating data from on-premises databases to the cloud.
§ Expertise in Data Warehouse design and Dimensional Modeling, including Star-Schema Modeling and Fact and Dimension tables.
§ Worked with a variety of Relational and NoSQL databases, including Oracle, MySQL, MS SQL Server, DB2, Teradata, MongoDB, Cassandra, and HBase.
§ Hands-on experience with Teradata utilities such as BTEQ, MLOAD, FASTLOAD, TPT, and Fast Export for data migration and loading.
§ Used ETL tools such as Informatica PowerCenter and Talend, including managing mappings and workflows through the Informatica Repository.
§ Skilled in data visualization tools such as Tableau and Power BI for finding customer usage patterns and communicating business insights.
§ Proficient in programming languages including Python, Java, Scala, and SQL (HiveQL), and in Unix-based shell scripting and the Command Line Interface.
§ Applied machine learning and statistical analysis to analyze diverse datasets and uncover effective patterns and insights.
§ Experienced in manipulating and troubleshooting data in various file formats and in handling structured and unstructured data.
§ Used Agile methodologies such as SCRUM and Test-Driven Development (TDD), along with tools including JIRA, GIT, Airflow, and Oozie, throughout the software development lifecycle.
§ Effective communication skills coupled with extensive expertise in backend database programming, design, and performance tuning.
EDUCATION
Sardar Vallabhbhai National Institute of Technology, Surat, India. Bachelor's in Computer Science. Grade: 8.06
TECHNICAL SKILLS
Big Data Tools Hadoop, HDFS, Map Reduce, Spark, Airflow, Nifi, HBase, Hive, Pig, Sqoop, Kafka, Oozie, Zookeeper
Operating System Windows, Unix, Sun Solaris
Programming Languages Java, PL/SQL, SQL, Python, Scala, PySpark, C, C++, Go
Databases Snowflake, MySQL, SQL Server, Oracle 12c/11g, Teradata R15/R14
NoSQL Databases DynamoDB, Cassandra, HBase
Methodologies RAD, JAD, System Development Life Cycle (SDLC), Agile
Visualization & ETL Tools Tableau, Informatica, Talend, Power BI
Cloud Platforms AWS (Amazon Web Services), Microsoft Azure
Cloud Management Amazon Web Services (AWS) - EC2, EMR, S3, Redshift, Lambda, Athena
OLAP Tools Tableau, SSAS, Business Objects, and Crystal Reports 9
ETL/Data Warehouse Tools Informatica 9.6/9.1, and Tableau
Data Modeling Tools Erwin Data Modeler, ER Studio v17
PROFESSIONAL EXPERIENCE
Client: Early Warning, Scottsdale, AZ Oct 2022 – present Role: Senior Data Engineer
Responsibilities:
§ Developed Spark applications in Scala and PySpark for data extraction, transformation, and aggregation from multiple file formats, uncovering insights into customer usage patterns.
§ Converted existing SQL queries and MapReduce jobs into Spark transformations using Spark RDDs, DataFrames, and Scala, reducing execution overhead and enhancing processing efficiency.
§ Implemented Spark Streaming and Kafka to consume data from Kafka topics and push it into HDFS and Hive for near-real-time analytics.
§ Developed reusable ETL pipelines in Databricks and scheduled workflows with Apache Airflow, automating daily data processing tasks.
§ Utilized AWS services (EC2, S3, Lambda, EMR, Redshift) for serverless data processing, storage, and scalable cluster management, including uploading data to S3 buckets and executing Spark jobs on Amazon EMR clusters.
§ Used Sqoop to extract and load data between HDFS and relational databases such as Oracle, Teradata, and SQL Server.
§ Designed Hive tables and Schema definitions, wrote advanced Hive and SQL queries, and applied partitioning and bucketing for efficient storage and retrieval.
§ Loaded structured, semi-structured, and unstructured data into HDFS, Hive, and Snowflake, building Snowflake tables to support larger analytics workloads.
§ Integrated IBM InfoSphere DataStage and SSIS components within existing ETL workflows for data extraction, transformation, and loading.
§ Developed Python and Shell scripts to automate regular operational tasks, data validation, and file management, reducing errors and manual overhead.
§ Applied machine learning concepts using Spark MLlib to analyze large datasets and support advanced analytics use cases.
§ Used regular expressions and Spark functions for data cleaning and filtering while processing diverse datasets.
§ Implemented CI/CD pipelines for building, testing, and deploying data applications, facilitating seamless integration across environments.
§ Deployed Spring Boot microservices and RESTful APIs in Java to enable interaction between data services and downstream applications.
§ Utilized GCP services (BigQuery, Dataflow, Dataproc) for cloud-based data processing within multi-cloud projects.
§ Integrated data from diverse source systems, including Salesforce and MongoDB, into the AWS data platform.
§ Troubleshooted Spark and Hadoop jobs by examining logs and the underlying infrastructure, ensuring reliable execution across the Hadoop ecosystem.
§ Installed and configured multi-node Hadoop clusters and supporting components on Linux and Windows platforms.
§ Created T-SQL and PL/SQL stored procedures, functions, and triggers, and persisted processed data into SQL Server databases.
§ Developed SSAS (SQL Server Analysis Services) models and SSRS (Reporting Services) reports to support operational analytics and reporting.
§ Gathered requirements and collaborated with cross-functional teams, facilitating communication between development, analytics, and business units.
§ Engaged in Agile Scrum processes, including sprint planning, daily stand-ups, reviews, and retrospectives, ensuring project alignment and adherence to timelines.
Skills and Technologies: Hadoop, Spark, Scala, Sqoop, HBase, Hive, Python, PL/SQL, AWS, EC2, S3, Lambda, Auto Scaling, CloudWatch, CloudFormation, Databricks, IBM InfoSphere, DataStage, MapReduce, Oracle 12c, Flat files, MS SQL Server database, XML files, Cassandra, MongoDB, Kafka, MS Access database, Autosys, UNIX, Erwin, Java, AWS Redshift, SSIS, SSAS, SSRS.
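The Kafka-to-Hive streaming work above can be sketched in a few lines of PySpark. This is only an illustrative example rather than project code: the broker address, topic name, schema, and output paths are hypothetical, and it assumes Spark Structured Streaming with the Kafka connector available on the classpath.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-to-hive-demo").enableHiveSupport().getOrCreate()

# Hypothetical schema of the JSON events arriving on the Kafka topic
schema = (StructType()
          .add("event_id", StringType())
          .add("amount", DoubleType())
          .add("event_ts", StringType()))

# Read a near-real-time stream from Kafka (broker and topic are placeholders)
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "payment_events")
       .load())

# Parse the JSON payload and keep only records with a valid key
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*")
          .filter(col("event_id").isNotNull()))

# Append micro-batches as Parquet on S3/HDFS for downstream Hive queries
query = (events.writeStream.format("parquet")
         .option("path", "s3a://example-bucket/events/")
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
         .start())
query.awaitTermination()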
Client: Chevron Corporation, Santa Rosa, NM July 2020 – Sep 2022 Role: Senior Data Engineer
Responsibilities:
§ Designed and implemented scalable Azure Data Factory (ADF) pipelines, including Linked Services, Datasets, Pipelines, and Activities, for ingesting, transforming, and loading data from various sources into Azure Data Lake Storage (ADLS) and Azure Blob Storage.
§ Created JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the SQL Activity.
§ Developed Databricks notebooks using Spark, Scala, and Spark SQL to process structured and unstructured data and load it into the Azure SQL data warehouse.
§ Leveraged Azure HDInsight and the Hadoop ecosystem (Hive, Sqoop, Oozie, HBase) for large-scale data processing, storage, and archival.
§ Integrated Kafka with Spark Streaming for real-time data processing, and used messaging and notification services such as AWS SNS, SQS, and Google Cloud Pub/Sub for reliable communication between components.
§ Implemented Jenkins CI/CD pipelines and Docker to automate the build, deployment, and execution of JAR components across the entire project lifecycle.
§ Utilized the ELK stack for log monitoring and troubleshooting of data pipelines.
§ Applied machine learning with Spark MLlib, Azure Machine Learning (ML Studio), and Python numerical and statistics libraries to address business challenges such as Anomaly Detection and A/B testing.
§ Implemented data governance practices, including data quality checks, Data Dictionary and Data Mapping documentation, and Master Data Management (MDM), ensuring data integrity and compliance across the organization's data ecosystem.
§ Responsible for outlining the data strategy and Roadmap, covering Data Lake, Data Warehouse, and Data Swamp concepts and the reference data and MDM model.
§ Executed Proof of Concepts (POCs) on new technology components to address specific data-driven business scenarios.
§ Developed REST API endpoints to interact with Azure Data Factory and automate pipeline monitoring and execution.
§ Utilized SQOOP for reliable data transfer between the Hadoop ecosystem and traditional RDBMS sources.
§ Created data visualization dashboards and representations using Tableau, D3, and Pentaho, improving user engagement by 45%.
§ Applied Behavior Driven Development (BDD) using Cucumber, Gherkin, and Ruby, writing step definitions and automated tests, alongside Test Driven Development (TDD).
§ Leveraged IBM InfoSphere tools for ETL and data integration scenarios alongside the Azure stack.
§ Constructed and maintained comprehensive data pipelines covering the entire data lifecycle: ingestion, storage, processing, analysis, and archival.
§ Responsible for installing, configuring, and managing big data platform components and automating their deployment.
§ Participated in project coordination and lifecycle management activities, integrating new solutions seamlessly with existing systems to enhance scalability, reliability, and accessibility.
Skills and Technologies: Azure, Apache Spark, HDInsight, Spark SQL, Spark Streaming, TDD, IBM InfoSphere, Hive, Scala, Pig, Azure Databricks, Azure Data Storage, Azure Data Lake, Azure SQL, NoSQL, Oozie, HBase, Kafka, Jenkins, Docker.
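As an illustration of the data quality checks mentioned above, the sketch below shows the general pattern in PySpark on Databricks; the ADLS path, column names, and thresholds are invented for the example and are not taken from the project.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("dq-checks-demo").getOrCreate()

# Read a curated dataset from ADLS Gen2 (path is a placeholder)
df = spark.read.parquet("abfss://curated@examplestorage.dfs.core.windows.net/wells/production/")

total = df.count()

# Rule 1: the key column must never be null
null_keys = df.filter(col("well_id").isNull()).count()

# Rule 2: measurements must fall inside a plausible range
out_of_range = df.filter((col("output_bbl") < 0) | (col("output_bbl") > 100000)).count()

# Fail the pipeline run if more than 1% of rows violate either rule
if total > 0 and (null_keys + out_of_range) / total > 0.01:
    raise ValueError(f"Data quality check failed: {null_keys} null keys, {out_of_range} out-of-range rows of {total}")

print(f"Data quality checks passed for {total} rows")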
Client: McKesson Corporate, Irving, TX Dec 2018 – Jun 2020 Role: Data Engineer
Responsibilities:
§ Engaged in migrating data from Mainframe tables to HDFS and HBase tables utilizing Sqoop.
§ Created Tableau dashboards to visualize results and employed Python Seaborn libraries for data interpretation.
§ Transferred data from on-premises legacy systems such as DB2, SQL Server, and Teradata into HDFS using Sqoop, implementing both incremental and full loads.
§ Implemented Spark Streaming to process near-real-time data from IoT devices and other streaming sources.
§ Configured Flume for ingesting streaming and log data from multiple sources into HDFS.
§ Developed Hive queries, Pig Latin scripts, and MapReduce jobs to perform complex data transformations and aggregations.
§ Stored transformed data in HBase and Hive tables to enable efficient analytics and reliable data access.
§ Utilized Oozie workflows to orchestrate and automate Hadoop jobs, including Pig, Hive, Sqoop, and MapReduce tasks.
§ Monitored cluster health and performance using Cloudera Manager, ensuring scalability, reliability, and availability of the Hadoop environment.
§ Leveraged AWS EC2 instances, EMR clusters, and S3 buckets for big data processing, setting up CloudWatch alarms to track key metrics and Step Functions to orchestrate workflows.
§ Conducted a Proof of Concept (POC) with Azure Data Factory, Azure Blob Storage, Azure Data Lake Storage, and Azure Databricks for cloud-based data processing.
§ Integrated REST APIs and middleware to enable data exchange between SAP PLM, the SharePoint Content Management System (CMS), and Software-as-a-Service (SaaS) platforms, ensuring interoperability between disparate systems.
§ Used Talend Data Integration to construct ETL mappings and data pipelines from diverse sources to target systems.
§ Implemented RDD and DataFrame transformations and aggregations in Spark with SCALA to optimize processing performance.
§ Designed the data model for relational and NoSQL targets, persisting data in HBase tables and the Redshift database for analytics.
§ Gathered and transformed structured, semi-structured, and unstructured data, including time-series data, for machine learning and analytics applications.
§ Facilitated cross-functional collaboration and played a key role in development, testing, deployment, and administrative tasks, including automated deployments using Docker and shell scripts.
Skills and Technologies: Hadoop, HDFS, MapReduce, YARN, Spark, Hive, Pig, Sqoop, Flume, HBase, Oozie, Cloudera, DB2, SQL Server, Teradata, AWS, EC2, S3, EMR, CloudWatch, Amazon Redshift, Docker, Talend, Java, Tableau.
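A minimal PySpark sketch of the Hive-oriented transformation and load pattern described above; the paths, database, table, and column names are hypothetical and it assumes a Spark session with Hive support enabled.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date, sum as _sum

spark = SparkSession.builder.appName("hive-load-demo").enableHiveSupport().getOrCreate()

# Read raw records landed in HDFS by Sqoop/Flume (path is a placeholder)
orders = spark.read.option("header", True).csv("hdfs:///data/raw/orders/")

# Aggregate daily totals per region, similar to the Hive/Pig transformations described
daily = (orders.withColumn("order_date", to_date(col("order_ts")))
         .groupBy("order_date", "region")
         .agg(_sum(col("amount").cast("double")).alias("total_amount")))

# Write into a partitioned Hive table for downstream reporting
(daily.write.mode("overwrite")
 .partitionBy("order_date")
 .format("parquet")
 .saveAsTable("analytics.daily_order_totals"))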
Client: Indian Immunologicals Ltd., India Jan 2017 – Sep 2018 Role: Hadoop Developer
Responsibilities:
§ Imported and exported data between Oracle Database, SQL Server, and HDFS bidirectionally using Sqoop.
§ Developed MapReduce programs to parse and process data from diverse sources, including flat files, staging the records on the Hadoop Data Platform.
§ Involved in creating Hive tables, loading data, and writing Hive queries that run internally in a MapReduce way.
§ Wrote Pig Latin scripts for data validation and cleansing of staged input records before loading into the Data Warehouse.
§ Engineered a custom File System plugin for Hadoop, enabling MapReduce programs, HBase, Pig, and Hive to access files on the Data Platform unmodified.
§ Created SSIS packages to facilitate data transfer from OLEDB and flat-file sources, establishing source and destination mappings and loading data into the SQL Server Data Warehouse.
§ Developed SSRS reports and exported them to Excel Spreadsheets for business users.
§ Implemented scheduled report deployments, generating daily, weekly, monthly, and quarterly reports.
§ Created batch jobs and job configurations for automated data loading and report generation using SSIS packages.
Skills and Technologies: Hadoop, MapReduce, Pig, MS SQL Server, SQL Server Business Intelligence Development Studio, Hive, HBase, SSIS, SSRS, Report Builder, Office, Excel, Flat Files, .NET, T-SQL.
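The MapReduce-style validation and aggregation above can be illustrated with a small Hadoop Streaming pair in Python; the pipe-delimited record layout and field positions are assumed purely for the example.

#!/usr/bin/env python
# mapper.py - emit only records that pass basic validation, keyed by date
import sys

for line in sys.stdin:
    fields = line.strip().split("|")
    # Expect at least: record_id|date|amount (layout assumed for illustration)
    if len(fields) < 3:
        continue
    record_id, date, amount = fields[0], fields[1], fields[2]
    try:
        value = float(amount)
    except ValueError:
        continue  # drop records with a non-numeric amount
    print(f"{date}\t{value}")

#!/usr/bin/env python
# reducer.py - sum the validated amounts per date (keys arrive sorted)
import sys

current_date, total = None, 0.0
for line in sys.stdin:
    date, value = line.strip().split("\t")
    if current_date is not None and date != current_date:
        print(f"{current_date}\t{total}")
        total = 0.0
    current_date = date
    total += float(value)

if current_date is not None:
    print(f"{current_date}\t{total}")

Such scripts are typically submitted with the Hadoop Streaming jar, e.g. hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input <raw path> -output <validated path>.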
Client: Citi Bank, India Sep 2015 – Dec 2016 Role: Data Analyst
Responsibilities:
§ Utilized Informatica PowerCenter to design and develop ETL mappings, transformations, and procedures for integrating data from diverse sources into the target data warehouse.
§ Engaged in the ETL design process by creating design documents and mapping documents outlining data flow from source to target tables, including fields, join conditions, and transformations.
§ Utilized ERWIN Data Modeler to create and maintain data models for the data warehouse, ensuring alignment with business requirements.
§ Created ETL procedures to transfer data from legacy source systems to the target databases, including procedures for data cleaning and validation.
§ Executed data loads across Development, QA, and Production environments, utilizing the Informatica Repository Manager to migrate and manage folders between repositories.
§ Conducted Unit Testing of mappings, preparing test conditions and scripts to verify expected versus actual results.
§ Prepared a comprehensive Test Document outlining the test procedures, conditions, and validation steps for each mapping.
§ Implemented data governance standards throughout the data lifecycle to maintain the accuracy and integrity of ACH transfer data.
§ Ensured accuracy and integrity of data through meticulous data cleaning and validation, and used DAC (Data Warehouse Application Console) to oversee and manage Siebel data warehouse loads.
Skills and Technologies: Informatica PowerCenter, Repository Manager, DAC (Data Warehouse Application Console), Siebel, ERWIN, HP UNIX, Windows, and Oracle.
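The step of verifying expected versus actual results after an ETL load can be sketched as a simple source-to-target row-count reconciliation in Python; the cx_Oracle driver, connection strings, and table names below are assumptions for illustration only, not details from the project.

import cx_Oracle  # Oracle driver assumed for this illustration

# Hypothetical connections to the staging (source) and warehouse (target) schemas
src_conn = cx_Oracle.connect("stg_user", "stg_pwd", "stg_host/STGDB")
tgt_conn = cx_Oracle.connect("dwh_user", "dwh_pwd", "dwh_host/DWHDB")

def row_count(conn, table):
    # Count rows in a table; table names are placeholders
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0]

# Compare expected (source) versus actual (target) counts after the load
expected = row_count(src_conn, "STG_ACH_TRANSFERS")
actual = row_count(tgt_conn, "DWH_ACH_TRANSFERS")

if expected != actual:
    raise AssertionError(f"Load validation failed: expected {expected} rows, loaded {actual}")
print(f"Load validation passed: {actual} rows")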