
Data Engineer Processing

Location:
Jersey City, NJ, 07305
Salary:
150000
Posted:
May 11, 2024


Resume:



A. Shaw

Data Engineer Transforming Data Into Insights

Secaucus, NJ 07305

848-***-****

ad5m6q@r.postjobfree.com

With 9 years of intensive experience in the tech industry, I have driven technological innovation at several startups, playing a pivotal role in taking products from concept to launch. My expertise spans a broad range of modern data technologies and systems.

In programming languages and frameworks, I have applied advanced SQL techniques, used Python extensively for scripting and ETL, and worked with Java/Scala, particularly alongside platforms like Apache Kafka and Apache Spark. My data processing work is marked by deep proficiency in Apache Spark, real-time processing with Apache Flink, and a thorough grasp of the Hadoop ecosystem.

In database and storage solutions, I have worked with relational systems such as PostgreSQL and MySQL, NoSQL platforms like MongoDB, and columnar formats and time-series databases such as Parquet and InfluxDB. I have an extensive background in data warehousing technologies, including platforms like Amazon Redshift and Snowflake, complemented by hands-on work with data movement and ETL tools such as Apache NiFi and Apache Kafka.

My data design and architecture skills have helped startups apply effective modeling techniques, from Star and Snowflake Schemas to sound normalization and denormalization strategies. This foundation is reinforced by cloud expertise across AWS, Google Cloud, and Azure, and by infrastructure automation with Terraform, CloudFormation, and Pulumi.

https://www.linkedin.com/in/aejshaw05

Skills

Traditional RDBMS like PostgreSQL, MySQL, Oracle, etc.

NoSQL databases like Cassandra, MongoDB, DynamoDB, etc.

Columnar storage systems like Parquet and ORC

Time-series databases like InfluxDB or TimescaleDB

Data warehouse solutions like Amazon Redshift, Snowflake, Google BigQuery, and Azure Synapse Analytics


SQL: Mastery of advanced SQL concepts, window functions, stored procedures, etc. (a brief window-function sketch follows this skills list)

Python: Extensive use for scripting, data manipulation, and ETL tasks

Java/Scala: Applied mainly alongside tools like Apache Kafka and Apache Spark

Apache Spark: Mastery of Spark Core, Spark SQL, and streaming capabilities

Apache Flink: For real-time data processing

Hadoop Ecosystem: Deep understanding of MapReduce, Hive, HBase, and other Hadoop technologies

Data Modeling & Architecting

Star Schema and Snowflake Schema for DW modeling

Normalization and denormalization techniques

Data lakes and data lakehouse architectures

AWS: S3, EC2, EMR, Glue, Lambda, etc.

Google Cloud Platform: BigQuery, Dataflow, Pub/Sub, etc.

Microsoft Azure: Azure Data Factory, Azure Blob Storage, HDInsight, etc.

Infrastructure automation: Terraform, CloudFormation, or Pulumi

Apache Kafka: For event streaming and real-time analytics

ETL tools: Talend, Informatica, Microsoft SSIS, etc.
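
As a brief illustration of the window functions noted in the SQL item above, the sketch below ranks products by daily sales with RANK() over a partition; the table, columns, and connection details are hypothetical, and psycopg2 is assumed only to make the example runnable.

# A minimal sketch of an advanced-SQL window function, assuming a hypothetical
# PostgreSQL table daily_sales(product_id, sale_date, amount).
import psycopg2

RANK_QUERY = """
SELECT
    product_id,
    sale_date,
    amount,
    RANK() OVER (PARTITION BY sale_date ORDER BY amount DESC) AS daily_rank
FROM daily_sales
"""

def top_sellers(dsn, top_n=10):
    # Run the windowed query and keep only the top-N products per day.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(RANK_QUERY)
        return [row for row in cur.fetchall() if row[3] <= top_n]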

Work History

2021-08 - Current
Principal Data Integration Engineer
Stealth AI Startup (Stich Vision), Jersey City, New Jersey

Pioneered a real-time inventory tracking system using Apache Kafka for event streaming, ensuring live updates on stock availability. Developed predictive models in Python to forecast demand and optimize reordering processes, reducing stock-outs by 20%.
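
As a sketch of how inventory events might be published to Kafka for live stock updates (the topic name, broker address, and event fields are illustrative assumptions, and the kafka-python client is used here for brevity):

# Minimal sketch: publish inventory-change events to a Kafka topic so that
# downstream consumers can keep stock availability current.
import json
from datetime import datetime, timezone

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_stock_change(sku, delta, warehouse):
    # Emit one inventory-change event; consumers apply it to live stock counts.
    event = {
        "sku": sku,
        "delta": delta,  # +N for a restock, -N for a sale
        "warehouse": warehouse,
        "ts": datetime.now(timezone.utc).isoformat(),
    }
    producer.send("inventory-updates", value=event)

publish_stock_change("SKU-1042", -3, "NJ-01")
producer.flush()  # ensure the event is delivered before exit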

Utilized NoSQL databases like MongoDB for managing vast product catalogs, ensuring swift data retrievals and updates. Leveraged Apache Spark for processing and analyzing supply chain data, identifying bottlenecks and areas of improvement.

Designed an integrated data warehouse using Snowflake, consolidating data from various supply chain touchpoints for holistic analytics.

Architected a waste tracking system using Java, categorizing and monitoring waste in real time to streamline recycling processes. Integrated sensor and IoT data with Apache Flink for real-time processing, enabling dynamic route optimization for waste collection trucks. Deployed time-series databases like InfluxDB to track waste generation patterns over time.
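
As an illustrative sketch of recording waste-generation readings in a time-series store, the snippet below writes sensor points to InfluxDB 2.x with the official Python client; the bucket, org, token, measurement, and field names are all assumptions made for the example.

# Minimal sketch: write one waste-fill sensor reading to InfluxDB.
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

def record_waste_reading(container_id, fill_pct, category):
    # Dashboards and route-optimization jobs query these points over time.
    point = (
        Point("waste_fill_level")
        .tag("container_id", container_id)
        .tag("category", category)  # e.g. recyclable, organic, landfill
        .field("fill_pct", fill_pct)
    )
    write_api.write(bucket="waste-metrics", record=point)

record_waste_reading("CTR-204", 72.5, "recyclable")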

Automated infrastructure scaling on AWS using Terraform, ensuring robustness during peak data inflows.

Introduced a data lakehouse architecture, enhancing the flexibility and scalability of waste data storage and analysis.

2018-01 - 2021-01
Data Engineering Team Lead
Horizon Technologies (Addo AI), San Francisco, California

Conceptualized and developed a user health data platform, capturing fitness metrics using Apache Kafka streams.

Implemented data lakes on Google Cloud Platform, storing diverse user data like heart rates, step counts, and diet logs.

Leveraged Python scripts for ETL processes, cleaning and transforming wearable device data for analysis.
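
As a sketch of the kind of ETL cleanup applied to raw wearable exports (the file path and column names are illustrative assumptions), using pandas:

# Minimal sketch: clean a raw wearable-device export and derive daily metrics.
import pandas as pd

def clean_wearable_export(path):
    df = pd.read_csv(path, parse_dates=["timestamp"])

    # Drop malformed rows and physiologically impossible readings.
    df = df.dropna(subset=["user_id", "heart_rate"])
    df = df[df["heart_rate"].between(30, 220)]

    # Normalize types and add derived fields used downstream.
    df["steps"] = df["steps"].fillna(0).astype(int)
    df["date"] = df["timestamp"].dt.date
    return df

daily = (
    clean_wearable_export("wearable_export.csv")
    .groupby(["user_id", "date"], as_index=False)
    .agg(avg_heart_rate=("heart_rate", "mean"), total_steps=("steps", "sum"))
)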

Employed Apache Spark's machine learning libraries to create personalized workout and diet plans.

Designed a responsive querying system using advanced SQL techniques, providing instant insights into user fitness trends.

Optimized CRM databases, primarily using PostgreSQL, ensuring swift data retrieval and efficient storage.

Integrated real-time customer interaction data using Apache Kafka, enhancing the responsiveness of sales and support teams. Employed Star and Snowflake Schemas to structure customer data, facilitating faster report generation.

Streamlined customer segmentation using clustering techniques in Apache Spark, enabling targeted marketing campaigns.
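
As a sketch of clustering-based segmentation in Spark MLlib (the input path and feature columns are illustrative assumptions):

# Minimal sketch: segment customers with KMeans for targeted campaigns.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("customer-segmentation").getOrCreate()

customers = spark.read.parquet("s3://crm-bucket/customers/")  # hypothetical path

# Assemble numeric behaviour metrics into a single feature vector.
assembler = VectorAssembler(
    inputCols=["total_spend", "orders_last_90d", "avg_basket_size"],
    outputCol="features",
)
features = assembler.transform(customers)

# Cluster customers into a handful of segments; `prediction` holds the segment id.
model = KMeans(k=5, seed=42, featuresCol="features").fit(features)
segmented = model.transform(features)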

Automated data integration pipelines using Apache NiFi, ensuring timely synchronization of CRM data across platforms.

2015-01 - 2018-01
Data Engineer
Mercurial Minds

Developed a comprehensive document management system, with real-time indexing and retrieval capabilities powered by Apache Kafka and Apache Flink.

Utilized columnar storage systems like Parquet for efficient storage and retrieval of large documents.
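
As a sketch of how document metadata can be stored in a columnar Parquet file and read back column by column (paths and columns are illustrative assumptions), using pyarrow:

# Minimal sketch: write document metadata to Parquet, then read only needed columns.
import pyarrow as pa
import pyarrow.parquet as pq

docs = pa.table({
    "doc_id": [1, 2, 3],
    "title": ["Q1 report", "Invoice 88", "Onboarding guide"],
    "body_uri": ["s3://docs/1", "s3://docs/2", "s3://docs/3"],
    "size_bytes": [120453, 8912, 45020],
})

pq.write_table(docs, "documents.parquet", compression="snappy")

# The columnar layout lets readers fetch just the columns a query needs.
titles = pq.read_table("documents.parquet", columns=["doc_id", "title"])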

Automated document versioning and backup processes on AWS using CloudFormation templates.

Designed a robust search engine using Python, enabling users to quickly find documents based on content, metadata, and tags.
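
The engine itself is not reproduced here; a minimal sketch of its core idea, an inverted index over document text and tags, might look like the following (all names are illustrative, and ranking, stemming, and metadata filters are omitted):

# Minimal sketch: an inverted index supporting simple content/tag search.
import re
from collections import defaultdict

index = defaultdict(set)  # token -> set of document ids

def add_document(doc_id, text, tags):
    for token in re.findall(r"[a-z0-9]+", text.lower()) + [t.lower() for t in tags]:
        index[token].add(doc_id)

def search(query):
    # Return ids of documents containing every query term.
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

add_document(1, "Quarterly revenue report", ["finance", "2023"])
add_document(2, "Engineering onboarding guide", ["hr"])
print(search("revenue report"))  # -> {1}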

Integrated OCR capabilities, transforming scanned documents into searchable and editable formats.

Led the development of a collaborative platform, allowing real-time data sharing and interaction using WebSockets and Apache Kafka. Facilitated seamless integration of third-party tools using Python scripting for ETL tasks.

Leveraged AWS services like Lambda and EC2 for on-demand scalability during peak collaboration hours.

Engineered a data backup system on Azure Blob Storage, ensuring data integrity and availability.

Introduced real-time analytics on collaboration patterns using Apache Spark, providing insights to teams on productivity metrics.

Education

BS: Information Technology, Software Engineering - University of Management & Technology


