Job Description
We are looking for an experienced Data Engineer to join our team in Highlands Ranch, Colorado. In this long-term contract position, you will play a pivotal role in acquiring, processing, and structuring large-scale data from diverse sources in the healthcare domain. Your work will directly contribute to the development of cutting-edge AI solutions designed to support medical professionals and patients.
Responsibilities:
• Develop and deploy automated tools for extracting and downloading data from healthcare systems and websites.
• Optimize large-scale data acquisition processes to ensure efficiency and reliability.
• Store extracted data in structured form in databases, cloud storage, or flat files to facilitate analysis.
• Collaborate with stakeholders to identify and prioritize data sources, ensuring data completeness and relevance.
• Implement robust error handling, logging, and retry mechanisms to maintain data integrity during extraction.
• Ensure compliance with data privacy regulations and ethical standards throughout the data acquisition process.
• Utilize authentication techniques including OAuth, session cookies, and tokens to access secured data sources.
• Manage proxy configurations, CAPTCHA solving, and user-agent rotation to streamline web scraping activities.
• Parse and clean raw data from various formats such as JSON, HTML, and PDFs for structured integration.
Requirements:
• Minimum of 5 years of experience in web scraping or data extraction using tools such as Python (Scrapy, BeautifulSoup, Selenium) or Node.js (Puppeteer).
• Proficiency in working with relational and NoSQL databases like PostgreSQL and MongoDB.
• Hands-on experience with cloud storage services such as Amazon S3 or Microsoft Azure Blob Storage.
• Strong understanding of data privacy laws and ethical standards in data processing.
• Expertise in handling authentication flows including OAuth, session cookies, and tokens.
• Familiarity with proxy management, CAPTCHA solving, and user-agent rotation for web scraping.
• Ability to parse and clean data from formats like JSON, HTML, and PDFs.
• Skilled in API development and data visualization using cloud technologies.