As a company with unique and rich data, we rely on a robust data pipeline to feed our machine learning systems. We invest heavily in collecting data from real-world scenarios and generate further data through large-scale field experiments with infrastructure operators worldwide. In this position, you will be responsible for designing and deploying a scalable architecture for data acquisition, storage, and digestion. In addition, you will design, build, and maintain our on-prem and cloud data stores and analyze data for ML.
We expect you to be knowledgeable and creative in choosing appropriate existing solutions when designing the product. You will need to become familiar with the intricacies of our data acquisition process and its physical properties. You will need experience with common data-ops, dev-ops, and cloud services. You will be expected to make design and process decisions based on your own research and acquired knowledge, to present those decisions, and to review the team’s work with meaningful input.
- Bachelor’s degree in a quantitative field such as math, computer science, engineering, etc.
- 4+ years of experience with, and a deep understanding of, SQL and NoSQL databases (MySQL, Elastic, MongoDB, Postgres)
- 2+ years of experience with Python programming
- Knowledge of data science infrastructure, including AWS and GCP
- Familiarity with scripting languages, including Bash, Windows cmd, and Windows PowerShell
- Strong collaborator with teams and peers
- Innovative with a growth mindset
- Additional software programming experience
- Experience with Hadoop, MapReduce, Spark, or another distributed computing platform
- Cloud provisioning and administration experience, plus system administration experience on Windows and Linux