Data is the foundation of everything we do. As a senior data engineer, you’ll have authority (and responsibility) for designing and maintaining the infrastructure that collects, stores, processes, and analyzes terabyte-scale datasets, including large document corpora, machine learning results, metadata, and application data.
We are looking for a Data Engineer to collect, manage, and deploy massive sets of global patent data and more. Your role will be to ensure that our dataset of over 100 million patents is readily available to our web application and data science teams. You’ll also be responsible for identifying and integrating additional datasets that allow us to expand our product features and AI capabilities.
This is an exciting opportunity to engage with cutting-edge technology and work on a real-world problem at global scale. In addition to competitive compensation and benefits, there is also room for the right person to take on increased responsibilities. And it’s a lot of fun (although fast-paced and even chaotic at times) working as part of a small, passionate team.
Responsibilities:
- Take ownership of understanding, acquiring, and managing innovation- and technology-related datasets, starting with global patents
- Write and automate pipelines for data cleansing, ingestion of machine learning results, ingestion of raw data from multiple sources, aggregation, and more
- Architect and manage data infrastructure optimized for machine learning and large-scale data exploration
- Ensure fast and reliable access to the clean data on which our client-facing web application depends
- Seek and integrate new sources of data related to our core business
- Communicate dataset coverage and performance to internal consumers
Minimum Qualifications and Education Requirements:
- BSc/BEng degree in computer science or equivalent
- Strong relational database experience, preferably with Postgres
- The ability to communicate high-level information about datasets, preferably using data visualization
- Experience writing performant data pipelines at scale, e.g., with Spark or Airflow
- The ability to use a modern language with a strong concurrency model, such as Elixir, Rust, or Go, for fast data processing
Preferred Qualifications:
- MSc/MEng degree in computer science or equivalent
- Passion for AI and excitement about new developments
- Contributions to open source projects
- Experience with machine learning
- Experience with data visualization