Silicon DevOps Engineer
Silicon DevOps Engineer, Full-Time, Bristol
Graphcore has created a completely new processor, the Intelligence Processing Unit (IPU), specifically designed for artificial intelligence. The IPU’s unique architecture means developers can run current machine learning models orders of magnitude faster. More importantly, it lets AI researchers undertake entirely new types of work, not possible using current technologies, to drive the next great breakthroughs in general machine intelligence.
We believe our IPU technology will become the worldwide standard for artificial intelligence compute. The performance of Graphcore’s IPU is going to be transformative across all industries and sectors, whether you are a medical researcher, a roboticist or someone building autonomous cars.
Our team is at the forefront of the artificial intelligence revolution, enabling innovators from all industries and sectors to expand human potential with technology. What we do really makes a difference.
Graphcore’s chip team has created a diverse set of in-house tooling to manage both its front-end and back-end tool flows. As the DevOps Engineer, you will take the lead in debugging problems within this infrastructure, as well as adding new functionality and improving performance within our environment.
As a DevOps Engineer embedded within the chip team at Graphcore, you will be responsible for the team’s dedicated compute resource. It must be kept constantly ready to accept large HPC-like workloads, dispatching and processing them in the most efficient manner. You will be able to create monitoring software, both bespoke and as part of Ansible or Puppet, to keep yourself and the team informed of status and to spot any bottlenecks or misconfigurations.
Graphcore operates multiple data centre sites. You will be skilled in remote management of hardware and software running not only in our onsite data centre but also in data centres located in other geographical regions.
You will also manage the installed software tooling, adding new packages and service packs as they become available or upon request.
You will work closely with the IT team while embedded within the chip team, servicing the chip team’s requests through support request tickets.
This is a challenging yet rewarding role that requires in-depth knowledge across a diverse set of domains.
- Developing and maintaining software infrastructure for the chip team.
- Planning maintenance of compute farm hardware and software infrastructure both on and off site.
- Evaluating, specifying and planning hardware upgrade cycles.
- Working closely with the IT department to ensure the best possible availability of compute resources.
- Installing tools, e.g. Python and libraries, LLVM, EDA tools from Cadence/Mentor/Synopsys.
- Must have:
  - Be highly motivated, a self-starter, and a team player
  - Good communication and negotiation skills
  - Ability to work across teams and programming languages
  - Experience in a software infrastructure environment
  - Excellent programming skills in Python, C++, Bash
  - Linux administration
  - Remote hardware administration with IPMI
- Some of:
  - Configuration and management of:
    - SGE/Univa, Slurm, LSF or other DRMS
    - Jenkins CI
    - FlexLM licensing
    - Puppet, Ansible, Nagios
  - LLVM, GCC
  - DVCS, e.g. Git
  - AWS, Azure, Google Cloud
  - XML and XPath/XSLT
We welcome people of different backgrounds and experiences and are committed to building an inclusive work environment that makes Graphcore a great home for everyone. We are an equal opportunity employer and want to build a work environment where everyone is happy, productive and respectful so they can do their best work. If you have a disability or additional need that requires accommodation, just let us know.