HPC System Performance Engineer
About the role
Within ARM's Technical Services Group (TSG), we are a small but crucial team dedicated to the architecture of our internal engineering compute platforms. Working closely with complementary functions within the group - service operations, solution design and operational analytics - we provide a strategic view on the evolution of the underlying compute platform to meet both the existing and the anticipated future needs of the processes it supports. The ideal candidate can provide both critical performance-based evidence and an expert view on how our engineering platform is performing and where there are opportunities for improvement.
The engineering platform within ARM can be broadly categorized into:
- High-performance, large-scale compute servers used en masse by engineering processes, run as a set of clusters and controlled by scheduling tools such as IBM's LSF or build farms such as Jenkins
- High-availability application servers used to run supporting processes such as databases, ALM tools, etc.
- Large-scale network-attached storage infrastructure used to service the above
What will I be accountable for?
We are looking for someone with a background in performance tuning to complement the existing skills of the group. You will play a lead role in ensuring that our platform configuration aids engineering flow efficiency within ARM. We are convinced that our flows can be improved significantly by identifying and executing on marginal gains, and you will help drive the activity in this space.
You will work closely with the service operations teams who are responsible for the current state, and the domain architects who provide the expertise in the underlying technical components. You will also need to engage actively with our tool suppliers, ensuring they are delivering tools that function well in our environment.
We pride ourselves on a measured approach to change, favoring data-driven decisions and delivering changes that demonstrably improve the efficiency, effectiveness and fitness for purpose of the platform without introducing undue risk to the business.
What skills, experience and qualifications do I need?
Required Skills & Experience
- Considers themselves a performance engineer
- Has a solid background of experience in Linux/Unix environments
- Has in-depth knowledge of high-throughput HPC systems, cluster management, high-performance NAS solutions, provisioning tools, and job schedulers
- Competent at analyzing and identifying performance bottlenecks across a large and busy compute estate
- Familiar with the types of tracing tools available to monitor black-box executables running in production Linux cluster environments
- Comfortable tuning kernel parameters
- Comfortable talking to vendors about their software tools or hardware appliances, and how best to tune their use
- Takes a rigorous, testing- and documentation-driven approach to problems
Desirable Skills & Experience
- Exposure to large scale monitoring or metrics systems based on tools such as Prometheus, Ganglia, StatsD, etc.
- Aware of the challenges involved in performing distributed tracing of complex workflows
- Experience with EDA tools from Cadence, Mentor or Synopsys, and exposure to the types of engineering workflow that leverage them
- Experience with the IBM Spectrum family of products, especially LSF
- Experience with Isilon and/or NetApp
What are the desired behaviors for this role?
We are proud to have a set of behaviors that differentiate our talent in the marketplace. These are embedded in all our roles and applicants are encouraged to evidence their attitudes/behaviors as part of the application process:
- Operates effectively and openly in teams and shares both knowledge and success with others
- Builds strong and lasting relationships based on mutual trust
- Actively seeks out and encourages alternative viewpoints and ideas
- Applies critical thinking to select the best way forward
- Demonstrates a positive attitude in gaining insight from team experiences and is receptive to feedback
- Is passionate about the success of others and actively provides support for their development
- Listens and explores alternative perspectives before carefully shaping work that will deliver impactful results
- Persuades rather than pushes when influencing colleagues
- Acts with a thoughtful sense of urgency
- Demonstrates a helpful, can-do attitude
- Thinks and acts in the best interests of our customers and partners
- Strives to achieve win-win outcomes for ARM and our customers
We are an Equal Opportunity Employer and do not discriminate against any employee or applicant for employment on the basis of race, color, sex, age, national origin, religion, sexual orientation, gender identity, status as a veteran, disability, or any other federal, state or local protected class.