Overall purpose:
• To provide expert input to the day-to-day maintenance of the Institute’s Scientific Computing facilities, diagnosing and troubleshooting problems that arise.
• To plan, install, commission and maintain compute and data storage resources supporting the research of the Institute, including Machine Learning.
• To assume technical and organisational lead responsibility for one or more computing resources.
• To maintain Scientific Computing services such as DNS, DHCP, Linux workstations, storage servers, backup systems and the computing cluster.
• To help optimise the connection of servers and end-user PCs / Apple Macs to Unix equipment, e.g. remote desktops and file shares.
• To advise and assist users in the use of deep learning to analyse large datasets, including training and inference with computer vision or LLMs.
• To provide other general technical assistance as required, for example in backups and the maintenance of large file systems.
• To provide timely and pertinent communication to colleagues and users (as appropriate) by web, email or other documents.
• To contribute to the writing of disaster recovery processes and documentation, as directed.
• To attend and report to appropriate internal and external meetings.
• To liaise with management and scientists and advise where possible on IT solutions.
• This technical work will support the scientific mission of the Laboratory of Molecular Biology to understand life from molecules to organisms. The postholder will have a particular focus on neurobiology, including petabyte-scale brain wiring data.
Main duties:
• To work with the Scientific Computing team, including taking responsibility as required for operational issues such as maintaining and troubleshooting the Institute’s data storage and computing cluster during the absence of other team members, and attending meetings on behalf of the Scientific Computing group.
• To initiate and act as an independent project manager for discrete infrastructure services/projects, typically integrating diverse software components to provide a functioning whole, and reporting to the Head of Scientific Computing on significant milestones.
• In consultation with scientific and support staff and the Head of Scientific Computing, to propose, develop, implement, test, upgrade and maintain hardware and software solutions, especially those with application of deep learning to large datasets.
• To research, develop and maintain the Institute’s High Performance Computing (HPC) cluster and multi-petabyte data storage systems, which form the core of the Scientific Computing infrastructure. This is a crucial and rapidly changing field which requires a high level of computing expertise and a knowledge of the scientific requirements of the Institute’s scientists.
• Creating and maintaining documentation describing the systems configuration and operation.
• To give technical support and assistance to users, whether that be troubleshooting technical problems or helping with personal user operating difficulties.
• Implementing hardware / software installations and upgrades.
Key responsibilities:
• To be responsible for the day-to-day provision and maintenance of Scientific Computing services across the Institute, ensuring that systems run smoothly and as transparently as possible for users, resolving all types of hardware and software problems efficiently within an appropriate time frame, and providing support to users for both systems and applications when appropriate. This will include monitoring, maintaining and repairing the central UNIX facilities, such as servers, application devices and switches; maintaining the Institute’s UNIX desktop and laptop machines (currently ~80 machines) together with the cluster and associated servers (currently ~600 machines); and assisting with the maintenance and integrity of the Institute’s data storage, backup, disaster recovery and archive systems.
• To contribute to the Scientific Computing team effort in providing a state-of-the-art computing infrastructure that facilitates the cost-effective accomplishment of the Institute’s scientific goals, with a focus on neurobiology.
• To evaluate, test and maintain software libraries, especially those used in image processing and machine learning such as PyTorch and TensorFlow, as well as OpenCV and JAX.
• To be responsible for the maintenance of the current Scientific Computing facilities audit, ensuring licences are kept up to date.
• To have responsibility for monitoring and maintaining the IT hardware in good health, pro-actively replacing disk drives and other failing components, as well as maintaining good security by applying software updates and upgrades as required.
• To document all changes made, and to fully describe new systems developed, on the group’s wiki.
• To share information with the group regarding technical developments.
• To propose, develop, implement, test, upgrade and maintain hardware.
Working relationships:
The post holder will report to the Head of Scientific Computing, and in their absence to the Senior Scientific Computing Officer. They will be required to produce oral and written reports for other members of management. The post holder will also be expected to give presentations to Unix users on current and planned work, in addition to day-to-day informal contact and regular meetings. The post is funded by the Neurobiology Division, and the postholder, the Head of Scientific Computing and the Joint Head of Neurobiology will meet regularly (quarterly) to set high-level priorities that support work processing petabyte-scale brain image datasets.
In addition, the post holder will need to communicate with users to ascertain any problems they might be having and determine solutions, such as user training. The post holder will interact with all members of the institute and will form part of the Scientific Computing Group.
Person Specification
Education / qualifications / training required:
Essential: Degree in Computing, Science or equivalent subject.
Desirable: MSc/MA/PhD RHC, or other formal computing qualifications.
Previous work experience required:
Essential: Experience of server administration and/or system development.
Desirable: Experience of integrating Linux servers into a heterogeneous client environment (Linux / Windows / Mac) using tools such as NFS and Samba. Administration experience with authentication systems such as LDAP, Kerberos and Active Directory. Experience of training large machine learning models (including large language models) with large datasets, especially image volumes. Experience in managing imagery for large-scale connectomics.
Knowledge and experience:
Essential: Experience of supporting Linux and Unix systems.
Desirable: Experience of supporting and installing Red Hat / CentOS / Debian Linux, Windows 10 and macOS. Experience of optimising training/inference of deep learning models. Experience with databases such as PostgreSQL, data visualisation, and image alignment algorithms. Familiarity with the use of containers (Apptainer / Singularity).
Computer Cluster Administration
Essential: Experience of using HPC clusters or cluster-based storage systems.
Desirable: Experience of building and administering clustered storage systems (Ceph, BeeGFS, ZFS or Lustre). Experience of administering and installing large-scale Linux-based HPC systems using SGE, Slurm or LSF.
Programming and Automation Skills
Essential: Competent in writing shell scripts.
Desirable: Experience of C, Python or Fortran programming. Experience of installing and using PyTorch, TensorFlow, OpenCV and JAX.
Understanding the work of the Laboratory
Essential: Experience working in an IT support, academic or research environment.
Desirable: Research experience in Biology.
People skills / communication skills (written and verbal)
Essential: Excellent communication skills – both verbal and written. Ability to communicate difficult concepts successfully at all levels.
Issue Management
Essential: Can work under pressure, ability to prioritise and multi-task.
Desirable: Good organisational skills. Experience of managing work projects.
Self-development
Essential: Track record in keeping self-updated and informed about the latest developments in IT.
Willingness to learn and develop new skills in areas useful to the unit.
Personal skills / behaviours / qualities:
Essential: Experience/ability to devise and implement policies and procedures.
Desirable: Writing policy and procedure and system documentation.
Good organisational skills.
Experience of managing work projects.
The successful candidate is expected to be highly motivated, creative and capable of contributing productively in a team environment.
A well-organised individual willing to undertake continuing personal development.
Comfortable with continuous improvement and able to adapt to the changing priorities and demands of a dynamic research environment.
Articulate and able to communicate at technical and non-technical levels.
A pragmatic and diplomatic approach to problem solving.