Senior AI research engineer, model inference (remote)

London

Tether Operations Limited

Research engineer

Posted: 23h ago

Offer description

Overview

Join Tether and Shape the Future of Digital Finance

At Tether, we’re not just building products; we’re pioneering a global financial revolution. Our cutting-edge solutions empower businesses, from exchanges and wallets to payment processors and ATMs, to seamlessly integrate reserve-backed tokens across blockchains. By harnessing the power of blockchain technology, Tether enables you to store, send, and receive digital tokens instantly, securely, and globally, all at a fraction of the cost. Transparency is the bedrock of everything we do, ensuring trust in every transaction.

Innovate with Tether

Tether Finance: Our innovative product suite features the world’s most trusted stablecoin, USDT, relied upon by hundreds of millions worldwide, alongside pioneering digital asset tokenization services.

But that’s just the beginning:

Tether Power: Driving sustainable growth, our energy solutions optimize excess power for Bitcoin mining using eco-friendly practices in state-of-the-art, geo-diverse facilities.

Tether Data: Fueling breakthroughs in AI and peer-to-peer technology, we reduce infrastructure costs and enhance global communications with cutting-edge solutions like KEET, our flagship app that redefines secure and private data sharing.

Tether Education: Democratizing access to top-tier digital learning, we empower individuals to thrive in the digital and gig economies, driving global growth and opportunity.

Tether Evolution: At the intersection of technology and human potential, we are pushing the boundaries of what is possible, crafting a future where innovation and human capabilities merge in powerful, unprecedented ways.

Responsibilities

* Implement and optimize custom inference and fine-tuning kernels for small and large language models across multiple hardware backends.
* Implement and optimize full and LoRA fine-tuning for small and large language models across multiple hardware backends.
* Design and extend datatype and precision support (int, float, mixed precision, ternary QTypes, etc.).
* Design, customize, and optimize Vulkan compute shaders for quantized operators and fine-tuning workflows.
* Investigate and resolve GPU acceleration issues on Vulkan and integrated/mobile GPUs.
* Architect and prepare support for advanced quantization techniques to improve efficiency and memory usage.
* Debug and optimize GPU operators (e.g., int8, fp16, fp4, ternary).
* Integrate and validate quantization workflows for training and inference.
* Conduct evaluation and benchmarking (e.g., perplexity testing, fine-tuned adapter performance).
* Conduct GPU testing across desktop and mobile devices.
* Collaborate with research and engineering teams to prototype, benchmark, and scale new model optimization methods.
* Deliver production-grade, efficient language model deployment for mobile and edge use cases.
* Work closely with cross-functional teams to integrate optimized serving and inference frameworks into production pipelines designed for edge and on-device applications.
* Define clear success metrics, such as improved real-world performance, low error rates, robust scalability, and optimal memory usage, and ensure continuous monitoring and iterative refinement for sustained improvement.

Qualifications

* Proficiency in C++ and GPU kernel programming.
* Proven expertise in GPU acceleration with the Vulkan framework.
* Strong background in quantization and mixed-precision model optimization.
* Experience and expertise in Vulkan compute shader development and customization.
* Familiarity with LoRA fine-tuning and parameter-efficient training methods.
* Ability to debug GPU-specific performance and stability issues on desktop and mobile devices.
* Hands-on experience with mobile GPU acceleration and model inference.
* Familiarity with large language model architectures (e.g., Qwen, Gemma, LLaMA, Falcon).
* Experience implementing custom backward operators for fine-tuning.
* Experience creating and curating custom datasets for style transfer and domain-specific fine-tuning.
* Demonstrated ability to apply empirical research to overcome challenges in model development and deployment.

Important information for candidates

* Apply only through our official channels. We do not use third-party platforms or agencies for recruitment unless clearly stated. All open roles are listed on our official careers page: https://tether.recruitee.com/
* Verify the recruiter’s identity. All our recruiters have verified LinkedIn profiles. If you’re unsure, you can confirm their identity by checking their profile or contacting us through our website.
* Be cautious of unusual communication methods. We do not conduct interviews over WhatsApp, Telegram, or SMS. All communication is done through official company emails and platforms.
* Double-check email addresses. All communication from us will come from emails ending in tether.to or tether.io.
* We will never request payment or financial details. If someone asks for personal financial information or payment during the hiring process, it is a scam. Please report it immediately.
* When in doubt, feel free to reach out through our official website.
