Darren Cosker's Home Page

I am a Principal Scientist Manager (Director) at Microsoft, based in Cambridge (UK), where I currently lead work on post-training, multimodal conversational AI, evaluation systems, and meeting intelligence for Microsoft Teams. This involves training systems to interact proactively with people in real time through chat, video, and voice, and to learn and adapt to user preferences through reinforcement-learning-style optimization and feedback-driven improvement. It also involves the practical challenges of making agentic systems useful in real products.

Before this, I was part of the Microsoft Mesh and Mixed Reality orgs, where I founded and led the Motion Lab team, working on real-time understanding and generation of human motion for remote communication in 2D and Mixed Reality. Our work contributed to shipped experiences across Teams, Microsoft Mesh, and Azure-connected services, including audio-driven avatars, face tracking, body pose estimation, and technologies for more compelling presence in immersive communication (including shipping on Quest and Windows).

Until 2021 I was a full-time Professor of Computer Science at the University of Bath, where I joined as a Royal Academy of Engineering / EPSRC Research Fellow in 2007 and where I still hold a part-time position. Between 2012 and 2016, I also held a Royal Society Fellowship in collaboration with DNEG (previously Double Negative Visual Effects, a multiple Oscar-winning, London-based VFX house). In 2015 I founded and became Director of the Centre for the Analysis of Motion, Entertainment Research and Applications (CAMERA), building it from an initial vision into a 50+ person centre with major facilities, industrial partnerships, a first-of-its-kind research-to-impact model, and over £20m in funding and partner contributions, before helping secure the £45m+ MyWorld investment in the South West of the UK.

In my career so far, my interests have spanned multiple areas. An Applied Scientist at heart, I'm interested in solving problems that have real value and the greatest possible impact in the world outside the lab. Grounded at the intersection of AI, Computer Vision, and Computer Graphics, I focused for many years on multimodal human understanding, with applications in visual effects, human digitisation and motion capture. More recently, I have started to go deeper into reinforcement learning, with a particular interest in how self-play and human feedback can directly improve models and proactive agents in multiple scenarios (particularly real-time ones).

Multimodal AI, Post-Training, Real-Time Communication, Digital Humans, Computer Vision

Leadership and AI Delivery

A major part of my work today is not just creating new models or algorithms, but building the environment in which AI can be delivered well: defining product strategy, establishing teams, mentoring scientists and engineers, creating evaluation and experimentation workflows, and helping organizations move from prototypes to robust customer-facing systems.

In Teams, this has meant helping drive the shift toward multimodal agents, post-trained AI systems, and proactive collaboration experiences, setting technical direction for conversational AI, and building the foundations for synthetic data generation, model evaluation, and live testing using human and AI feedback.

The same leadership pattern was central to CAMERA - creating the vision, bringing together academic and industrial partners, raising substantial funding, hiring and growing an interdisciplinary team, building studios and technical infrastructure, and turning the centre into something that delivered both strong research and real-world impact. That remains one of the clearest examples of the kind of institution-building and long-horizon technical leadership I most enjoy.

This focus on delivery has also pulled my work further toward reinforcement learning, post-training, and the broader question of how modern AI systems become reliably useful once they leave the demo stage and enter products that people use every day. I am especially interested in real-time settings, where models must work under latency, context, and interaction constraints, and in proactive agents that can help move collaboration forward rather than only reacting to a prompt.

Previous Product Impact: From Multimodal AI to Immersive Experiences

The examples below reflect previous product work I have been fortunate to ship into the hands of millions of customers through Microsoft Teams, Microsoft Mesh, and Azure-connected services. More recently this has included multimodal conversational AI, post-training strategy, agent systems, and the tooling required to evaluate and improve these systems in live product settings.

Within Teams, this includes work around collaborative AI experiences such as Facilitator, a chat-based agent experience aimed at helping people keep meetings and collaboration on track through structured assistance, follow-up, and shared context.

At Microsoft Teams, the product impact increasingly comes from turning advances in AI into tools that improve how people communicate and coordinate. That spans meeting intelligence, multimodal interaction, and new agentic behaviours that support real-time collaboration.

Microsoft Mesh and Teams — Microsoft Mesh enables meetings in 2D through Teams and in immersive 3D spaces through VR. My work across Mesh and Teams has focused on the AI systems that support presence, communication, avatar control, meeting intelligence, and richer collaborative experiences.

CAMERA themes and studio — As Director of CAMERA, I built research themes, partnerships, teams, and studio capability around motion capture, photogrammetry, digital humans, and AI for entertainment, sport, and health applications.

At CAMERA I also created a framework for turning research into production tools with a team of engineers, deploying these in our studio, and delivering projects to clients that used the technology in practice. This helped ship a number of video games and award-winning immersive experiences. Below is a small snapshot of products and experiences I have been fortunate enough to contribute to.

11-11: Memories Retold

With Aardman and Bandai Namco (BAFTA-nominated). We delivered motion capture for the video game at the CAMERA studio.

Is Anna OK?

With BBC and Aardman. An immersive experience delivered with our in-house facial rigging, animation, and motion capture solutions.

Cosmos Within Us

With Satore Studios (Cannes Lion Winner). We built digital doubles for performers and animated them using our in-house tools.

Previous Research Impact: From Digital Humans to Biomechanics and Human Perception

The research below reflects previous work spanning digital humans, motion capture of faces, bodies and animals, animation, computer vision, biomechanics, and human perception. A central theme has always been motion - understanding it, measuring it, generating it, and building systems that use it well. More recently, this work expanded into multimodal AI systems, synthetic data, evaluation, foundation-model adaptation, reinforcement learning, and proactive agents for communication and immersive experiences. A list of publications can be found on my Google Scholar page.

**Facial Performance Capture and Animation**. This spans early 4D facial capture, monocular facial capture, and performance retargeting from motion capture. We built end-to-end pipelines for rigged digital facial models, automated performance transfer, and dynamic face modelling. The D3DFACS dataset later became one of the foundations of the FLAME model from Max Planck.

**Generative Models of Shape and Motion**. My work has touched several areas of generative AI, from early speech-driven facial animation during my PhD through to recent personalised animation and multimodal avatar work. Today, this broader theme also connects to product-facing work on multimodal AI, post-training, and systems that combine speech, vision, and interaction signals to drive useful behaviour in real products.

**Synthetic Data for AI and Computer Vision**. Computer graphics can be used to render realistic images and labels for training AI systems. We have applied these ideas across pose estimation, digital humans, and animal motion, and the same principles now extend naturally to modern multimodal systems where data generation, evaluation, and feedback loops are central to product quality.

**Markerless Motion Capture and Analysis for Biomechanics**. One of the core motivations of CAMERA was to apply state-of-the-art computer vision and AI to biomechanics and elite sport. We worked closely with coaches and athletes, including the British Skeleton team, to create markerless systems that could provide useful biomechanical signals in real training environments.

**Human Pose Estimation from Egocentric Cameras and HMDs**. At Microsoft, we have explored 3D human pose estimation from head-mounted devices and egocentric viewpoints, including work that supports avatar embodiment and full-body understanding in immersive products. These technologies are central to convincing presence experiences in mixed reality.

Research Funding and Awards

As a Professor, securing funding was one of my core activities in building ambitious research programmes. Below is a selection of the major awards that supported large-scale centres such as CAMERA as well as more targeted projects in AI, computer vision, digital humans, performance capture, and perception.

2021-2026: MyWorld (~£45m FEC). UKRI (PI, University of Bath)

2020-2025: CAMERA 2.0 - Centre for the Analysis of Motion, Entertainment Research and Applications (£4,151,614 FEC). EPSRC

2019-2021: CAMERA Motion Capture Innovation Studio (£901,391). Horizon 2020

2019-2022: A Tool to Reveal Individual Differences in Facial Perception (£402,113). Medical Research Council

2018-2020: Rheumatoid Arthritis Flare Profiler (£165,126; total project value £663,290). Innovate UK

2018-2022: Bristol and Bath Creative Cluster (~£4m). AHRC

2017-2019: DOVE: Deformable Objects for Virtual Environments (£128,746; total project value £562,559 FEC). Innovate UK

2016-2018: HARPC: HMC for Augmented Reality Performance Capture (£119,025; total project value £517,616 FEC). Innovate UK

2015-2020: CAMERA (£4,998,728 FEC; ~£5m additional partner contributions). EPSRC/AHRC

2012-2016: Next Generation Facial Capture and Animation (£100,887 FEC). Royal Society Industry Fellowship

2007-2012: Exploiting 4D Data for Creating Next Generation Facial Modelling and Animation Techniques (£460,640 FEC). Royal Academy of Engineering Research Fellowship

Public Data

RGBD-Dog

RGBD-Dog contains motion capture and multiview RGB and RGBD data for dogs performing different actions. You can get the data, code to view it, and the CVPR 2020 paper from our GitHub page.

D3DFACS

D3DFACS contains over 500 FACS-coded dynamic 3D sequences from 10 individuals, including meshes, stereo UV maps, colour images, and calibration files. More information is available in our ICCV 2011 paper "A FACS Valid 3D Dynamic Action Unit Database with Applications to 3D Dynamic Morphable Facial Modelling" and on the dataset website.

Shadow Removal

Shadow Removal Ground Truth and Evaluation provides a benchmark and dataset for single-image shadow removal, enabling open quantitative comparison across a challenging range of cases. The evaluation website is available here.

Alumni (University)

At Microsoft I lead teams of talented scientists and engineers. As a Professor and former Director of CAMERA, I have also had the privilege of working with outstanding students, researchers, technical staff, and collaborators.

Martin Parsons (CAMERA), Murray Evans (CAMERA), Yiguo Qiao (Living With/RUH/InnovateUK), Jack Saunders, George Fletcher, Jake Deane, Kyle Reed (Cubic Motion), Jose Serra (Digital Domain/ILM), Anamaria Ciucanu (MMU), Pedro Mendes, Shridhar Ravikumar (Amazon, Apple), Alastair Barber (The Foundry), Wenbin Li (Bath), Han Gong (Apple), Charalampos Koniaris (Disney Research), Daniel Beale, Sinan Mutlu (Framestore), Nicholas Swafford, Nadejda Roubtsova (CAMERA), Sinead Kearney (CAMERA), Maryam Naghizadeh, Catherine Taylor (Marshmallow Laser Feast).

Personal

I love my work, but the number one thing in my life is my family. If anything, having a family motivates me even more in my work, giving me the desire to make sure we all have the best life. It also forces me to be efficient and productive in the time I am working, and to appreciate the time we have together even more.