Specialised, Frontier or Diffusion: The Procurement Matrix Enterprise Architects Are Missing

TL;DR. A 3B model specialised on Brazilian Portuguese OCR outscores Claude Opus 4.6 — 0.911 versus 0.833, per Dharma-AI — at 52 times lower cost per million pages. Nemotron-Labs Diffusion reaches 6.4× the throughput of a standard autoregressive model on B200 hardware, per NVIDIA. Three model categories. Three distinct selection criteria: domain fit, cost, and throughput.

Three years of procurement defaults — and why they are breaking

Since 2023, the dominant heuristic in enterprise AI procurement has stabilised around a single principle: the largest available model is the safest choice. The reasoning was defensible — frontier models absorbed edge cases, avoided the blind spots of premature specialisation, and externalised maintenance risk.

Two technical publications, appearing a day apart on Hugging Face, shift that frame. On 22 May 2026, Dharma-AI published a comparative benchmark on a corpus of Brazilian Portuguese legal and administrative OCR documents, pitting a 3-billion-parameter specialised model against the leading frontier models. On 23 May, NVIDIA published the Nemotron-Labs Diffusion family, introducing block-based diffusion generation with a self-speculation mode that reaches 6.4× the speed of a standard autoregressive baseline. Both publications share a common subtext: model size is not the only axis of enterprise competitiveness. Two others now demand measurement: distributional alignment to the deployment task, and inference throughput.

Where specialised models take the lead

On the Dharma-AI benchmark, which covers printed, handwritten, and administrative documents in Brazilian Portuguese, the Dharma-OCR 3B model scores 0.911. Claude Opus 4.6 reaches 0.833, Gemini 3.1 Pro 0.820, GPT-5.4 0.750, GPT-4o 0.635, and Amazon Textract 0.618, per the Dharma-AI publication. The gap between first and second place is 0.078, or 7.8 points on a 100-point scale.

Cost is the decisive argument at scale. Dharma-OCR 3B costs 52 times less than Claude Opus 4.6 per million pages processed, according to the same source.

Production stability is the third differentiator. Text degeneration rate, the share of outputs that turn incoherent or repetitive, is a critical metric in automated pipelines. On that metric, Nanonets-OCR2 3B records 0.20%, against 1.41% for Qwen2.5-VL-3B in general-purpose use, per Dharma-AI. The ratio is 7 to 1. olmOCR-2 7B, another OCR specialist, reaches 0.40%, well below the general-purpose model of comparable size.
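To make these rates concrete, the sketch below converts them into an expected count of degenerate outputs at a given volume. The rates are the ones quoted above; the one-million-page volume and the function name are assumptions chosen purely for illustration.

```python
# Degeneration rates in percent of outputs, as quoted above (per Dharma-AI).
DEGENERATION_RATE_PCT = {
    "Nanonets-OCR2 3B": 0.20,
    "olmOCR-2 7B": 0.40,
    "Qwen2.5-VL-3B": 1.41,
}

# Hypothetical page volume, not taken from the source.
PAGES = 1_000_000


def expected_degenerate_outputs(rate_pct: float, pages: int) -> int:
    """Expected number of degenerate outputs for a given page volume."""
    return round(rate_pct / 100 * pages)


for model, rate in DEGENERATION_RATE_PCT.items():
    count = expected_degenerate_outputs(rate, PAGES)
    print(f"{model}: about {count:,} degenerate outputs per {PAGES:,} pages")
```

In an automated pipeline each of those outputs needs detection and a retry or a human check, so the rate translates directly into operational load.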

The structural logic behind these results is made explicit by Dharma-AI: specialisation compounds across levels. At 7 billion parameters, moving from a general-purpose model to a generic OCR specialist improves quality by 2.3% and halves the degeneration rate. At 3 billion parameters, the quality gain reaches 16% and the degeneration rate drops by a factor of seven, per the same publication.

Where frontier and diffusion models hold their ground

Frontier models: versatility as structural advantage

The Dharma-AI article is explicit on scope: the results cover a single, well-measured domain. On multi-domain tasks, complex reasoning over shifting scopes, or use cases whose boundaries are undefined at procurement time, frontier models retain an operational advantage that specialists cannot replicate. A model scoring 0.833 on Portuguese OCR may score 0.95 on a different domain, or be the only model capable of handling an unforeseen request type. Dharma-AI does not argue that frontier models are obsolete; the argument is that their dominance is not universal.

Nemotron-Labs Diffusion: throughput as infrastructure differentiator

The Nemotron-Labs family — 3B, 8B, 14B — introduces three distinct generation modes, per NVIDIA. Standard autoregressive mode. Block-based diffusion mode, generating 2.6× more tokens per forward pass. Self-speculation mode, which uses diffusion as a draft and autoregressive verification as a final check, reaching 6.4× baseline speed and approximately 865 tokens per second on B200 hardware, per the NVIDIA publication.

The critical technical point: this throughput gain is lossless at temperature zero. The output is identical to autoregressive mode — not an approximation. Nemotron-Labs Diffusion 8B also shows 1.2% higher average accuracy than Qwen3 8B, per the same source. On general reasoning benchmarks, frontier models retain their advantage — Nemotron-Labs Diffusion is positioned as an inference engine for latency- and throughput-constrained workloads, not as a frontier challenger.
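Why a draft-then-verify scheme can be lossless at temperature zero is easier to see in a toy example. The sketch below is not NVIDIA's implementation: `target_next` and `draft_block` are hypothetical stand-ins for the autoregressive verifier and the diffusion draft, and the draft is deliberately wrong at regular intervals. It only illustrates the equivalence property. Verification here runs token by token, so the sketch shows no speed-up; in a real system the speed-up would come from checking a whole drafted block in one forward pass.

```python
from typing import List, Tuple

VOCAB_SIZE = 50


def target_next(context: List[int]) -> int:
    """Toy stand-in for greedy (temperature-zero) next-token prediction."""
    return (sum(context) * 31 + len(context)) % VOCAB_SIZE


def draft_block(context: List[int], block_size: int) -> List[int]:
    """Toy stand-in for a diffusion draft: usually right, sometimes wrong."""
    ctx = list(context)
    block = []
    for _ in range(block_size):
        token = target_next(ctx)
        if len(ctx) % 5 == 0:  # inject a drafting error at regular positions
            token = (token + 1) % VOCAB_SIZE
        block.append(token)
        ctx.append(token)
    return block


def autoregressive_decode(prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one verified token per step."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target_next(out))
    return out


def draft_and_verify_decode(
    prompt: List[int], n_new: int, block_size: int = 4
) -> Tuple[List[int], int]:
    """Accept drafted tokens while they match the verifier, then correct."""
    out = list(prompt)
    target_len = len(prompt) + n_new
    rounds = 0
    while len(out) < target_len:
        rounds += 1
        for proposed in draft_block(out, block_size):
            expected = target_next(out)  # what greedy decoding would emit
            out.append(expected)         # always keep the verified token
            if proposed != expected or len(out) == target_len:
                break                    # drop the rest of the draft
    return out, rounds


baseline = autoregressive_decode([1, 2, 3], 20)
speculative, rounds = draft_and_verify_decode([1, 2, 3], 20)
print("identical output:", speculative == baseline)
print("draft rounds used:", rounds, "for 20 new tokens")
```

Because every emitted token is the one the verifier would have produced, a wrong draft costs throughput but never changes the output.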

Pricing and operational implications

Three cost and infrastructure profiles emerge, without the categories being mutually exclusive:

Specialised models: very low marginal cost per request (52× documented cost reduction on OCR, per Dharma-AI). Upfront cost: domain data annotation, fine-tuning, validation. Break-even depends on the volume of homogeneous requests and the organisation's annotation cost.
Frontier models via API: no proprietary infrastructure, no fine-tuning. Usage-based billing. High cost at scale, but maintenance and updates externalised. Relevant for low-frequency tasks or variable-scope use cases.
On-premises diffusion models: a 6.4× throughput gain frees inference slots on existing infrastructure, per NVIDIA. The critical variable is hardware compatibility — the self-speculation mode is documented on B200 — and the implementation overhead of the autoregressive verification layer.
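The break-even logic for the specialised profile can be sketched as a short calculation. Only the 52× cost ratio comes from the Dharma-AI figure quoted above; the upfront cost, the frontier price per page, and the monthly volume are hypothetical placeholders to replace with your own numbers. The sketch also assumes the ratio applies to marginal cost per page and ignores hosting and maintenance.

```python
import math


def break_even_pages(
    upfront_cost: float,
    frontier_cost_per_page: float,
    cost_ratio: float = 52.0,
) -> int:
    """Pages to process before the upfront specialisation cost is recovered."""
    specialised_cost_per_page = frontier_cost_per_page / cost_ratio
    saving_per_page = frontier_cost_per_page - specialised_cost_per_page
    return math.ceil(upfront_cost / saving_per_page)


# Hypothetical inputs, for illustration only.
UPFRONT_COST_EUR = 40_000.0      # annotation, fine-tuning, validation
FRONTIER_EUR_PER_PAGE = 0.01     # assumed API cost per page
MONTHLY_PAGES = 500_000          # assumed homogeneous volume

pages = break_even_pages(UPFRONT_COST_EUR, FRONTIER_EUR_PER_PAGE)
months = pages / MONTHLY_PAGES
print(f"Break-even after about {pages:,} pages (~{months:.1f} months)")
```

With a ratio this large, the saving per page is close to the full frontier price, so the break-even volume is driven almost entirely by the upfront cost and the current price per page.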

What this means for multi-model architecture

The Hugging Face agent terminology publication, dated 25 May 2026, provides a useful operational frame: an agent is a model combined with a harness. The harness is the execution layer — model calls, tool handling, stopping conditions. The scaffold is the behavioural layer — system prompts, tool descriptions, context management. The direct implication: the same model in two different harnesses produces two distinct agent behaviours, per that publication.

This distinction becomes decisive in a multi-model architecture. If the harness is properly abstracted from the model provider, a specialised model can replace a frontier model on a defined task without modifying the downstream pipeline. Conversely, if the harness is tightly coupled to a single vendor, every model decision carries a hidden migration cost that per-token price comparisons do not capture.
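A minimal sketch of what "abstracted from the model provider" can look like in code, assuming a single text-in, text-out call. All names here (`ModelBackend`, `FrontierStub`, `SpecialistStub`, `Harness`) are hypothetical and the stubs return canned strings rather than calling any real vendor API.

```python
from dataclasses import dataclass
from typing import Protocol


class ModelBackend(Protocol):
    """The only surface the harness depends on; vendor details stay behind it."""

    def complete(self, prompt: str) -> str: ...


class FrontierStub:
    """Hypothetical stand-in for a hosted frontier model client."""

    def complete(self, prompt: str) -> str:
        return f"frontier:{prompt}"


class SpecialistStub:
    """Hypothetical stand-in for a locally served specialised model."""

    def complete(self, prompt: str) -> str:
        return f"specialist:{prompt}"


@dataclass
class Harness:
    """Execution layer: owns the model call, not the model itself."""

    backend: ModelBackend
    system_prompt: str  # scaffold element, kept independent of the backend

    def run(self, task: str) -> str:
        return self.backend.complete(f"{self.system_prompt} | {task}")


harness = Harness(backend=FrontierStub(), system_prompt="Extract fields")
print(harness.run("invoice.pdf"))

# Swapping the model is a one-line change; the pipeline code is untouched.
harness.backend = SpecialistStub()
print(harness.run("invoice.pdf"))
```

A real harness would also cover tool handling and stopping conditions, and each vendor's tool-calling format would need its own adapter behind the same interface.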

A coherent multi-model architecture rests on three model tiers: a specialised model for high-volume, well-defined tasks; a frontier model for exceptions and multi-domain tasks; and an optimised inference engine for latency-constrained components. The harness layer is what makes this segmentation operable without a full rebuild at each vendor change.
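The segmentation can be expressed as a small routing rule. This is a sketch under assumptions: the domain list, the `Task` fields, and the precedence (specialised first, then latency, then frontier as the fallback) are illustrative choices, not something the cited publications prescribe.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    SPECIALISED = "specialised"
    FAST_INFERENCE = "fast_inference"
    FRONTIER = "frontier"


@dataclass(frozen=True)
class Task:
    domain: str
    latency_sensitive: bool = False


# Hypothetical set of domains for which a validated specialised model exists.
SPECIALISED_DOMAINS = frozenset({"ocr", "classification", "extraction"})


def route(task: Task) -> Tier:
    """Pick a model tier for a task; frontier is the fallback for everything else."""
    if task.domain in SPECIALISED_DOMAINS:
        return Tier.SPECIALISED
    if task.latency_sensitive:
        return Tier.FAST_INFERENCE
    return Tier.FRONTIER


for task in (
    Task("ocr"),
    Task("chat_autocomplete", latency_sensitive=True),
    Task("contract_negotiation_summary"),
):
    print(task.domain, "->", route(task).value)
```

Keeping the rule in one function makes the segmentation auditable: changing which tier handles a domain is a configuration change, not a pipeline rewrite.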

Three levers to activate this week

Identify a high-volume sub-domain in your current pipeline. If a frontier model is processing more than 100,000 homogeneous requests per month on a definable domain (extraction, classification, OCR), calculate the current cost and the projected cost with a 3B-to-7B specialised model. The 52× gap documented by Dharma-AI gives an order of magnitude for calibrating the business case.
Map your throughput bottlenecks. If your pipeline has latency or throughput constraints, test Nemotron-Labs diffusion mode on a real workload sample. The 6.4× gain published by NVIDIA is specific to self-speculation mode on B200 hardware — verify applicability to your infrastructure before any commitment.
Audit your harness portability. Before any model decision, verify that your execution layer is abstracted from the model provider. If it is not, the true cost of each model switch includes a migration cost that is invisible in the pricing comparison.

Is model size still the first criterion on your evaluation grid?


Sources

Specialization Beats Scale: A Strategic Variable Most AI Procurement Decisions Overlook (Hugging Face)
Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models (Hugging Face)
Harness, Scaffold, and the AI Agent Terms Worth Getting Right (Hugging Face)