
Object Detection in PHP with ONNX Runtime

TL;DR

ONNX Runtime has a PHP binding. Feed it raw pixel arrays, get bounding boxes back. The whole pipeline -- image decode, inference, box drawing -- runs in one PHP process.

PHP can run object detection. You load a .onnx model file with new OnnxRuntime\Model(), feed it a pixel array extracted from a JPEG via GD, and get bounding boxes back. No Python process. No HTTP call to a sidecar. No queue. The ankane/onnxruntime Composer package wraps the same C++ runtime that Python uses -- same inference engine, same model format, one PHP process. The pipeline is three steps: decode the JPEG into a pixel array, run predict(), draw boxes back onto the image. The one gap is label names: ONNX models return integer class IDs, so you supply a lookup array for the COCO labels you care about.


ONNX Stores the Computation Graph in a Language-Neutral File

ONNX stores a trained model's computation graph -- weights, activation functions, tensor shapes -- in a single portable file. Any language with an ONNX Runtime binding can execute that graph. PHP has one: the ankane/onnxruntime package links against the ONNX Runtime C++ library and exposes a two-method API. Same inference engine, same model, no Python involved.

The constraint is format. The model must be in .onnx format, and its inputs must be constructible from native PHP types -- arrays of floats and ints. For standard image classification and object detection models, this works cleanly. The C++ runtime handles the matrix math. PHP allocates the input array and reads the output.

PHP tutorials that add ML inference often reach for a Python HTTP endpoint because Python historically had the only mature bindings. That gap closed as ONNX Runtime bindings spread beyond Python -- official APIs for Java, C#, and JavaScript, community bindings for other languages -- and the ankane/onnxruntime Composer package brought the same runtime to PHP.

A JPEG Goes In; a JPEG with Red Rectangles Comes Out

Feed the script a photo. It produces the same photo with red rectangles drawn over every detected object above a confidence threshold, each labeled with a class name. To produce that output, three stages run in sequence, each with a fixed input shape and a fixed output shape.

Stage 1: Image to pixel array

GD decodes the JPEG. imagecreatefromjpeg() returns a GD resource. imagecolorat() and imagecolorsforindex() extract RGB values per pixel. The output is a nested PHP array: $pixels[y][x] = [r, g, b], each channel an integer 0-255. The model expects exactly this layout -- height first, width second, channels innermost.

function getPixels($img): array
{
    // Build $pixels[y][x] = [r, g, b]: height-first, width second,
    // channels innermost -- the layout the detector expects.
    $pixels = [];
    $width  = imagesx($img);
    $height = imagesy($img);
    for ($y = 0; $y < $height; $y++) {
        $row = [];
        for ($x = 0; $x < $width; $x++) {
            // imagecolorat() returns a packed int for truecolor images and a
            // palette index otherwise; imagecolorsforindex() handles both.
            $rgb   = imagecolorat($img, $x, $y);
            $color = imagecolorsforindex($img, $rgb);
            $row[] = [$color['red'], $color['green'], $color['blue']];
        }
        $pixels[] = $row;
    }
    return $pixels;
}

Stage 2: Inference

OnnxRuntime\Model loads the .onnx file once. predict() takes a named input array -- the key 'inputs' matches the input tensor name in the model -- and returns an associative array of named outputs. For a standard COCO detector:

  • num_detections -- count of objects found per image
  • detection_classes -- integer class ID per detection
  • detection_boxes -- normalized coordinates per detection: [top, left, bottom, right] as floats in [0, 1]
  • detection_scores -- confidence score per detection
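
Because those outputs arrive as plain PHP arrays, filtering and reshaping them is ordinary array code. Here is a minimal sketch of the decode step -- the function name decodeDetections and the 0.5 threshold default are illustrative, and the output keys assume the four tensors listed above with a batch of one:

```php
// Turn the model's named output tensors into a flat list of detections,
// dropping anything below the confidence threshold. Each output tensor is
// indexed [0][$i] because the batch holds a single image.
function decodeDetections(array $outputs, float $threshold = 0.5): array
{
    $detections = [];
    $count = (int) $outputs['num_detections'][0];
    for ($i = 0; $i < $count; $i++) {
        $score = $outputs['detection_scores'][0][$i];
        if ($score < $threshold) {
            continue;
        }
        $detections[] = [
            'class' => (int) $outputs['detection_classes'][0][$i],
            'box'   => $outputs['detection_boxes'][0][$i], // [top, left, bottom, right], normalized
            'score' => $score,
        ];
    }
    return $detections;
}
```

The return value of $model->predict($input) can be passed to this function directly.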

Stage 3: Draw boxes

Coordinates are normalized: multiply by image width or height to get pixel coordinates. GD's imagerectangle() draws the box. imagettftext() places the label. The result writes back to JPEG with imagejpeg().

function drawBox(&$img, string $label, array $box): void
{
    $width  = imagesx($img);
    $height = imagesy($img);

    // Boxes arrive as normalized [top, left, bottom, right] floats in [0, 1];
    // scale by the image dimensions to get pixel coordinates.
    $top    = (int) round($box[0] * $height);
    $left   = (int) round($box[1] * $width);
    $bottom = (int) round($box[2] * $height);
    $right  = (int) round($box[3] * $width);

    $red = imagecolorallocate($img, 255, 0, 0);
    imagerectangle($img, $left, $top, $right, $bottom, $red);

    // Place the label just above the box's top-left corner.
    $font = '/System/Library/Fonts/HelveticaNeue.ttc';
    imagettftext($img, 16, 0, $left, $top - 5, $red, $font, $label);
}

Note: The font path is macOS-specific. On Linux, substitute a TTF path from /usr/share/fonts/.

PHP Allocates Arrays; C++ Does the Matrix Math

PHP does not participate in inference computation. The C++ runtime handles matrix multiplication, activation functions, and tensor operations at native speed. A single inference call on a modern CPU takes 50-200ms for a MobileNet-SSD or EfficientDet model. That range is acceptable for batch jobs: scanning uploaded images, processing a photo library, running a nightly pipeline.

The microservice alternative -- PHP calls a Python FastAPI endpoint -- adds round-trip latency plus JSON serialization overhead for a full image payload. For low-volume use the difference is small. For a pipeline running thousands of images per night, eliminating the HTTP boundary and the serialization step reduces wall time.

Four Operations, Fixed Order

  1. Load the model -- new OnnxRuntime\Model('model.onnx'). Reads the .onnx file, allocates session memory, compiles the execution plan. Do this once per process, not per image.

  2. Build the input tensor -- construct ['inputs' => [$pixels]]. The outer array is the batch dimension -- one image equals a one-element batch. Input key names must match the model's named inputs exactly. Use Netron or onnxruntime-tools to inspect a model's input names if the key does not match.

  3. Run predict -- $model->predict($input). Synchronous call into the C++ runtime. Returns an associative array of output tensors. Throws on shape mismatch or invalid input.

  4. Decode results -- iterate num_detections, index into detection_classes, detection_boxes, detection_scores. Filter by score threshold as needed; the model returns all detections, including low-confidence ones.
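
Step 2 is where shape mistakes surface, and the runtime reports them only at predict() time. Validating the array on the PHP side first gives a clearer error. This is a sketch of such a guard -- buildInput, the 'inputs' default key, and the RGB-only check are assumptions to adapt to your model:

```php
// Wrap a [y][x] = [r, g, b] pixel array in a batch dimension and sanity-check
// the layout before it reaches the C++ runtime.
function buildInput(array $pixels, string $inputName = 'inputs'): array
{
    $height = count($pixels);
    if ($height === 0) {
        throw new InvalidArgumentException('empty pixel array');
    }
    $width = count($pixels[0]);
    foreach ($pixels as $y => $row) {
        if (count($row) !== $width) {
            throw new InvalidArgumentException("row $y has ragged width");
        }
        foreach ($row as $x => $px) {
            if (count($px) !== 3) {
                throw new InvalidArgumentException("pixel ($x, $y) is not RGB");
            }
        }
    }
    return [$inputName => [$pixels]]; // outer array = batch of one
}
```

The failure modes it catches -- ragged rows, missing channels -- are exactly the ones a manual array-building bug produces.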

Six Constraints Fixed by Model Format and Runtime

  • Input shape is fixed by the model. The .onnx file encodes the expected tensor dimensions. If the model was trained on 300x300 images and you pass a 1920x1080 pixel array, the runtime throws a shape mismatch exception at inference time -- not a warning. Resize the image before extracting pixels.

  • Class IDs are model-specific. COCO has 80 annotated categories with non-contiguous IDs -- class 12 is not necessarily "cat." The label mapping (23 => 'bear', 88 => 'teddy bear') comes from the COCO label map, not the model file. A different training dataset means a different mapping.

  • No GPU by default. The ankane/onnxruntime package links against the CPU-only ONNX Runtime build. GPU execution requires building against the CUDA or CoreML execution provider and configuring session options explicitly. On CPU, inference time scales linearly with model size.

  • Batch size of one. The input wraps $pixels in an outer array -- [$pixels] -- to create a batch of one. The binding's behavior with batch sizes greater than one is not documented. Test before relying on it.

  • Memory scales with image resolution. A 4000x3000 image produces a $pixels array with 12 million entries, each a three-element int array. PHP's per-entry overhead is significant -- expect 50-100MB for large images. Resize to the model's native resolution before inference.

  • imagettftext() requires FreeType. GD must be compiled with FreeType support for text rendering. phpinfo() shows FreeType Support => enabled if available. Without it, skip label text or fall back to imagestring() with a built-in bitmap font.
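
Since the label map lives outside the model file, a lookup with a fallback keeps unmapped IDs from producing blank labels. A sketch -- the entries shown are real IDs from the standard COCO label map (matching the 23/88 examples above), while the cocoLabel name and the 'class N' fallback format are illustrative:

```php
// Partial COCO label map. The model emits integer class IDs; the names come
// from the dataset's label file, not from the .onnx model itself.
const COCO_LABELS = [
    1  => 'person',
    17 => 'cat',
    18 => 'dog',
    23 => 'bear',
    88 => 'teddy bear',
];

function cocoLabel(int $classId): string
{
    // Fall back to the raw ID so an unmapped class still gets a readable label.
    return COCO_LABELS[$classId] ?? sprintf('class %d', $classId);
}
```

A model trained on a different dataset needs a different constant; the lookup logic stays the same.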

PHP, Python, and TensorFlow Serving Compared

Axis                  PHP + php-onnxruntime          Python + onnxruntime                              TensorFlow Serving
Inference speed       same (same C++ runtime)        same                                              same
Setup cost            Composer install + extension   pip install                                       Docker, gRPC client, model server
Model compatibility   .onnx only                     .onnx only (convert .pb/SavedModel via tf2onnx)   SavedModel / TF-native only
Batch queuing         manual                         manual                                            built-in
GPU support           manual build required          pip install onnxruntime-gpu                       native
Available tooling     thin                           full (Hugging Face, transformers)                 full
When it wins          existing PHP codebase,         GPU inference, model experimentation              high-throughput production APIs
                      no Python infra

If the requirement is GPU inference, dynamic batching, or model hot-reload, TensorFlow Serving or Triton Inference Server is the right tool. If the requirement is adding object detection to an existing PHP application without standing up a new service, the Composer package removes the boundary entirely.

Three COCO Models Available in ONNX Format

The COCO dataset -- Common Objects in Context -- covers 80 annotated object categories across 330,000 images. Three models trained on it are widely available in .onnx format and load cleanly with the PHP binding:

  • MobileNet-SSD v2 -- 28MB, ~80ms on CPU, good for batch contexts and embedded use
  • EfficientDet-D0 -- 15MB, similar speed, higher accuracy on small objects
  • YOLOv8n (exported to ONNX) -- 6MB, ~30ms on CPU, current default for edge inference

All three export from their Python training libraries with one call:

# PyTorch / ultralytics
model.export(format='onnx')

The resulting .onnx file loads in PHP without modification. Input shape, output tensor names, and coordinate normalization conventions vary by architecture -- check the model card before mapping class IDs.

The PHP process loads the file. The C++ runtime does the math. The JPEG comes out with boxes drawn on it.