As cameras have improved, real-time object detection has become an increasingly sought-after capability. From self-driving cars and smart surveillance systems to augmented reality applications, this technology is used in a wide range of situations.

Computer vision, a fancy term for the technology that pairs cameras with computers to carry out operations like those mentioned above, is a vast and complicated field. However, you may not know that you can get started with real-time object detection very easily, right from your browser.

This article explains how to build a real-time object detection app with React that leverages the user’s webcam feed, and how to deploy the app to Kinsta.

Prerequisites

Here’s a breakdown of the key technologies used in this guide:

  • React: React is used to construct the application’s user interface (UI). React excels at rendering dynamic content and will be useful in presenting the webcam feed and detected objects within the browser.
  • TensorFlow.js: TensorFlow.js is a JavaScript library that brings the power of machine learning to the browser. It allows you to load pre-trained models for object detection and run them directly within the browser, eliminating the need for complex server-side processing.
  • Coco SSD: The application uses a pre-trained object detection model called Coco SSD, a lightweight model capable of recognizing a vast array of everyday objects in real time. While Coco SSD is a powerful tool, it’s important to note that it’s trained on a general dataset of objects. If you have specific detection needs, you can train a custom model using TensorFlow.js by following this guide.
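
If you haven’t used these libraries together before, here’s the gist of how they fit. This is a minimal standalone sketch (the 'source-image' element ID is just a placeholder); the rest of this guide wires these same calls into a React component:

import * as cocoSsd from '@tensorflow-models/coco-ssd';
import '@tensorflow/tfjs';

const detectOnce = async () => {
  // Load the pre-trained Coco SSD model (TensorFlow.js runs it entirely in the browser)
  const model = await cocoSsd.load();

  // Run detection on any image, video, or canvas element
  const predictions = await model.detect(document.getElementById('source-image'));

  // Each prediction contains a class name, a confidence score, and a bounding box
  console.log(predictions);
};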

Set up a new React project

  1. Create a new React project. Do this by running the following command:
    npm create vite@latest kinsta-object-detection -- --template react

    This scaffolds a baseline React project for you using Vite.

  2. Next, install the TensorFlow.js and Coco SSD libraries by running the following command in the project:
    npm i @tensorflow-models/coco-ssd @tensorflow/tfjs

Now, you are ready to start developing your app.

Configuring the app

Before writing the code for the object detection logic, let’s look at what you’ll be building in this guide. Here’s what the UI of the app will look like:

A screenshot of the completed app with the header and a button to enable webcam access.
UI design of the app.

When a user clicks the Start Webcam button, they are prompted to grant the app permission to access the webcam feed. Once permission is granted, the app starts showing the webcam feed and detecting the objects present in it. It then renders a box around each detected object on the live feed and adds a label to it.

To start, create the UI for the app by pasting the following code in the App.jsx file:

import ObjectDetection from './ObjectDetection';
function App() {
  return (
    <div className="app">
      <h1>Image Object Detection</h1>
        <ObjectDetection />
    </div>
  );
}

export default App;

This code snippet specifies a header for the page and imports a custom component named ObjectDetection. This component contains the logic for capturing the webcam feed and detecting objects in real time.

To create this component, create a new file named ObjectDetection.jsx in your src directory and paste the following code in it:

import { useEffect, useRef, useState } from 'react';

const ObjectDetection = () => {
  const videoRef = useRef(null);
  const [isWebcamStarted, setIsWebcamStarted] = useState(false)

  const startWebcam = async () => {
    // TODO
  };

  const stopWebcam = () => {
     // TODO
  };

  return (
    <div className="object-detection">
      <div className="buttons">
        <button onClick={isWebcamStarted ? stopWebcam : startWebcam}>{isWebcamStarted ? "Stop" : "Start"} Webcam</button>
      </div>
      <div className="feed">
        {isWebcamStarted ? <video ref={videoRef} autoPlay muted /> : <div />}
      </div>
    </div>
  );
};

export default ObjectDetection;

The code above defines an HTML structure with a button to start and stop the webcam feed and a <video> element that shows the user their webcam feed once it is active. The isWebcamStarted state variable stores whether the webcam feed is running, and two functions, startWebcam and stopWebcam, start and stop the feed. Let’s define them.

Here’s the code for the startWebcam function:

const startWebcam = async () => {
    try {
      setIsWebcamStarted(true)
      const stream = await navigator.mediaDevices.getUserMedia({ video: true });

      if (videoRef.current) {
        videoRef.current.srcObject = stream;
      }
    } catch (error) {
      setIsWebcamStarted(false)
      console.error('Error accessing webcam:', error);
    }
  };

This function requests webcam access from the user and, once permission is granted, sets the <video> element’s source to the live webcam stream.

If the code fails to access the webcam feed (perhaps because the current device has no webcam or the user denied permission), the function prints an error message to the console. You can also extend the catch block to surface the failure to the user, as sketched below.
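
One possible approach (a sketch, not part of the main code above; the errorMessage name is just an example) is to keep the message in state and render it near the feed:

// useState is already imported at the top of ObjectDetection.jsx
const [errorMessage, setErrorMessage] = useState('');

const startWebcam = async () => {
  try {
    setErrorMessage('');
    setIsWebcamStarted(true);
    const stream = await navigator.mediaDevices.getUserMedia({ video: true });

    if (videoRef.current) {
      videoRef.current.srcObject = stream;
    }
  } catch (error) {
    setIsWebcamStarted(false);
    // Store a human-readable reason so the UI can display it
    setErrorMessage(`Could not access the webcam: ${error.message}`);
    console.error('Error accessing webcam:', error);
  }
};

// Then render it somewhere in the component's JSX:
// {errorMessage && <p className="error">{errorMessage}</p>}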

Next, replace the stopWebcam function with the following code:

const stopWebcam = () => {
    const video = videoRef.current;

    if (video) {
      const stream = video.srcObject;
      const tracks = stream.getTracks();

      tracks.forEach((track) => {
        track.stop();
      });

      video.srcObject = null;
      setPredictions([])
      setIsWebcamStarted(false)
    }
  };

This code retrieves the media stream attached to the <video> element and stops each of its tracks. It then detaches the stream, clears the predictions state (a state variable you define in the next section), and sets isWebcamStarted back to false.

At this point, try running the app to check if you can access and view the webcam feed.

Paste the following code into the index.css file to make sure the app looks the same as the preview you saw earlier:

#root {
  font-family: Inter, system-ui, Avenir, Helvetica, Arial, sans-serif;
  line-height: 1.5;
  font-weight: 400;
  color-scheme: light dark;
  color: rgba(255, 255, 255, 0.87);
  background-color: #242424;
  min-width: 100vw;
  min-height: 100vh;
  font-synthesis: none;
  text-rendering: optimizeLegibility;
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

a {
  font-weight: 500;
  color: #646cff;
  text-decoration: inherit;
}

a:hover {
  color: #535bf2;
}

body {
  margin: 0;
  display: flex;
  place-items: center;
  min-width: 100vw;
  min-height: 100vh;
}

h1 {
  font-size: 3.2em;
  line-height: 1.1;
}

button {
  border-radius: 8px;
  border: 1px solid transparent;
  padding: 0.6em 1.2em;
  font-size: 1em;
  font-weight: 500;
  font-family: inherit;
  background-color: #1a1a1a;
  cursor: pointer;
  transition: border-color 0.25s;
}

button:hover {
  border-color: #646cff;
}

button:focus,
button:focus-visible {
  outline: 4px auto -webkit-focus-ring-color;
}

@media (prefers-color-scheme: light) {
  :root {
    color: #213547;
    background-color: #ffffff;
  }

  a:hover {
    color: #747bff;
  }

  button {
    background-color: #f9f9f9;
  }
}

.app {
  width: 100%;
  display: flex;
  justify-content: center;
  align-items: center;
  flex-direction: column;
}

.object-detection {
  width: 100%;
  display: flex;
  flex-direction: column;
  align-items: center;
  justify-content: center;

  .buttons {
    width: 100%;
    display: flex;
    justify-content: center;
    align-items: center;
    flex-direction: row;

    button {
      margin: 2px;
    }
  }

  div {
    margin: 4px;
  }
}

Also, delete the App.css file so its default styles don’t interfere with your components. Now, you are ready to write the logic for integrating real-time object detection into your app.

Set up real-time object detection

  1. Start by adding the imports for TensorFlow.js and Coco SSD at the top of ObjectDetection.jsx:
    import * as cocoSsd from '@tensorflow-models/coco-ssd';
    
    import '@tensorflow/tfjs';
  2. Next, create a state in the ObjectDetection component to store the array of predictions generated by the Coco SSD model:
    const [predictions, setPredictions] = useState([]);
  3. Next, create a function that loads the Coco SSD model, runs it against the current video feed, and stores the generated predictions:
    const predictObject = async () => {
      const model = await cocoSsd.load();

      model.detect(videoRef.current)
        .then((predictions) => {
          setPredictions(predictions);
        })
        .catch((err) => {
          console.error(err);
        });
    };

    This function uses the video feed to generate predictions for the objects present in it. It provides you with an array of predicted objects, each containing a label, a confidence score, and a set of coordinates showing the object’s location in the video frame.

    You need to continuously call this function to process video frames as they come and then use the predictions stored in the predictions state to show boxes and labels for each identified object on the live video feed.
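
    For reference, each entry in the predictions array has the following shape (the values here are only illustrative):

    // Example of a single Coco SSD prediction (illustrative values)
    {
      class: 'person',          // label of the detected object
      score: 0.92,              // confidence score between 0 and 1
      bbox: [23, 8, 310, 440]   // [x, y, width, height] in pixels within the video frame
    }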

  4. Next, use the setInterval function to call predictObject continuously. You also need to stop calling it once the user has stopped the webcam feed; for that, use JavaScript’s clearInterval function. Add the following state variable and useEffect hook to the ObjectDetection component so that predictObject is called repeatedly while the webcam is enabled and stopped when the webcam is disabled:
    const [detectionInterval, setDetectionInterval] = useState();

    useEffect(() => {
      if (isWebcamStarted) {
        setDetectionInterval(setInterval(predictObject, 500));
      } else {
        if (detectionInterval) {
          clearInterval(detectionInterval);
          setDetectionInterval(null);
        }
      }
    }, [isWebcamStarted]);

    This sets up the app to detect the objects present in front of the webcam every 500 milliseconds. You can adjust this value depending on how responsive you want the object detection to be, keeping in mind that running it too often can cause your app to use a lot of memory and CPU in the browser.
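
    Also note that predictObject, as written above, reloads the Coco SSD model on every call. The app works this way, but if you want to avoid the repeated loading overhead, one option (an optional sketch, not required for the rest of this guide) is to cache the model in a ref and reuse it:

    // useRef is already imported at the top of ObjectDetection.jsx
    const modelRef = useRef(null);

    const predictObject = async () => {
      // Load the Coco SSD model only once and cache it for subsequent calls
      if (!modelRef.current) {
        modelRef.current = await cocoSsd.load();
      }

      try {
        const predictions = await modelRef.current.detect(videoRef.current);
        setPredictions(predictions);
      } catch (err) {
        console.error(err);
      }
    };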

  5. Now that you have the prediction data in the predictions state, you can use it to display a label and a box around each detected object in the live video feed. To do that, update the return statement of ObjectDetection to return the following:
    return (
        <div className="object-detection">
          <div className="buttons">
            <button onClick={isWebcamStarted ? stopWebcam : startWebcam}>{isWebcamStarted ? "Stop" : "Start"} Webcam</button>
          </div>
          <div className="feed">
            {isWebcamStarted ? <video ref={videoRef} autoPlay muted /> : <div />}
            {/* Add the tags below to show a label using the p element and a box using the div element */}
            {predictions.length > 0 && (
              predictions.map(prediction => {
                return <>
                  <p style={{
                    left: `${prediction.bbox[0]}px`, 
                    top: `${prediction.bbox[1]}px`,
                    width: `${prediction.bbox[2] - 100}px`
                }}>{prediction.class  + ' - with ' 
                + Math.round(parseFloat(prediction.score) * 100) 
                + '% confidence.'}</p>
                <div className={"marker"} style={{
                  left: `${prediction.bbox[0]}px`,
                  top: `${prediction.bbox[1]}px`,
                  width: `${prediction.bbox[2]}px`,
                  height: `${prediction.bbox[3]}px`
                }} />
                </>
              })
            )}
          </div>
          {/* Add the tags below to show a list of predictions to user */}
          {predictions.length > 0 && (
            <div>
              <h3>Predictions:</h3>
              <ul>
                {predictions.map((prediction, index) => (
                  <li key={index}>
                    {`${prediction.class} (${(prediction.score * 100).toFixed(2)}%)`}
                  </li>
                ))}
              </ul>
            </div>
          )}
    
        </div>
      );

    This renders a list of predictions right below the webcam feed and draws a box around each predicted object using the coordinates from Coco SSD, along with a label at the top of each box.

  6. To style the boxes and labels correctly, add the following code to the index.css file:
    .feed {
      position: relative;
    
      p {
        position: absolute;
        padding: 5px;
        background-color: rgba(255, 111, 0, 0.85);
        color: #FFF;
        border: 1px dashed rgba(255, 255, 255, 0.7);
        z-index: 2;
        font-size: 12px;
        margin: 0;
      }
    
      .marker {
        background: rgba(0, 255, 0, 0.25);
        border: 1px dashed #fff;
        z-index: 1;
        position: absolute;
      }
    
    }

    This completes the development of the app. You can now restart the dev server to test the application. Here’s what it should look like when completed:

    A GIF showing the user running the app, allowing camera access to it, and then the app showing boxes and labels around detected objects in the feed.
    Demo of the real-time object detection using webcam

You can find the complete code in this GitHub repository.

Deploy the completed app to Kinsta

The final step is to deploy the app to Kinsta to make it available to your users. Kinsta allows you to host up to 100 static websites for free, deployed directly from your preferred Git provider (Bitbucket, GitHub, or GitLab).

Once your git repository is ready, follow these steps to deploy your object detection app to Kinsta:

  1. Log in or create an account to view your MyKinsta dashboard.
  2. Authorize Kinsta with your Git provider.
  3. Click Static Sites on the left sidebar, then click Add site.
  4. Select the repository and the branch you wish to deploy from.
  5. Assign a unique name to your site.
  6. Add the build settings in the following format:
    • Build command: yarn build or npm run build
    • Node version: 20.2.0
    • Publish directory: dist
  7. Finally, click Create site.

Once the app is deployed, you can click Visit Site from the dashboard to access the app. Note that browsers only allow webcam access over HTTPS (or on localhost), so make sure you open the deployed app over a secure URL. You can then try running the app across various devices with cameras to see how it performs.

As an alternative to Static Site Hosting, you can deploy your static site with Kinsta’s Application Hosting, which provides greater hosting flexibility, a wider range of benefits, and access to more robust features, such as scalability, customized deployment using a Dockerfile, and comprehensive analytics covering real-time and historical data.

Summary

You’ve successfully built a real-time object detection application using React, TensorFlow.js, and Kinsta. This enables you to explore the exciting world of computer vision and create interactive experiences directly in the user’s browser.

Remember, the Coco SSD model we used is just a starting point. With further exploration, you can delve into custom object detection using TensorFlow.js, allowing you to tailor the app to identify specific objects relevant to your needs.

The possibilities are vast! This app serves as a foundation for you to build more detailed applications like augmented reality experiences or smart surveillance systems. By deploying your app on Kinsta’s reliable platform, you can share your creation with the world and witness the power of computer vision come to life.

What’s a problem you’ve come across that you think real-time object detection can solve? Let us know in the comments below!

Kumar Harsh

Kumar is a software developer and a technical author based in India. He specializes in JavaScript and DevOps. You can learn more about his work on his website.