Data for computer vision

General recommendations for data requirements in computer vision
To get computer vision to work on your specific problem, the key is to provide problem-specific data to train the model on. The first step is therefore to establish which of three situations applies: a relevant dataset already exists, a new dataset has to be created from your own images and labeled correctly, or a suitable public dataset has to be found.

  • If a dataset already exists, we can coordinate on how best to structure the files for ease of use.
  • If a dataset needs to be created, Desupervised helps by providing labeling software and details on the data format it needs to be in. However, Desupervised does not currently offer a service where we do the labeling ourselves.
  • If a public dataset is the only option, we can help by researching which available datasets suit the problem. In most cases, however, a customer-specific dataset gives the best model quality.


A common question is how much data is needed. It is a tricky question with an unsatisfying answer: it depends. First, it depends on whether we can use an already trained network as a starting point. In most cases we can, and that drastically reduces the amount of data needed. Second, the number of examples needed then depends on how similar your data is to the data the model was originally trained on. That said, good production-ready performance can normally be reached with a few hundred images.