Service

Data Collection

Data is collected according to each customer's detailed requirements from different types of text and pictures, as well as from speech and video recorded in different environments. Tasks such as content filtering and conversion to text can be carried out during collection. A collection can scale to tens of millions of samples, and more than 100,000 samples can be collected within a day. The service adapts to varied and complex collection environments, supports large-scale offline data collection by crowdsourced users, and provides source data for customers.

  1. Text data collection: advertisements, magazines, newspapers, textbooks, etc.
  2. Picture data collection: pictures of objects, people, environments, etc.
  3. Speech and video data collection: local languages, speech in special environments, foreign languages, reference pictures, etc.
  4. O2O/LBS data collection: shop information, bus stop boards, other detailed information, etc.

Text Collection

Tuosi provides customers with a range of text corpus collection services, backed by an experienced multilingual team and a network of technical experts. The corpus content covers technology, daily life, work, entertainment, and more; its forms include news, blogs, BBS posts, Weibo posts, conversation designs, academic journals, business certificates, etc. These corpora meet the needs of modeling research in areas such as speech synthesis, speech recognition, natural language processing, and AI.

Picture Collection

Tuosi provides collection and annotation services for many types of image and picture data to meet the research and development needs of human-computer interaction and pattern recognition technologies, such as face recognition, facial expression recognition, handwriting recognition, gesture recognition, motion-sensing recognition, and machine vision.
The types of data collected include body, face, expression, handwriting, behavior trace, map location, image, graphic symbol, specific scene, and other data.

1. Handwriting Collection
Tuosi can collect and process handwriting in more than 30 languages, covering various data types, handwriting styles, and platform equipment.
Handwriting languages: Traditional Chinese, Simplified Chinese, English, Japanese, Korean, Russian, Arabic, Spanish, German, Portuguese, Italian, French, and others, more than 30 languages in total.
Handwriting data types: words, signatures, characters, chemical symbols, special symbols, etc.
Handwriting styles: natural, neat, etc.
Platform equipment: cellphones, laptops, and handwriting boards; Android, Windows, iOS, etc. are supported.

2. Face & Expression Collection
More than 1,000,000 facial expression samples have been collected and processed for customers, for model training and testing in areas such as face recognition, expression recognition, and human-computer interaction.
Collection regions: Europe, America, Asia, Africa, and most other regions of the world.
Age and gender: all genders and most age groups.
Facial expression types: happy, natural, panicked, sad, annoyed, angry, surprised, etc.
Collection environments: professional studio, office, public area, home, etc.
Lighting conditions: varied indoor and outdoor lighting, in-vehicle lighting, etc.

3. Map Information Collection
For map tracks, GPS information, POI information, and satellite information are provided to customers.
GPS information: taxi, bus, pedestrian, etc.
POI information: restaurants, shopping malls, hotels, schools, cinemas, etc.
Satellite information: entity profile, contact information, special remarks, etc.

4. Body and Gesture Collection
More than 500,000 body and gesture (motion-sensing) samples have been provided to customers for model training and testing. Tuosi has also accumulated knowledge and experience in areas such as body behavior definitions and gesture intentions.
Body types: natural body, specific body, specific gesture, etc.
Motion-sensing modes: infrared, image, magnetic sensing, laser scanning, etc.
Platform equipment: cellphones, professional cameras, computers, etc.

5. Other Data Collection
According to customer requirements, Tuosi also collects physical images, graphical symbols, specific scenarios, and other data.

Video Collection

For fields such as pattern recognition and behavioral research, Tuosi meets picture and video processing needs by providing data transcription, annotation, and processing according to customer requirements. This covers video and speech collection across different media types, environments, themes, and equipment platforms, such as MP3, MV, online multimedia data, video conference data, home video, and speech data.

Speech Data Collection

Tuosi has already collected data in more than 110 languages from more than 70 countries and regions across Asia, America, Africa, Europe, and elsewhere. The number of languages continues to grow to deliver the best customer experience.

Service types
Tuosi offers various types of speech data collection, including:

Speech synthesis data collection
Speech synthesis data collection (hidden Markov model based)
Speech synthesis data collection (concatenative synthesis), etc.

Speech recognition data collection
In-car speech data collection
Telephone speech data collection (mobile phone/fixed-line)
Spontaneous (free) speech data collection
Broadcast speech data collection
Desktop speech data collection
Emotional speech data collection
Collection with other special microphones and embedded devices, multi-modal speech data collection, and singing data collection

Application fields
Speech synthesis, speech recognition, speaker recognition, speech evaluation, emotion recognition, music retrieval, etc.

Covered languages
Chinese (Mandarin, Hong Kong Chinese, Taiwan Chinese, regional dialects, and heavily accented Chinese), Tibetan, Mongolian, Yugur, Spanish (as spoken in Spain, Mexico, the United States, etc.), French (Canadian French, French of France, etc.), English (American English, British English, Australian English, China English, Japanese English, etc.), Arabic, and others, more than 110 languages in total.

Speech data collection environments
Professional recording studio: recording studio, silent room, echo chamber, etc., suitable for speech synthesis data collection;
Indoor noise environments: office, home, supermarket, cafe, restaurant, shopping mall, etc., suitable for data collection for speech recognition and pronunciation recognition;
Outdoor noise environments: street, park, bus, subway, square, etc., suitable for data collection for speech recognition and pronunciation recognition in various environments;
Vehicle-mounted: parked, urban driving at different speeds, highway driving, etc., suitable for in-vehicle speech recognition data collection and other specially defined environments.

Platform system
We provide speech data collection on multiple operating systems, such as Android, Windows, and iOS, at a range of sampling rates and recording channel configurations, using computers, tablets, and cellphones.

Sampling rate and recording channel
Commonly used: 8 kHz/16-bit; 16 kHz/16-bit; 22 kHz/16-bit; 44 kHz/16-bit; 48 kHz/16-bit
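
For planning storage, the sampling rate and bit depth combinations listed above translate directly into uncompressed data volume. The following is a minimal Python sketch, not part of Tuosi's tooling: the function names are hypothetical, the audio is assumed to be uncompressed PCM with one recording channel by default, and "22K" and "44K" are assumed to mean 22,050 Hz and 44,100 Hz.

```python
# Illustrative sketch only: estimates raw PCM size for the listed formats.
# ASSUMPTIONS: uncompressed PCM, mono by default, and "22K"/"44K" taken to
# mean 22,050 Hz and 44,100 Hz (the source only gives the shorthand).

FORMATS = {
    "8K,16Bit": (8_000, 16),
    "16K,16Bit": (16_000, 16),
    "22K,16Bit": (22_050, 16),
    "44K,16Bit": (44_100, 16),
    "48K,16Bit": (48_000, 16),
}


def pcm_bytes(sample_rate_hz, bit_depth, seconds, channels=1):
    """Bytes of raw PCM audio: samples per second x bytes per sample x channels x duration."""
    return sample_rate_hz * (bit_depth // 8) * channels * seconds


def megabytes_per_hour(sample_rate_hz, bit_depth, channels=1):
    """Raw PCM size of one hour of audio, in megabytes (1 MB = 1,000,000 bytes)."""
    return pcm_bytes(sample_rate_hz, bit_depth, 3600, channels) / 1_000_000


if __name__ == "__main__":
    for label, (rate, depth) in FORMATS.items():
        print(f"{label}: {megabytes_per_hour(rate, depth):.1f} MB per hour (mono)")
```

Under these assumptions, one hour of mono speech at 16 kHz/16-bit comes to about 115 MB before any compression; a second recording channel doubles that figure.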

Scene style
Read speech, natural speech, dialogue speech, conference speech, emotional speech, speeches, singing, multi-modal speech, scripted speech, supervised speech, and various other types of speech collection.
