TexSmart HTTP API

The API is accessed via HTTP POST at https://texsmart.qq.com/api
Both the request body and the response are in JSON format. Postman is recommended for testing.

Here is a simple example request (as HTTP-POST body):

  {"str":"He stayed in San Francisco."}

Example code for calling the API: Python | Java | C++ | C#

The result returned by the API is also in JSON format, as follows:

{
  "header":{"time_cost_ms":1.18,"time_cost":0.00118,
            "core_time_cost_ms":1.139,"ret_code":"succ"},
  "norm_str":"He stayed in San Francisco.",
  "word_list":[
    {"str":"He","hit":[0,2,0,1],"tag":"PRP"},
    {"str":"stayed","hit":[3,6,1,1],"tag":"VBD"},
    {"str":"in","hit":[10,2,2,1],"tag":"IN"},
    {"str":"San","hit":[13,3,3,1],"tag":"NNP"},
    {"str":"Francisco","hit":[17,9,4,1],"tag":"NNP"},
    {"str":".","hit":[26,1,5,1],"tag":"NFP"}
  ],
  "phrase_list":[
    {"str":"He","hit":[0,2,0,1],"tag":"PRP"},
    {"str":"stayed","hit":[3,6,1,1],"tag":"VBD"},
    {"str":"in","hit":[10,2,2,1],"tag":"IN"},
    {"str":"San Francisco","hit":[13,13,3,2],"tag":"NNP"},
    {"str":".","hit":[26,1,5,1],"tag":"NFP"}
  ],
  "entity_list":[
    {"str":"San Francisco","hit":[13,13,3,2],"tag":"loc.city","tag_i18n":"city",
     "meaning":{"related":["Los Angeles", "San Diego", "San Jose", "Santa Clara", "Palo Alto",
                           "Santa Cruz", "Sacramento", "San Mateo", "Santa Barbara", "Oakland"]}}
  ],
  "syntactic_parsing_str":"",
  "srl_str":""
}

In this response, the field “header” gives auxiliary information about the API call (time cost, return code, etc.); the field “norm_str” gives the result of text normalization; the field “word_list” contains the results of basic-granularity word segmentation and part-of-speech tagging; the field “phrase_list” contains the results of compound-granularity word segmentation and part-of-speech tagging; the field “entity_list” gives all recognized entities and their types; and the fields “syntactic_parsing_str” and “srl_str” hold the constituent syntax tree and the semantic role labeling results, respectively. In this example, “syntactic_parsing_str” and “srl_str” are both empty strings because “syntactic_parsing” and “srl” are not activated by default.
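
As a minimal sketch of how these fields fit together, the snippet below loads an abridged copy of the response shown above (embedded as a string rather than fetched from the service) and recovers the text of each word and entity by slicing “norm_str” with the first two numbers of its “hit” array, which are the character position and the length. The helper name span_text is hypothetical and not part of TexSmart.

```python
import json

# Abridged copy of the example response above, embedded so the sketch runs offline.
RESPONSE_TEXT = """
{
  "header": {"ret_code": "succ"},
  "norm_str": "He stayed in San Francisco.",
  "word_list": [
    {"str": "He", "hit": [0, 2, 0, 1], "tag": "PRP"},
    {"str": "stayed", "hit": [3, 6, 1, 1], "tag": "VBD"},
    {"str": "San", "hit": [13, 3, 3, 1], "tag": "NNP"}
  ],
  "entity_list": [
    {"str": "San Francisco", "hit": [13, 13, 3, 2], "tag": "loc.city"}
  ]
}
"""


def span_text(norm_str, hit):
    """Return the substring of norm_str addressed by hit (hypothetical helper).

    Only hit[0] (character position) and hit[1] (length in characters) are used.
    """
    start, length = hit[0], hit[1]
    return norm_str[start:start + length]


result = json.loads(RESPONSE_TEXT)
norm_str = result["norm_str"]

# Print the POS tag and the recovered text of each word.
for word in result["word_list"]:
    print(word["tag"], span_text(norm_str, word["hit"]))

# Print the entity type and the recovered text of each entity.
for entity in result["entity_list"]:
    print(entity["tag"], span_text(norm_str, entity["hit"]))
```

Python string slicing counts characters rather than bytes, so the positions can be used directly without re-encoding the text.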

Instructions on Input and Output Format

Each field of the input JSON object is described below:

Field Name	Data Type	Field Introduction
str	string	The input text to be analyzed
options	JSON Object	Options that specify which functions to call and which algorithm each function uses. See the section "More Ways to Call the API" for details.
echo_data	JSON Object	A user-defined JSON object that the TexSmart service returns unchanged in its response. Users can use this field to record identifying information for the current request.

Each field of the output JSON object is described below:

Field Name	Data Type	Field Introduction
header	JSON Object	Auxiliary information about the call and its execution:
		time_cost_ms: total time taken to process the request, in milliseconds (ms).
		time_cost: total time taken to process the request, in seconds (s).
		ret_code: return code. "succ" denotes success; any other value is an error code, which covers the following cases:
			error.invalid_request_format: the request format is invalid (for example, it is not valid JSON).
			error.timeout: the request timed out.
			error.busy: the service is busy (it is handling other requests).
			error.too_long_text: the input sentence is too long (the length limit is 8192 characters).
norm_str	string	Normalization results of the input sentence
word_list	JSON Array	Results of basic-granularity word segmentation and part-of-speech tagging, one JSON object per word:
		hit: a JSON array whose first number is the position of the word within norm_str and whose second number is the length of the word; the last two numbers can be ignored. Position and length are counted in characters rather than bytes; a Chinese character, a digit, a punctuation mark, and a space each count as one character.
		tag: the POS tag of the word.
phrase_list	JSON Array	Results of compound-granularity word segmentation and part-of-speech tagging (all fields have the same meaning as in word_list).
entity_list	JSON Array	Information about the recognized entities, one JSON object per entity:
		hit: same as in word_list.
		type: a JSON object with the following fields:
			name: standard name of the entity type.
			i18n: natural-language expression (Chinese or English) of the type.
			flag: indicates whether the entity mention is an instance or a sub-type (1: instance, 2: sub-type, 0: unknown); this field can be absent when flag = 0.
			path: path of the type in the TexSmart ontology (from the root to the direct super-type). The TexSmart ontology can be downloaded from the download page.
		meaning: semantic information about the entity, given as a JSON object whose format depends on the entity type.
		tag: [deprecated; use the type field instead] standard name of the entity type.
		tag_i18n: [deprecated; use the type field instead] natural-language expression (Chinese or English) of the entity type.
syntactic_parsing_str	string	Results of constituent syntactic analysis
srl_str	string	Results of semantic role labeling
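
The return codes listed for the header field suggest checking “ret_code” before reading any other field. The following is a minimal sketch of such a check, run against hand-written dictionaries rather than live responses. The names TexSmartError, KNOWN_ERRORS, and check_header are hypothetical; only the return-code strings come from the table above.

```python
class TexSmartError(Exception):
    """Raised when the response header reports a failure (hypothetical class)."""


# Error codes listed in the output-format table, with short descriptions.
KNOWN_ERRORS = {
    "error.invalid_request_format": "the request format is invalid",
    "error.timeout": "the request timed out",
    "error.busy": "the service is busy",
    "error.too_long_text": "the input text is too long",
}


def check_header(result):
    """Return the parsed result unchanged if ret_code is "succ", else raise."""
    ret_code = result.get("header", {}).get("ret_code")
    if ret_code == "succ":
        return result
    detail = KNOWN_ERRORS.get(ret_code, "unrecognized return code")
    raise TexSmartError(f"{ret_code}: {detail}")


# Hand-written stand-ins for parsed responses, not real service output.
ok = {"header": {"ret_code": "succ"}, "norm_str": "He stayed in San Francisco."}
print(check_header(ok)["norm_str"])

try:
    check_header({"header": {"ret_code": "error.busy"}})
except TexSmartError as exc:
    print("request failed:", exc)
```

A caller might choose to retry on error.busy or error.timeout and to fail immediately on the other codes; that policy is a design choice and is not prescribed by the API.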

More Ways to Call the API

Input Option Settings

More generally, the input JSON can also include options that select which functions to run (word segmentation, part-of-speech tagging, named entity recognition, syntactic analysis, semantic role labeling, and others) and which algorithm each function uses. The simple example above contains no options; its results are similar to those returned for the following JSON input.

{
  "str":"he stayed in San Francisco.",
  "options":
  {
    "input_spec":{"lang":"auto"},
    "word_seg":{"enable":true},
    "pos_tagging":{"enable":true,"alg":"log_linear"},
    "ner":{"enable":true,"alg":"crf","fine_grained":true},
    "syntactic_parsing":{"enable":false},
    "srl":{"enable":false}
  },
  "echo_data":{"request_id":12345}
}

Specifically, the “lang” value in the field “input_spec” specifies the input language and has three available values: automatic language detection (“auto”), Chinese (“chs”), and English (“en”). The field “enable” can be “true” or “false” and indicates whether the corresponding function is activated. The field “alg” names the algorithm the corresponding function should use: “pos_tagging” accepts three values (“crf”, “dnn”, and “log_linear”), and “ner” accepts two (“crf” and “dnn”). The field “fine_grained” indicates whether fine-grained NER results are returned. The value of “echo_data” is defined by the user; for example, it can carry identifying information for the current request, such as “request_id”, which may be useful in asynchronous calls and similar situations.
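
The option values described above can be assembled programmatically. The sketch below builds a request body with the same structure as the JSON example and rejects values outside the documented alternatives. The function build_request and its parameter names are hypothetical; the option keys and allowed values are taken from this section, and the default arguments mirror the example input.

```python
import json


def build_request(text, lang="auto", pos_alg="log_linear", ner_alg="crf",
                  fine_grained=True, parsing=False, srl=False, request_id=None):
    """Build a request body with explicit options (hypothetical helper)."""
    # Validate against the alternatives documented in this section.
    if lang not in ("auto", "chs", "en"):
        raise ValueError(f"unsupported lang: {lang}")
    if pos_alg not in ("crf", "dnn", "log_linear"):
        raise ValueError(f"unsupported pos_tagging alg: {pos_alg}")
    if ner_alg not in ("crf", "dnn"):
        raise ValueError(f"unsupported ner alg: {ner_alg}")

    payload = {
        "str": text,
        "options": {
            "input_spec": {"lang": lang},
            "word_seg": {"enable": True},
            "pos_tagging": {"enable": True, "alg": pos_alg},
            "ner": {"enable": True, "alg": ner_alg, "fine_grained": fine_grained},
            "syntactic_parsing": {"enable": parsing},
            "srl": {"enable": srl},
        },
    }
    # echo_data is optional; include it only when the caller supplies an id.
    if request_id is not None:
        payload["echo_data"] = {"request_id": request_id}
    return payload


# Example: English input with semantic role labeling switched on.
body = json.dumps(build_request("he stayed in San Francisco.",
                                lang="en", srl=True, request_id=12345))
print(body)
```

The resulting string can be sent as the HTTP POST body in the same way as the simple request.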

Batch Call

TexSmart also supports batch calls: a single JSON input can contain multiple (Chinese or English) sentences to analyze. Here is an example input:

{
  "str":[
         "上个月30号，南昌王先生在自己家里边看流浪地球边吃煲仔饭。",
         "2020年2月7日，经中央批准，国家监察委员会决定派出调查组赴湖北省武汉市，就群众反映的涉及李文亮医生的有关问题作全面调查。",
         "John Smith stayed in San Francisco last month."
        ]
}

Note that the output format of a batch call differs slightly from that of an ordinary call: the results for all sentences are returned as a JSON array, which is the value of the "res_list" field.
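
A minimal sketch of handling a batch call follows. It builds a batch request body and pairs each input sentence with its entry in "res_list". Two assumptions are made that this page does not state: entries in "res_list" appear in the same order as the input sentences, and each entry carries the same fields as an ordinary response. The response here is a hand-written stand-in, not real service output, and pair_batch_results is a hypothetical helper.

```python
import json

sentences = [
    "上个月30号，南昌王先生在自己家里边看流浪地球边吃煲仔饭。",
    "John Smith stayed in San Francisco last month.",
]

# Batch request body: "str" holds a list of sentences instead of one string.
request_body = json.dumps({"str": sentences}, ensure_ascii=False).encode("utf-8")

# Hand-written stand-in for a parsed batch response (assumed shape).
mock_response = {
    "res_list": [
        {"norm_str": sentences[0], "entity_list": []},
        {"norm_str": sentences[1],
         "entity_list": [{"str": "San Francisco", "hit": [21, 13, 4, 2]}]},
    ]
}


def pair_batch_results(inputs, response):
    """Pair each input sentence with its res_list entry (assumes same order)."""
    results = response["res_list"]
    if len(results) != len(inputs):
        raise ValueError("res_list length does not match the number of inputs")
    return list(zip(inputs, results))


for sentence, res in pair_batch_results(sentences, mock_response):
    names = [entity["str"] for entity in res.get("entity_list", [])]
    print(sentence, "->", names)
```

Passing ensure_ascii=False and encoding explicitly as UTF-8 keeps the Chinese text readable in the request body; the default escaped form produced by json.dumps is also valid JSON.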

Code Examples

API Call with Python

Code Example 1 (with http.client):

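# -*- coding: utf-8 -*-
import json
import http.client


# Build the JSON request body.
obj = {"str": "he stayed in San Francisco."}
req_str = json.dumps(obj)

# Open an HTTPS connection, matching the API URL https://texsmart.qq.com/api
conn = http.client.HTTPSConnection("texsmart.qq.com")
conn.request("POST", "/api", req_str.encode("utf-8"))
response = conn.getresponse()
print(response.status, response.reason)
# The response body is JSON text; decode it as UTF-8.
res_str = response.read().decode('utf-8')
print(res_str)
#print(json.loads(res_str))
conn.close()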

Code Example 2 (with the requests module installed):

# -*- coding: utf-8 -*-
import json
import requests

# Build the JSON request body and encode it as bytes.
obj = {"str": "he stayed in San Francisco."}
req_str = json.dumps(obj).encode()

# Send the request as an HTTP POST and print the JSON response text.
url = "https://texsmart.qq.com/api"
r = requests.post(url, data=req_str)
r.encoding = "utf-8"
print(r.text)
#print(json.loads(r.text))

API Call with Java

[TBD]

API Call with C++

[TBD]

API Call with C#

[TBD]

About TexSmart

TexSmart is a text understanding system built by the NLP Team at Tencent AI Lab. It analyzes the morphology, syntax, and semantics of text in both Chinese and English. It provides basic natural language understanding functions such as word segmentation, part-of-speech tagging, and named entity recognition (NER), and in particular supports key functions including fine-grained named entity recognition, semantic expansion, and deep semantic expression for specific entities.
