Improving Technical Knowledge Management
Improving the speed and relevance of technical knowledge search
A global engineering company that helps producers of automotive, aerospace, energy, and consumer products turned to Intuceo to improve its technical knowledge management. The client required an algorithm that could search unstructured data in k-pacs both quickly and accurately.
The solution, to be integrated into the client’s enterprise knowledge system, needed to resolve issues in the data and score documents by relevance to the search query. The client’s existing system took too long and failed to yield the expected results. A major challenge was integrating the solution with the client’s existing engineering workflows.
The client provided 94,344 documents, 10,000 of which were used to develop the model. Our DataSharp™ preprocessor merged, cleaned, and categorized the text, and a semantic dictionary, including frequency of occurrence, was built. The Intuceo team stemmed related terms, removed unnecessary punctuation, replaced original content with dictionary terms, constructed a tf-idf matrix, and filtered out terms that did not occur frequently across documents. K-means clustering was then used, with cosine similarity as the distance measure (see the figure), to obtain the most similar documents. In this way, non-quantitative data was numerically tagged and the “distance” (i.e., degree of likeness) among documents could be compared.
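The pipeline described above can be illustrated with a small sketch. It is not Intuceo's implementation: the function names (tfidf_vectors, cosine, kmeans_cosine), the min_df threshold, the smoothed idf formula, whitespace tokenisation, and the naive "first k documents" initialisation are all assumptions chosen for brevity, and stemming and dictionary replacement are left out.

```python
import math
from collections import Counter

def tfidf_vectors(docs, min_df=1):
    """Return one unit-length sparse tf-idf vector (a dict) per document."""
    tokenised = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenised:
        # Drop terms that appear in fewer than min_df documents.
        tf = Counter(t for t in tokens if df[t] >= min_df)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def kmeans_cosine(vectors, k, iterations=10):
    """K-means that assigns each vector to its most cosine-similar centroid."""
    centroids = [dict(v) for v in vectors[:k]]  # assumed naive initialisation
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    for t, w in v.items():
                        total[t] += w
            if not total:
                continue  # empty cluster: keep the previous centroid
            norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
            centroids[c] = {t: w / norm for t, w in total.items()}
    return labels

# Hypothetical sample documents, not taken from the client's data.
docs = [
    "turbine blade crack inspection",
    "battery cell thermal runaway",
    "turbine blade fatigue crack",
    "battery thermal management cell",
]
labels = kmeans_cosine(tfidf_vectors(docs), k=2)
```

With these four sample documents, the two turbine texts end up in one cluster and the two battery texts in the other. A production system would more likely rely on an established library and a better initialisation scheme than the one assumed here.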
The solution lists the 10 most similar documents based on a search query. The total search time has been reduced from minutes to seconds and relevance has been optimized. The client has knowledge on demand that increases the productivity and capabilities of its engineers, ensures a consistent engineering process, and prevents recurring errors.
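Listing the 10 most similar documents for a query can be sketched as a ranking by cosine similarity. This is an illustration only: the function names (cosine_similarity, top_k_similar), the sparse-dictionary vector format, and the document identifiers are assumptions, and the vectors are presumed to have been produced already by a tf-idf step.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)

def top_k_similar(query_vec, doc_vecs, k=10):
    """Return up to k (score, doc_id) pairs, most similar first."""
    scored = ((cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items())
    return heapq.nlargest(k, scored)

# Hypothetical, hand-made vectors for illustration.
doc_vecs = {
    "d1": {"turbine": 1.0, "blade": 1.0},
    "d2": {"battery": 1.0},
    "d3": {"turbine": 1.0},
}
results = top_k_similar({"turbine": 1.0}, doc_vecs, k=2)
ranked_ids = [doc_id for _, doc_id in results]
```

Using heapq.nlargest avoids sorting the full collection when only a handful of results is needed, which matters more as the number of documents grows.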