Academic & Industrial Programs

From 2012 to 2022, I was fortunate to take part in several artificial intelligence projects, both industrial and academic. The industrial projects include my work at Alibaba Group, Ant Group, IBM, and my own startup company. The academic projects include those I completed during my undergraduate and graduate studies, as well as a recent project hosted by the Peking University Artificial Intelligence College.

1. Multi-Agent Reinforcement Learning Benchmark | Academic

Datetime: 2022.Feb – 2022.Sept

Status: Research Assistant

Organization: Peking University

Supervisor: Yang Yaodong, Assistant Prof, Computer Science, Peking University

Source code: https://github.com/Replicable-MARL/MARLlib

Document: https://marllib.readthedocs.io/en/latest/handbook/intro.html

Because a wide assortment of environments and algorithms has recently been proposed in the field of MARL (Multi-Agent Reinforcement Learning), we set out to build a benchmark of the current salient algorithms across different environments. We built a unified framework that supports various environments and algorithms and runs in a distributed computing environment by default. In this project, I was responsible for:

Implementing the MAPPO, HAPPO, TRPO, HATRPO, and MATRPO algorithms on the Ray RLlib framework; these algorithms can run directly in a distributed environment.
Implementing the components shared across these algorithms, so that each one can be run in different environments purely through configuration.
Adapting these algorithms to the Multi-Agent MuJoCo and StarCraft (SMAC) environments.
Verifying that MAPPO, HAPPO, TRPO, HATRPO, and MATRPO reach the performance reported in the original papers, even when running within our framework and distributed environment.
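MAPPO and HAPPO both optimize a PPO-style clipped surrogate objective; HAPPO additionally updates agents one at a time and scales each agent's advantage by the policy ratios of the agents already updated. The NumPy sketch below illustrates only that objective. It is not MARLlib or RLlib code: the function names, the dictionary of per-agent ratios, and the default clip value of 0.2 are assumptions for illustration, and in real HAPPO each later agent's ratios are recomputed after the earlier agents' updates.

```python
import numpy as np


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO clipped surrogate objective (to be maximized), averaged over samples.

    ratio: pi_new(a|s) / pi_old(a|s) per sample; advantage: advantage estimate per sample.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return float(np.mean(np.minimum(unclipped, clipped)))


def happo_weighted_objectives(agent_ratios, advantage, order, clip_eps=0.2):
    """Sketch of HAPPO's sequential scheme (hypothetical helper, not a MARLlib API).

    Agents are visited in `order`; each agent's advantage is multiplied by the product
    of the ratios of the agents visited before it, and only its own ratio is clipped.
    """
    weight = np.ones_like(advantage)
    objectives = {}
    for agent in order:
        ratio = agent_ratios[agent]
        objectives[agent] = clipped_surrogate(ratio, weight * advantage, clip_eps)
        # Later agents see this agent's (already updated) policy ratio in their weight.
        weight = weight * ratio
    return objectives
```

Under MAPPO, each agent would instead apply clipped_surrogate directly to the shared advantage; with a single agent, the HAPPO weight stays at 1 and both reduce to ordinary PPO.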

2. Customer Separation and Behavior Prediction System | Industrial

Status: Vice President, Partner

Type: Business Intelligence

The company I worked for, the biggest adult technology-education company in China, had over 100,000 customers scanning or browsing our introductory materials every month. Sales staff would contact these customers and close the transactions. Because staff and time were limited, it was crucial to segment the customers into categories and identify the most promising prospects.

Unlike academic projects, this project was tied closely to transactions and required several departments (marketing, sales, customer delivery, and the service center) to work together. The project used data on 2 million customers. As a partner and vice president of the company, I was responsible for deciding which actions to take for each customer category we identified, with the goal of improving the overall return on investment.

Based on historical purchase records and user actions, we built a hypothesis model to estimate each customer's probability of purchasing. By combining salient features with these purchase probabilities, we clustered the customers. Then, using the time-series records of past customers, we analyzed and predicted customers' status transitions and mined the most effective policy for each type of customer. Finally, the system assigned tasks to each salesperson, so that every salesperson knew which customers to contact first each day and the specific actions to take with each one.
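The status-transition and task-assignment steps can be illustrated with a first-order Markov chain estimated from customers' status histories, followed by a daily call list ranked by purchase probability. This is a minimal sketch under stated assumptions: the four status names, the Markov assumption, and the helper names are hypothetical, and the purchase probabilities are taken as given rather than produced by the actual hypothesis model, which is not specified here.

```python
import numpy as np

# Hypothetical customer statuses; the real system's categories are not specified.
STATES = ["new", "engaged", "trial", "purchased"]


def transition_matrix(histories, states=STATES):
    """Estimate first-order Markov transition probabilities from status sequences.

    Entry [i, j] is the fraction of observed moves out of states[i] that went to states[j].
    """
    idx = {s: i for i, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for seq in histories:
        for a, b in zip(seq, seq[1:]):
            counts[idx[a], idx[b]] += 1
    totals = counts.sum(axis=1, keepdims=True)
    # States with no observed outgoing moves keep an all-zero row instead of dividing by zero.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


def daily_call_list(purchase_prob, k):
    """Return the ids of the k customers with the highest estimated purchase probability."""
    return sorted(purchase_prob, key=purchase_prob.get, reverse=True)[:k]
```

A row such as `P[STATES.index("trial")]` then gives the estimated next-status distribution for a customer currently in a trial, which is the kind of signal a per-customer action policy could be built on.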

With the help of this intelligent system, the deals closed by each salesperson grew from $20,000 worth in February 2021 to $50,000 worth in August 2021. More importantly, thanks to accurate long-term customer maintenance, the ratio of marketing spend to revenue improved from 1:2 to 1:4. This helped the company close an RMB 650 million (about USD 100 million) financing round in August 2021.

2020.Aug - 2021.Sept

2018.Oct - 2019.May

2018.Mar - 2018.Aug

3. PCB Automated Manufacturing System | Industrial

Status: Data Scientist

Organization: IBM China & Fastprint.Inc

Type: Industrial

This project was built for Fastprint.Inc, the largest PCB (Printed Circuit Board) manufacturer in China, located in Guangzhou.

PCB companies like this are essentially subcontractors for clients around the world. They accept almost any customer request in order to acquire customers as quickly as possible.

Their production is driven by a myriad of unstructured requirement documents whose formats, file types, and ways of describing requirements follow no standard at all. A company needs about 200 to 400 engineers to read and understand these documents and then convert them into the company's standard requirement format.

This laborious manual recognition becomes the bottleneck when a PCB company wants to increase its production volume. There is a great deal of such work to be done, yet training an engineer takes over two years. The owner of Fastprint wanted to use artificial intelligence methods to improve the workflow and increase efficiency.

In this project, I was the tech lead and lead data scientist, responsible for:

Defining the problem scope and metrics with the customer's managers.
Planning the technology roadmap and the tasks of team members.
Developing the intelligent text-document parsing algorithm, based on semi-supervised learning, word-embedding similarity, syntactic and semantic sentence parsing, and analysis of the spatial relationships between entities.
Using computer graphics knowledge and artificial intelligence methods to parse PCB Gerber files, the graphics files that describe PCB elements and their relationships, so that the system could understand requirements expressed graphically.
Building the automated pipeline for intelligent PCB manufacturing.
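Gerber (RS-274X) files describe PCB layers as plotting commands: D01 draws to a coordinate, D02 moves without drawing, D03 flashes the current aperture, and a format statement such as %FSLAX24Y24*% declares that coordinates are integers with 2 integer and 4 decimal digits. The sketch below extracts those operations from a simplified file. It is a hypothetical illustration, not the parser built for this project: it ignores arcs, regions, step-and-repeat, aperture definitions, and unit handling, and it assumes one command per line.

```python
import re

# Simplified patterns for a hypothetical one-command-per-line Gerber subset.
FORMAT = re.compile(r"%FSLAX(?P<xi>\d)(?P<xd>\d)Y(?P<yi>\d)(?P<yd>\d)\*%")
APERTURE = re.compile(r"D(?P<code>[1-9]\d+)\*$")          # aperture selection, D10 and up
COORD = re.compile(r"(?:X(?P<x>[+-]?\d+))?(?:Y(?P<y>[+-]?\d+))?D0?(?P<op>[123])\*$")
OPS = {"1": "draw", "2": "move", "3": "flash"}


def parse_gerber(text):
    """Return (operation, aperture, x, y) tuples from a simplified Gerber file."""
    x_dec = y_dec = 4  # assumed default until a %FS...*% statement is seen
    x = y = 0.0
    aperture = None
    ops = []
    for raw in text.splitlines():
        line = raw.strip()
        if m := FORMAT.match(line):
            x_dec, y_dec = int(m["xd"]), int(m["yd"])
        elif m := APERTURE.match(line):
            aperture = int(m["code"])
        elif m := COORD.match(line):
            # Coordinates are modal: an omitted X or Y keeps its previous value.
            if m["x"] is not None:
                x = int(m["x"]) / 10 ** x_dec
            if m["y"] is not None:
                y = int(m["y"]) / 10 ** y_dec
            ops.append((OPS[m["op"]], aperture, x, y))
    return ops
```

The walrus operator requires Python 3.8 or later. A production parser would follow the full Gerber specification rather than line-level regular expressions.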

The project achieved the following results:

Reduced the related workers' manual workload by nearly 70%.
On completion of the project, Fastprint.Inc paid RMB 12 million to IBM.
The project was selected by the Guangzhou Government as a key project in the Greater Bay Area.

4. Intelligent Chatbot for China Construction Bank | Industrial

Status: Data Scientist

Organization: IBM China & China Construction Bank

Type: Industrial

This project built a service chatbot for China Construction Bank, one of the largest banks in China, to help bank customers complete basic transactions. The chatbot could handle routine operations such as card registration, account closing, and other transactions, and could answer policy-related questions.

In this project, I was responsible for:

Implementing the algorithm and framework for intention classification: building models that classify questions into categories such as daily conversation, banking, and life services.
Building the kernel ranking algorithm, which computes the similarity between a customer's query and the queries stored in the database.
Using syntax trees and word2vec to generate new questions from customers' questions.
Refactoring the existing system to shorten its response time.
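The ranking step can be sketched by embedding each query as the average of its word vectors and sorting the stored queries by cosine similarity. The exact kernel used in the project is not stated, so cosine similarity over averaged word vectors is a swapped-in, commonly used choice; the tiny embedding table, the pre-tokenized inputs, and the function names are hypothetical.

```python
import numpy as np


def sentence_vector(tokens, embeddings):
    """Average the word vectors of known tokens (zero vector if none are known)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    dim = len(next(iter(embeddings.values())))
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


def rank_stored_queries(query_tokens, stored, embeddings, top_k=3):
    """Rank stored (question_id, tokens) pairs by cosine similarity to the query."""
    q = sentence_vector(query_tokens, embeddings)
    scored = []
    for qid, tokens in stored:
        v = sentence_vector(tokens, embeddings)
        denom = np.linalg.norm(q) * np.linalg.norm(v)
        # Queries with no known words get similarity 0 instead of dividing by zero.
        scored.append((qid, float(q @ v / denom) if denom else 0.0))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```

In practice the word vectors would come from a word2vec model trained on segmented Chinese banking dialogue, and the top-ranked stored query's answer would be returned.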

After about half a year of work, the project achieved the following results:

Completed the project successfully.
Reached up to 95% intention classification accuracy.
Reached up to 90% dialogue QA accuracy.
Reduced the previous system's response time from 4 to 5 seconds to less than 0.5 seconds.

5. Potential Risk Works Mining | Industrial

Organization: Ant-Group (Alipay)

Status: AI Algorithm Engineer (Internship)

Type: Industrial

7. Chinese News Auto Summarization | Industrial

Datetime: 2017.Feb – 2018.Oct

Status: AI Algorithm Engineer (Internship)

Organization: Alibaba-Group & Xinhua News Agency

Type: Industrial

In 2017, Xinhua News Agency, the largest news agency in China, partnered with Alibaba to solve several complex problems. I was an AI algorithm engineer intern on that team and was assigned the Chinese news auto-summarization problem: designing an algorithm that converts a news article of arbitrary length into a shorter article of limited length. The problem posed several challenges:

Although some outstanding supervised auto-summarization models had been published at that time, there was no supervised model for Chinese, and not even a Chinese auto-summarization training corpus.
News articles varied widely in length, from hundreds to thousands of words, while the output was limited to 200 words or fewer.

This project was mainly inspired by Mikolov's word2vec and the SIF (Smooth Inverse Frequency) sentence-embedding method from Princeton. It combined several unsupervised learning methods, deep learning, and graph algorithms to build an auto-summarization system that could condense an article of any length into one shorter than 200 words.

In this project, I was responsible for:

Implementing a sentence-embedding method to judge the semantic similarity between sentences.
Using sentence embeddings to extract the key sentences.
Designing a neighbor-smoothing algorithm to make the result more readable and fluent.
Using keyword detection, NER, and dependency parsing to identify the main sentences more accurately.
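The sentence-embedding and extraction steps can be sketched with SIF embeddings as described by Arora et al. (2017): each word vector is weighted by a/(a + p(w)), the weighted vectors are averaged, and the projection onto the first singular vector of the sentence matrix is removed. Sentences are then scored by their summed cosine similarity to the other sentences (their degree in a similarity graph) and selected greedily within a length budget. This is a hypothetical sketch, not the deployed system: the scoring rule, the token-count budget, and the function names are assumptions, and the neighbor-smoothing, keyword, NER, and dependency-parsing steps are not reproduced.

```python
import numpy as np


def sif_embeddings(sentences, word_vecs, word_freq, a=1e-3):
    """SIF sentence embeddings: frequency-weighted averages with the common component removed.

    sentences: lists of tokens; word_vecs: token -> vector; word_freq: token -> probability p(w).
    """
    dim = len(next(iter(word_vecs.values())))
    emb = np.zeros((len(sentences), dim))
    for i, tokens in enumerate(sentences):
        known = [t for t in tokens if t in word_vecs]
        if known:
            weights = np.array([a / (a + word_freq.get(t, 0.0)) for t in known])
            emb[i] = weights @ np.array([word_vecs[t] for t in known]) / len(known)
    # Remove each row's projection onto the first right singular vector.
    _, _, vt = np.linalg.svd(emb, full_matrices=False)
    u = vt[0]
    return emb - np.outer(emb @ u, u)


def extract_summary(sentences, emb, budget):
    """Greedy extractive selection by degree centrality in a cosine-similarity graph.

    Keeps the highest-scoring sentences whose total token count fits in `budget`,
    then returns them in their original order.
    """
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = np.divide(emb, norms, out=np.zeros_like(emb), where=norms > 0)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)
    scores = sim.sum(axis=1)
    chosen, used = [], 0
    for i in np.argsort(-scores, kind="stable"):
        if used + len(sentences[i]) <= budget:
            chosen.append(int(i))
            used += len(sentences[i])
    return [sentences[i] for i in sorted(chosen)]
```

Returning the selected sentences in their original order is a simple way to keep the summary readable; smoothing between adjacent sentences would be a further step on top of this.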

After several months of work, I achieved the following:

Completed the project successfully; the algorithm was used in two smart speakers, Tmall Genie and Rokid, to broadcast news.
Based on the product manager's evaluation and customer feedback, the algorithm's performance was among the best in China in 2017.

2018.Mar - 2018.Aug

2017.Feb - 2018.Oct

2017.July - 2018.Mar

2016.June - 2019.July

2013.Dec - 2014.May

2012.Mar - 2013.May

8. Deep Generative Model for Auto Composition by Lyric | Academic

Status: Graduate Student

Type: Academic Research

Supervisor: Prof Zhang Kejun, Zhejiang University, Computer Science

GitHub Source:

Inspired by the success of text generation and machine translation in 2017, I wanted to build a model that takes lyrics as input and generates euphonious music that matches their emotion.

Using an embedding-learning method, I learned high-dimensional vectors representing short music spans, and then built a sequence-to-sequence model similar to machine translation models. After about two months of architecture design and fine-tuning, the model could produce euphonious music that was emotionally related to the text.

I was responsible for the whole project from beginning to end, including:

Implementing the music-element embedding algorithm and its training process.
Implementing the specific seq2seq and attention model for music generation.
Implementing the web spider for collecting lyric text and music files.
Implementing the end-to-end pipeline from lyric input to music generation.
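In a sequence-to-sequence model with attention, each decoding step compares the current decoder state with every encoder state, turns the scores into weights with a softmax, and passes the weighted sum (the context vector) into the next prediction. The sketch shows dot-product (Luong-style) attention for a single step in NumPy. Which attention variant the project used is not stated, so this choice is an assumption; here the encoder states would stand for encoded lyric tokens and the decoder state for the music sequence generated so far.

```python
import numpy as np


def dot_attention(decoder_state, encoder_states):
    """Dot-product attention for one decoding step.

    decoder_state: shape (hidden,); encoder_states: shape (src_len, hidden).
    Returns (weights over source positions, context vector of shape (hidden,)).
    """
    scores = encoder_states @ decoder_state        # one score per source position
    scores = scores - scores.max()                 # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ encoder_states             # weighted sum of encoder states
    return weights, context
```

In a full model this runs once per generated music span, and the context vector is concatenated with the decoder state before predicting the next span.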

After about half a year of work, the model could generate recognizable music from given texts, and the work is now being continued in the NeXT Lab at Zhejiang University.

Generated music samples: https://github.com/fortyMiles/music_embedding/tree/master/dataset

9. Knowledge Graph and Relation Mining for Aerosol Data | Academic

Status: Graduate Student

Type: Academic Research

Supervisor: Prof Zhang Kejun, Zhejiang University, Computer Science

This research project was hosted by the Knowledge Graph Lab, College of Computer Science, Zhejiang University. Based on papers about aerosol data and optical instrument data, the goal was to extract the crucial relations between aerosol information and eight key features. In this project, I was responsible for:

Implementing the TransE algorithm to learn entity and relation embeddings.
Implementing the word2vec algorithm to measure word semantic similarity.
Using dependency parsing to extract subject-predicate-object relations.
Using regular expressions and text-parsing methods to extract table information from PDF files.
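TransE models a relation as a translation in embedding space, so that head + relation ≈ tail for true triples, and trains with a margin ranking loss against corrupted triples. The NumPy sketch below shows the distance, the loss, and one SGD step. It is a minimal illustration, not the lab's implementation: index-based triples and the hyperparameter defaults are assumptions, and the per-batch entity normalization of the original TransE paper is omitted for brevity.

```python
import numpy as np


def distance(h, r, t):
    """TransE dissimilarity ||h + r - t||_2; small when (h, r, t) is a plausible triple."""
    return float(np.linalg.norm(h + r - t))


def margin_loss(pos, neg, margin=1.0):
    """Margin ranking loss: a true triple should be closer than a corrupted one by `margin`."""
    return max(0.0, margin + distance(*pos) - distance(*neg))


def _unit(v):
    """Gradient of ||v||_2 with respect to v (zero vector at the origin)."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else np.zeros_like(v)


def sgd_step(ent, rel, pos, neg, lr=0.01, margin=1.0):
    """One in-place SGD update for a (true, corrupted) pair of (head, relation, tail)
    index triples; `ent` and `rel` are float embedding matrices.
    Returns the loss measured before the update."""
    (h, r, t), (h2, _, t2) = pos, neg  # the corrupted triple keeps the same relation
    loss = margin_loss((ent[h], rel[r], ent[t]), (ent[h2], rel[r], ent[t2]), margin)
    if loss == 0.0:
        return loss
    gp = _unit(ent[h] + rel[r] - ent[t])    # d(positive distance) / dh
    gn = _unit(ent[h2] + rel[r] - ent[t2])  # d(negative distance) / dh2
    # Accumulate per-entity gradients first, since h2 or t2 may equal h or t.
    grads = {}
    for idx, g in ((h, gp), (t, -gp), (h2, -gn), (t2, gn)):
        grads[idx] = grads.get(idx, 0.0) + g
    for idx, g in grads.items():
        ent[idx] -= lr * g
    rel[r] -= lr * (gp - gn)
    return loss
```

Training repeats this step over many true triples, each paired with a corrupted triple made by replacing its head or tail with a random entity.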

10. Network Society Public Opinion Distribution and Trend Mining | Academic

Status: Undergraduate Student

Type: Academic Research

Supervisor: Prof Zhan Jian, Lanzhou University, Computer Science

This research project was conducted by the Institute for Information, Lanzhou University, together with the Pennsylvania State University. Using machine learning and NLP methods, it mined the opinion distribution and the trend of opinion evolution for a given event. In this project, I was responsible for:

Implementing sentence sentiment representations.
Implementing and comparing different machine learning algorithms (Bayesian classification, SVM, KNN, and k-means) for opinion classification and clustering.
Building graphs to record the analysis of individual and group opinions.
Using D3 to visualize the theme river and the opinion-group analysis.
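Bayesian opinion classification followed by per-period counting is one simple way to produce the data behind a theme-river chart. The sketch uses a multinomial Naive Bayes classifier with add-one smoothing over pre-tokenized posts, then counts predicted opinions per time bucket. The tokenization, class labels, bucket keys, and names are hypothetical; the original project's features and the D3 rendering are not reproduced.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for tokens, c in zip(docs, labels):
            self.counts[c].update(tokens)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        def log_posterior(c):
            denom = self.totals[c] + len(self.vocab)
            # Tokens never seen in training are skipped.
            return self.prior[c] + sum(
                math.log((self.counts[c][t] + 1) / denom) for t in tokens if t in self.vocab)
        return max(self.classes, key=log_posterior)


def theme_river_table(posts, model):
    """Count predicted opinions per time bucket: {bucket: {opinion: count}}.

    posts: (bucket, tokens) pairs. A stacked 'theme river' chart is drawn from such a table.
    """
    table = defaultdict(Counter)
    for bucket, tokens in posts:
        table[bucket][model.predict(tokens)] += 1
    return {b: dict(c) for b, c in sorted(table.items())}
```

The resulting table could be serialized to JSON and passed to a D3 stacked-area (stream graph) layout.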

Achievement: The resulting paper, "Theme-River-Based Internet Public Opinion Visualization Correlation Analysis Methods," was published in Information and Documentation Services, a CSSCI-indexed journal.

11. A Web-Based Campus Service Robot Dialogue System | Academic

Status: Undergraduate Student

Type: Academic Research

Supervisor: Prof Majun, Lanzhou University, Computer Science

In this project, I implemented a campus chatbot to help students. The university highly recommended this program to the Ministry of Education of China. In December 2012, as the only recipient from Lanzhou University, I received a 25,000 RMB research innovation and entrepreneurship award from the Ministry of Education.

The chatbot supported question answering on campus information, such as commuter buses and library book retrieval. It could also help with students' daily life, for example by automatically ordering takeout and sending cellphone text messages from spoken instructions, and it supported ordinary everyday conversation.

This was in fact my first comprehensive project. At that time, Python 2.x did not handle Chinese characters efficiently, so I even had to write the web spider on my own. To draw on several information sources, I built a web-spider framework to collect the necessary information. Moreover, because efficient deep learning methods had not yet emerged, I had to design sophisticated NLP algorithms from scratch.

In this project, I was responsible for:

Collecting the corpus and standardizing the text data;
Implementing the sentence semantic similarity algorithm;
Implementing the quick retrieval system;
Implementing the auto-learning method.
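Without deep learning, a common way to build the semantic-similarity and quick-retrieval parts is TF-IDF cosine similarity plus an inverted index. The sketch below uses character bigrams, which sidestep Chinese word segmentation, and scores only stored questions that share at least one bigram with the query. This is an assumed, swapped-in technique rather than the algorithm used in 2012: the bigram features, the smoothed IDF formula, the 0.3 threshold, and the class name are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_bigrams(text):
    """Overlapping character bigrams; avoids needing a Chinese word segmenter."""
    return [text[i:i + 2] for i in range(len(text) - 1)] or [text]


class QARetriever:
    """Hypothetical retriever: TF-IDF cosine similarity over character bigrams, plus an
    inverted index so only stored questions sharing a bigram with the query are scored."""

    def __init__(self, qa_pairs):
        self.answers = [answer for _, answer in qa_pairs]
        grams = [Counter(char_bigrams(question)) for question, _ in qa_pairs]
        df = Counter(g for counts in grams for g in counts)
        n = len(qa_pairs)
        # Smoothed IDF (an assumption): log((n + 1) / (df + 1)) + 1.
        self.idf = {g: math.log((n + 1) / (d + 1)) + 1 for g, d in df.items()}
        self.vectors = [self._weigh(counts) for counts in grams]
        self.index = defaultdict(set)
        for i, counts in enumerate(grams):
            for g in counts:
                self.index[g].add(i)

    def _weigh(self, counts):
        """L2-normalized TF-IDF vector as a dict; unseen bigrams get weight 0."""
        vec = {g: tf * self.idf.get(g, 0.0) for g, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {g: w / norm for g, w in vec.items()} if norm else {}

    def answer(self, query, threshold=0.3):
        """Return the answer of the most similar stored question, or None below threshold."""
        q = self._weigh(Counter(char_bigrams(query)))
        candidates = set().union(*(self.index.get(g, ()) for g in q))
        best, best_score = None, threshold
        for i in candidates:
            score = sum(w * self.vectors[i].get(g, 0.0) for g, w in q.items())
            if score >= best_score:
                best, best_score = self.answers[i], score
        return best
```

For example, a query asking when the library opens matches the stored library question through shared bigrams such as 图书 and 开门, while a greeting with no overlap returns None and could be routed to small-talk handling.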

After more than a year of work, I achieved the following:

Won first prize in the Lanzhou University Student Innovation and Entrepreneurship Competition.
This program was recommended to the Ministry of Education of the PRC.
It was the only one of Lanzhou University's recommended projects to receive an award from the Ministry. I received 25,000 RMB from the Ministry of Education of the PRC, more than the highest scholarship the university awarded that year.
More importantly, this experience encouraged me to pursue deeper study of artificial intelligence.