Seven experts on the state of the art and pain points of human-computer interaction and terminal intelligence

This article was first published on the WeChat public account Xinzhiyuan. The views expressed are the author's own and do not represent the position of Hexun.com; investors act on them at their own risk.

Compiled by Xinzhiyuan: Zhang Yishu Chang

[Xinzhiyuan Guide] What technologies underpin smart terminals, and what role does the new generation of human-computer interaction play in them? What are the challenges facing semantic interaction, voice interaction, and smart speaker technology? What is the next stage in the development of artificial intelligence? What value should smart home control deliver to users? Can you build an ideal smart speaker simply by stringing together speech recognition and a microphone array? Can China launch a speaker that dominates the market, or is at least widely accepted by consumers? At the June closed-door forum of the 100 People's Association, jointly organized by Xinzhiyuan, the Android Green Alliance, and the Institute of Automation of the Chinese Academy of Sciences, the insights and debates of experts from academia and industry may offer some inspiration.

To address these questions, Xinzhiyuan invited academic and industry experts to the June 100-member closed-door forum, jointly organized with the Android Green Alliance and the Institute of Automation of the Chinese Academy of Sciences, to explore human-computer interaction and terminal intelligence from the angles of technology, application, difficulty, value, business model, and prospects, so that participants could gain a comprehensive view of the development and trends of the new generation of human-computer interaction and come away with some inspiration.

The experts who took part in the discussion, listed in speaking order (as throughout this article), were:

Zhang Baofeng, VP of Huawei CBG Software Engineering Department, head of the Terminal Intelligence Engineering Department

Tao Jianhua, Deputy Director, State Key Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences

Zhao Feng, Vice President and CTO of Haier Home Appliances Industry Group

Sun Fuchun, Deputy Director, State Key Laboratory of Intelligent Technology and Systems, Tsinghua University

Huang Wei, co-founder and CEO of Yunzhisheng

Ding Yi, co-founder of the spirit

Cheng Wei, Director of Innovation Incubation, Microsoft Asia Pacific R&D Group

The forum was hosted by Ms. Yang Jing, founder of Xinzhiyuan.



Ms. Yang Jing was Deputy Director of Media Purchasing and Consulting at Zenith Media (2002-2010) and a business consultant at China Economic Net (2010-2014). In 2014 she planned and hosted a series of seminars on artificial intelligence and big data, including "Singularity Approaching", "Algorithm Empire", and "Social Humans and Robots in the Age of Big Data". In March 2015 she co-hosted the "New Intelligent Era Forum" with China Machine Press, and was invited to host the intelligent society expert forum of the 2015 China Association for Science and Technology, the 2015 RoboCup industry summit, and the "New Era of Robots" sub-forum of the World Robot Conference. She founded Xinzhiyuan in September 2015, published the monograph "Xinzhiyuan: Machine + Human = Super-Intelligent Era" in March 2016, and in October 2016 co-hosted the World Artificial Intelligence Conference and published the "China Artificial Intelligence Industry Development Report".

After Ms. Yang Jing's welcome speech, Zhao Hong, director of the third-party testing department of Huawei's CBG Software Engineering Department and representative of the Android Green Alliance, also gave a warm welcome.

Zhang Baofeng: AI scare index and the three pain points of the terminal smart future



Zhang Baofeng, VP of Huawei CBG Software Engineering Department and head of the Terminal Intelligence Engineering Department, is responsible for the development and delivery of on-device AI software. He was previously deputy director of Huawei's Noah's Ark Lab, responsible for medium- and long-term research in data science. His research interests include data mining, machine learning, and artificial intelligence. He is a member of the expert group of China's "Core Electronics, High-end Chips and Basic Software" national program and of the CCF Big Data Expert Committee.

Zhang Baofeng joined Huawei in 1998 and has over 18 years of experience in information technology, with extensive involvement in international and national standards organizations. He served as a rapporteur in Study Group 13 of the International Telecommunication Union and as deputy head of the Switching Technology Working Committee.

At the June forum, Zhang Baofeng laid out the three major needs of future terminal intelligence - understanding users, proactive service, and lifelong learning - and its three major pain points - on-device intelligence, product-line testing, and deep learning. He said: "On the smart future of mobile terminals, let me share my own understanding; you can judge what is right and what is wrong." His understanding may be a key to grasping the direction of the terminal intelligence industry. The full talk and slides are available under the title "AI scare index and the three pain points of the terminal intelligent future".



Tao Jianhua: Voice interaction technology will be one of the most important access methods for mobile terminals.

Tao Jianhua, Ph.D., is a researcher and doctoral supervisor, and a recipient of the National Outstanding Youth Fund. He is deputy director of the State Key Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. He received bachelor's and master's degrees in electronics from Nanjing University in 1993 and 1996, and a Ph.D. in computer science from Tsinghua University in 2001. He serves as a steering committee member of IEEE Transactions on Affective Computing, vice president of ISCA SIG-CSLP, executive director of the HUMAINE Association, executive director of the China Computer Federation, and a director of the Chinese Association for Artificial Intelligence, the Chinese Information Processing Society, and the Acoustical Society of China; he is also secretary general of the Language Resources Construction and Management Committee of the Chinese Information Processing Society. He has led or participated in more than 20 national-level projects (863 key projects, the National Natural Science Foundation, the Development and Reform Commission, and the Ministry of Science and Technology) and has served as an evaluation expert for the National Natural Science Foundation and the 863 program. He has published more than 150 papers in SCI- or EI-indexed journals and conferences, applied for 15 domestic invention patents and 1 international patent, and edited 2 academic volumes. His research has won awards at major academic conferences at home and abroad, including the second prize of the Beijing Science and Technology Progress Award twice. He has been a program committee member or chair at conferences including ICPR, ACII, ICMI, IUS, ISCSLP, and NCMMSC, and is an editorial board member of the Journal on Multimodal User Interfaces and the International Journal of Synthetic Emotions.

The five core technologies of Artificial Intelligence 2.0



Under the big concept of artificial intelligence, there are still many directions to explore.

A brief review of history: artificial intelligence has seen several peaks and several troughs. After 2010, the combination of AI with deep neural networks brought us great opportunities. What, then, is the "Artificial Intelligence 2.0" now discussed in the industry? It is a new generation of AI built on major changes in the information environment and on new development goals: new environments, new targets, scalable new technologies, and substantially changed research objects. Most importantly, big data intelligence, cross-media intelligence, autonomous intelligence, human-machine hybrid augmented intelligence, and swarm intelligence are the key tasks for future development; together they constitute the five core technologies of AI 2.0.

Attention mechanisms, memory, transfer learning, reinforcement learning, and semi-supervised and unsupervised learning are the main focus of future AI development. What we mainly see today are deep neural network methods, but we believe many new learning approaches will attract attention, including general artificial intelligence. In the past we could not even contemplate it; now some preliminary exploration is possible. The problem will not be solved in a short, limited time, but initial exploration can begin.

Big data intelligence is of general concern. In the national strategic layout, cloud computing and big data are each arranged as independent directions. The related work is easy to understand, especially in supporting applications such as smart transportation and smart cities.

Cross-media intelligence is a new research topic in AI. Internet multimedia data keeps growing, and between the terminal and the cloud it is hard to say where the boundary lies; the integration grows ever deeper. The interactive properties of text, image, voice, and video will intertwine to form cross-media features. How to use semantically connected content to integrate a person's different cross-media information more closely is the cross-media intelligence problem that future AI must solve. It has many applications on the Internet and in security.

There is also human-machine hybrid augmented intelligence. The boundary between human and machine is beginning to blur. Hybrid augmentation can, on the one hand, enhance people's own abilities and, on the other, produce a more advanced agent through close cooperation between human and machine.

In terms of swarm intelligence, mixing a variety of different agents to build a higher level of collective intelligence will become a new focus.

Autonomous intelligent systems involve many intelligent technologies, and there is much work to be done.

Looking at the general development of artificial intelligence 2.0 from three levels

We divide the overall development of AI 2.0 into three levels: the basic support level, the key technology level, and the application scenario level.

The basic support level includes all the smart sensors and chips related to AI - whether accelerator chips for deep learning or sensor chips that bake common perceptual algorithms into silicon - together with data resources and the platform software systems that support them.

Key technologies include machine learning, which in turn includes deep learning - by now we regard deep learning as a traditional approach - as well as reinforcement learning, adversarial learning, and other key technologies such as vision, speech, image, human-computer interaction, big data, and cloud computing.

At the application level, AI keeps penetrating different fields: robots, smart driving, drones, and a series of wearable smart terminals. Recently, attention has turned to smart healthcare, smart security, smart finance, smart industry, and so on, where AI may produce large or breakthrough applications.

Smart terminal technologies: augmented reality, three-dimensional sound fields, voice interaction

Smart terminals come in very diverse forms. Besides the usual mobile phones and tablets, in recent years we have also had helmets and smart glasses. Smart terminals have shipped in very large volumes at home and abroad, and the market is very large. With the development of intelligent technology in recent years, the whole field has shown explosive growth; new wearable smart terminals are rapidly evolving and changing people's lives.

Augmented reality

Smart terminals have some very interesting applications, such as augmented reality, which we believe may become one of their important uses. What is it for? Using a wearable terminal or a phone, I collect the surrounding scene through the camera or by voice and superimpose corresponding information on it; that information forms a different interpretation of the scene. The scene imagery can even be used for positioning. Do you need image information for positioning when GPS exists? In fact it lets you locate yourself indoors, or wherever GPS cannot reach. Augmented reality has much room for development.

Three-dimensional sound field generation technology

There is also an interesting line of work for mobile terminals called three-dimensional sound field generation. Many people ride or walk with headphones on. The music they hear is called stereo, but it is not truly stereo: it merely balances the volume between the left and right ears to coordinate the sound image, solving only the problem of a flat, two-dimensional sound field. Can we produce a true three-dimensional sound field while listening to music or watching a movie, using a pair of headphones instead of a surround-sound system? A surround system achieves the effect with many speakers placed around a room; can a pair of headphones achieve the same? This is very interesting work, and we have built quite a good demo: music and vocals can be placed anywhere in a 360-degree range around the listener - above and below, left and right, in front and behind - and the effect is very different from ordinary stereo.
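As a rough illustration of how headphone spatialization goes beyond plain volume balancing, the sketch below applies an interaural time difference (Woodworth's spherical-head approximation) and a simple level difference to a mono signal. This is a minimal example of my own, not the demo described above; the function name and parameters are assumptions, and a real renderer would instead convolve the signal with measured, person-specific head-related impulse responses (HRIRs).

```python
import numpy as np

def binaural_pan(mono, sr, azimuth_deg, head_radius=0.0875, c=343.0):
    # Interaural time difference from Woodworth's spherical-head model,
    # plus a crude interaural level difference; real systems convolve
    # with measured, person-specific HRIRs instead.
    az = np.deg2rad(azimuth_deg)
    itd = head_radius / c * (abs(az) + abs(np.sin(az)))  # seconds
    delay = int(round(itd * sr))                         # samples
    ild = 10 ** (-6.0 * abs(np.sin(az)) / 20)            # up to ~6 dB quieter
    near = mono
    far = np.concatenate([np.zeros(delay), mono])[: len(mono)] * ild
    if azimuth_deg >= 0:              # source on the right: right ear is near
        return np.stack([far, near])  # rows: (left, right)
    return np.stack([near, far])

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)               # 1 s, 440 Hz test tone
stereo = binaural_pan(tone, sr, azimuth_deg=90)  # pan hard right
```

Played over headphones, the delayed and attenuated left channel makes the tone appear to come from the right; front/back and elevation cues, as the talk notes, require modeling the auricle and vary per listener.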

Voice interaction technology

We have long said that voice interaction will be one of the most important access methods for mobile terminals. The mainstream interaction methods are few: touch, keyboard input, handwriting, and voice. Voice interaction has undergone many technological changes in recent years and now performs very well, both in recognition rate and in noise reduction over the surrounding sound field, so voice access is becoming more and more market-ready. In the past, speech noise reduction on phones was best done with a multi-microphone system, achieving noise reduction in hardware.

Now, with deep learning, better noise reduction can be achieved with a single microphone. The development of AI has solved many past problems and made voice interaction increasingly robust.
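To make the single-microphone idea concrete, here is a classical spectral-subtraction baseline (my own sketch, not the systems described in the talk): the noise magnitude spectrum is estimated from leading frames assumed to be speech-free and subtracted from every frame. Deep learning methods replace this fixed rule with a learned time-frequency mask, which is what makes single-microphone denoising robust in practice.

```python
import numpy as np

def spectral_subtract(noisy, frame=512, hop=256, noise_frames=10, floor=0.05):
    # Estimate the noise magnitude spectrum from the first few frames
    # (assumed speech-free) and subtract it from every frame, keeping a
    # small spectral floor to limit musical-noise artifacts.
    win = np.hanning(frame)
    n = 1 + (len(noisy) - frame) // hop
    frames = np.stack([noisy[i * hop:i * hop + frame] * win for i in range(n)])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    noise_mag = mag[:noise_frames].mean(axis=0)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), axis=1)
    out = np.zeros(len(noisy))
    wsum = np.zeros(len(noisy))
    for i in range(n):                       # overlap-add resynthesis
        out[i * hop:i * hop + frame] += clean[i]
        wsum[i * hop:i * hop + frame] += win
    return out / np.maximum(wsum, 1e-8)

# Toy demo: half a second of noise-only lead-in, then a tone in noise.
sr = 8000
rng = np.random.default_rng(1)
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 300 * t)
clean[: sr // 2] = 0.0                       # leading noise-only stretch
noisy = clean + 0.1 * rng.standard_normal(sr)
enhanced = spectral_subtract(noisy)
```

The residual energy in the noise-only stretch drops substantially, while the tonal portion is largely preserved; a neural mask estimator would do the same job adaptively, without assuming stationary noise.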

Even so, a great deal of work remains unfinished, and I raise it today for everyone to consider. The most typical is the three-dimensional sound field problem. The 3D sound field simulates the human ear. The ear has an auricle, which is certainly not mere decoration: it is because of the auricle that we can tell whether a sound comes from the front or from behind. The 3D sound field builds a model of the auricle into the headphones, but the auricle varies from person to person, and this personalization is not yet well solved.

In voice interaction, as just mentioned, speech recognition and synthesis have greatly improved interaction performance, but looking closely there is still much work to do. The speaker's voice still cannot be entirely free, though systems are somewhat more tolerant than before.

Although current speech recognition systems perform well, first, the speech to be recognized cannot be too colloquial; second, personalized processing is still not strong enough; and multi-language mixed speech recognition remains an important difficulty.

From the perspective of combining mobile terminals with AI, there are many more aspects, and we have made some preliminary explorations. New work combines deep learning with large corpora to characterize or generate deeper parametric information in the course of human-computer interaction. Much of this still needs further work.

For reasons of time I cannot cover everything one by one; my report ends here.

The work on mobile terminals - augmented reality, personalized 3D sound fields, emotional voice interaction, precise 3D visual interaction - offers very interesting application scenarios for future mobile terminal development. One cannot say mobile terminals must have these technologies, but the scenarios are genuinely interesting, and they involve much work, for example on data interfaces. With the voice and visual interaction just described, mobile terminals can serve many purposes in smart home and mobile office scenarios.

Sun Fuchun: Is artificial intelligence the "third apple" that changes the world?

Sun Fuchun is a professor of computer science and technology at Tsinghua University, a doctoral supervisor, a member of the Academic Committee of Tsinghua University, director of the Academic Committee of the Department of Computer Science and Technology, and executive deputy director of the State Key Laboratory of Intelligent Technology and Systems. He is also a member of the National 863 Program expert group, a member of the steering group of the National Natural Science Foundation's major research program "Cognitive Computing of Audiovisual Information", and director of the Cognitive Systems and Information Processing Committee of the Chinese Association for Artificial Intelligence. He serves as an associate or area editor of the international journals IEEE Transactions on Fuzzy Systems, IEEE Transactions on Systems, Man and Cybernetics: Systems, Mechatronics, and the International Journal of Control, Automation, and Systems (IJCAS), and as an editorial board member of the international journals Robotics and Autonomous Systems and the International Journal of Computational Intelligence Systems and the domestic journals Science China: Information Sciences and Acta Automatica Sinica.



Is artificial intelligence the "third apple" that changes the world?

Distinguished guests, hello everyone! I am very grateful to Xinzhiyuan for this opportunity to communicate. Today's topic is artificial intelligence and robots in the cognitive era. People defined 2015 as the first year of robots; later 2016 was called the first year of artificial intelligence, and IBM said 2016 marked the beginning of the cognitive era.

What are the five technologies that will most significantly affect human society over the next five years? In 2016 the answer was vision, touch, smell, taste, and hearing. Tsinghua has been working on visual processing and hearing for six years. A few days ago Huawei proclaimed the era of touch; touch is very important, especially while a robot is manipulating objects.

When shopping online, the photos of an item always show its best angle; only when it arrives do you find the texture and other qualities are not so good - that requires the help of touch. Vision is more than seeing; much of it is semantic understanding. As noted earlier, vision is the most important sense, and people see largely with the brain. Then there are hearing and taste. How does a mother hear a child's appeal in its voice? A child under one cannot yet talk, so how does its mother understand what it means? There is also smell, such as the ability to smell disease.

In the past, the relationship between people and household appliances was one-way, as with any goods; you had to try them out for yourself. Once intelligence is added, the machine becomes cognitive and can interact with you: people understand the machine, and the machine must also understand people.

Yesterday, in an interview by Tianjin TV together with President Gong Ke, I commented that education in the cognitive era is two-way. In the past it was one-way: the Ministry of Education formulated a syllabus, and students who failed the exam could not graduate. In the intelligent era, how should the syllabus be determined? Big data analysis of hundreds of thousands or even millions of students can judge whether a syllabus is appropriate.

Many things in the cognitive era have become intelligent and are now within reach. In the past, when you could not park at a restaurant, you had to wait in the car. Now you do not: the car's electronic system can automatically detect where there is a parking space and park itself. If a woman sees someone carrying a beautiful bag, she can search online to find where to buy it and how good its quality is. Particularly striking, when I visited the Australian National University last year, they had performed the first artificial retina experiment on a blind person, who saw the dark outlines of objects through artificial retina technology.

There is also the security field. In Beijing, everyone who leaves home for work is recorded by one camera after another, and license plate recognition is already in place: wherever your car drives, the Skynet system can detect it. A few years ago we undertook a multi-camera tracking project for a Japanese company; the system tracked the company's employees and even recorded their trajectories within the building over the year as an indicator of performance.

What matters for warfare is that the platform becomes cognitive. For example, the United States' swarm drones use very small drones, and assembling such swarms requires strong communication and recognition technology.

The next generation of new-concept combat weapons developed by the United States has very strong cognitive abilities. American AI is driven mainly by big companies. In fact, the idea of "intelligence" was articulated early in China. Xunzi wrote that ability that accords with reality is called capability, and knowledge that accords with reality is called wisdom: cognitive ability is inherent in human beings, and wisdom and innovation are generated through social practice. People unite, talent is generated in social practice, and cognitive ability is used to transform society - this is intelligence.

The ideological basis of artificial intelligence is very important - how to judge whether a machine is intelligent, which I will not elaborate. The second important thing is the material basis: the computer and the network.

The recent launch of 5G also lays a very important foundation for the next step of AI. Memory, and especially experience-based cloud learning, is not feasible without networks and high-speed communication technology. Measured by the computing power a thousand dollars buys, computers will surpass humans around 2040; measured by the floating-point capability per memory unit of biological systems, machines will surpass humans even sooner.

Is artificial intelligence the "third apple" to change the world? Adam and Eve's was the first apple to change the world; the apple that fell on Newton's head was the second; the apple on Turing's table is the third, changing society. The remarkable feature of the coming era is that people and machines coexist: machines have intelligence and cognitive ability and can interact with you. Only in this era, the era of the third apple, is the relationship between people and machines two-way; in the past it was one-way.

The next stage of artificial intelligence development is neural mechanism-driven brain cognition

The next stage of AI development is brain cognition driven by neural mechanisms. Human vision runs from the eye to area V1 and onward up to V4.

Today's deep learning proceeds layer by layer, with no feedback connections between layers. In the brain, by contrast, there are feedback connections between layers and lateral connections within a layer. Using this mechanism to transform deep networks can bring unexpected results; related work from our lab achieved better results on four data sets.

Then there is reinforcement learning. Google acquired DeepMind and later built AlphaGo, which combines move evaluation learned through deep learning and reinforcement learning with a value network for overall position evaluation.

In addition, brain science research is inseparable from instruments. The electron microscope used at Harvard University can image 30-nanometer slices. While mice play games, slices can be made and scanned to see neurons discharging and to identify their coding. Such instruments matter greatly for the development of brain science and of future artificial intelligence.

In the past two days, extreme learning machines have been a hot topic. When I wrote my doctoral thesis, it was generally believed that a neural network is multi-layered and that the hidden-layer parameters of its neurons must be learned. Anatomical findings in 2013 and 2015 suggested that these hidden-layer parameters are innate in humans and animals and need not be learned. On this basis, Professor Huang Guangbin and others set the hidden-layer parameters by random generation and proposed the extreme learning machine. In the past two years this work has been combined with multi-kernel learning and deep learning.
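The extreme learning machine idea described above can be sketched in a few lines: hidden-layer weights are drawn at random and left untrained, and only the output weights are fitted in closed form by least squares. The function names and the toy regression task below are mine for illustration, assuming a basic single-hidden-layer ELM.

```python
import numpy as np

def elm_fit(X, y, n_hidden, rng):
    # Hidden-layer weights are random and never trained, in line with
    # the ELM idea that hidden parameters need not be learned.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)               # random nonlinear feature map
    beta = np.linalg.pinv(H) @ y         # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy regression: approximate y = sin(x) on [0, pi] (illustrative only).
rng = np.random.default_rng(0)
X = np.linspace(0.0, np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel()
W, b, beta = elm_fit(X, y, n_hidden=50, rng=rng)
mse = float(np.mean((elm_predict(X, W, b, beta) - y) ** 2))
```

Because only a linear system is solved, training is a single pseudo-inverse rather than iterative backpropagation, which is what makes the approach attractive to combine with multi-kernel and deep methods.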

On the development of robots: in the past, robotics mostly studied the robot's skeleton. Today's robots need not only a skeleton but also sensors, muscles, and a brain. Such robots are called cognitive robots. We must study not only their kinematics and dynamics but also how sensory information is perceived, how multimodal information is represented and fused, and how muscle movement produces complex manipulation.

In the combination of human and machine, the life-like robot is an important concept. At the cellular level, research on living materials may be the nemesis of cancer, perhaps one day conquering cancer within the blood vessels.

In April 2016, the launch of a robot companion raised many questions.

Our research team is also working on brain-controlled robots; a robot controlled by the brain can be operated remotely.

This is the third-generation electronic-skin robot we have built; at this year's international robotics and automation conference in Singapore we gave an invited talk on it. Our understanding of artificial skin is not to make a patch for an injury. Like human skin, it has an epidermis and a dermis: the epidermis is electronic and measures texture and slip, while the dermis measures normal pressure. Much work has also been done on visual-tactile coding, including their fusion.

The development of robots relies on the development of artificial intelligence. Artificial intelligence is inseparable from the development of life science and brain science. A closed loop has been formed between the three.

The robot is precisely the carrier of artificial intelligence; they are golden partners. The next generation of robots will embody the ability to think, driven by artificial intelligence. There used to be the three laws of robotics; now that AI has developed this far, some fear has arisen. Last year more than 100 scientists in the United States discussed the future development of AI, and one very important question was whether AI will harm humanity in the future. Artificial intelligence must have an objective function that evolves along with the development of human society.

IBM has proposed three principles for artificial intelligence: first, an AI system must establish mutual trust with people; second, transparency - understanding what constitutes the AI system and what parameters it learns with; third, the AI platform must work together with people in the industry. This is a very important aspect of the future.

In the development of artificial intelligence, the most frightening prospect is robots generating self-awareness. Current understandings of consciousness include viewpoints such as memory, quantum entanglement, and perception.

Artificial intelligence should be the soul of the robot: the robot is machine plus person, and what stands in for the person is artificial intelligence. As AI develops, machines also keep evolving. People and robots are two systems: people are living systems, robots are artificial systems, and the two learn from each other throughout their development, with artificial systems serving as an important experimental platform. The two systems keep evolving and learning from each other, and one day they will meet; that meeting may happen when machines become self-aware.

Weak artificial intelligence is based on big data and deep learning, with AlphaGo as the representative. Its drawbacks are high energy and resource consumption; it is highly specialized, with a single function. AlphaGo can only play Go, not chess. It is intelligence under specific conditions, not scalable.

Strong artificial intelligence is general artificial intelligence with the characteristics of human thinking. When people lack complete information, they reason and judge. When Chairman Mao made the four crossings of the Chishui River, there were few observation and decision-making tools. Why was he so successful at Sidu Chishui? First, he adapted his actions to the characteristics of the commanders on the other side; second, he used telegrams to obtain local information, and so commanded the four crossings successfully.

Parrots, the deep learning framework developed independently by SenseTime, implemented 1,207 layers on ImageNet's classification task using 26 GPUs. Do we still need 3,000 layers and 200 GPUs? Certainly not. Deep learning also makes mistakes, such as misidentifying a panda, but people do not have such problems.

The important question here is structural information: how do we mine it? Humans have this ability, and giving it to machines will require progress in brain science.

For the artificial intelligence industry, the algorithm industry and the chip industry will be very important in the future.

Thank you all!

Huang Wei: microphone array from vendor A, speech recognition from vendor B, natural language from vendor C, and the resulting product is a joke

Dr. Huang Wei received his PhD from the University of Science and Technology of China and was a postdoctoral fellow at Shanghai Jiao Tong University. After graduation he worked as a senior researcher at the Motorola China Research Center, where he developed the world's first mobile-phone voiceprint authentication system. He later became a core executive of the Shanda Innovation Institute, founding its speech branch. At the end of 2013 he entered the domestic AI field as CEO of Unisound (Yunzhisheng), responsible for development strategy and operational-management planning. Since 1999 he has been engaged in project research, with product results in fields including medicine, management information systems, natural science, speech, and games. For example, from 2002 to 2004 he took part in the National Institute of Standards and Technology's speaker recognition evaluation (NIST SRE), winning first place in the SRE main task and the year's highest "Golden Star Award," and he is the only Chinese researcher to have been a keynote speaker at the NIST evaluation for two consecutive years. He was nominated for the MIT TR35 in 2007 and named one of Shanghai's top ten leading talents in science and technology in 2009.



Artificial intelligence this year is not the same as last year. Last year, in both the media and the market, attention was mostly at the PR level: companies of all sizes kept announcing they had placed No. 1 in some evaluation. This year we basically hear no such stories; attention has shifted to what user value and business value the technology can create. Today is also unlike the previous two waves of artificial intelligence, which were limited by the conditions of their time and lacking in many respects. Today's AI has shown the ability to surpass humans in both hearing and vision, including in medical and financial scenarios. Within three years, the global AI industry may reach a scale of 100 billion US dollars, with the Chinese market growing fastest.

What value should smart home control provide to users?

The smart home is a very important scenario in the field of artificial intelligence, covering a wide range of areas, and the same is true in the real estate sector. From 2002 to 2016 there were only 16 intelligence-related real estate projects; last year alone there were nearly 60.

When we talk about smart homes we cannot ignore products like the Echo. They are naturally suited to intelligent control, and there is no doubt they are very likely to become the entry point for user information in the home environment. Different people view this trend differently. One view is that the user habit does not exist: few families in China like listening to music at home. Another view is: why control home appliances through a speaker when I can control them with my phone? But think about it: when you get home, isn't it clumsy to open a mobile app just to switch on the lights?

Today this product category is still small in volume. My first job was at Motorola. In my six years there I witnessed how Motorola and Nokia fell from being giants, and I also witnessed how Apple grew from a small company into the most valuable company in today's global market.

When Apple introduced its first-generation iPhone in 2007, it sold more than 1 million units worldwide. Last year, the Echo sold more than 5 million units.

How did Apple overturn Motorola and Nokia? A very important point is multi-touch, a completely different form of interaction that subverted Motorola and Nokia from the bottom up.

Do I need to listen to music through the speaker? That is not the focus of the debate. Is the speaker form itself unimportant? It can only be said that Amazon happened to have music resources at the time, so it chose the speaker form to carry its cloud service, Alexa. Apple and Google used the app to replace the URL; Alexa replaces the app with the skill. I believe this is a trend, and people here will see it realized within three years.

Google has Google Home, Apple released the HomePod this year, and there are follower companies in China. The phone's success lay in surpassing the earlier button phone in user experience; the app's success lay in surpassing the earlier URL. What value should an intelligent central control provide to users? It must be able to connect and control the smart devices in the home; this is the most basic. If it cannot connect and cannot control, all other intelligence is castles in the air.


It should be able to become an assistant in the family's daily life, provide the necessary basic services, and offer some entertainment and companionship: not merely a tool, but something anthropomorphic. These are the key points a smart home central control needs.


Technical challenges of smart speakers


What kind of interaction capability can convey these values? How can users feel this value delivered in a way that surpasses their previous experience? We are not advocating abandoning the phone or the app; rather, we should supplement other interaction capabilities beyond the phone app.

Such a device should not be limited by space. It may be two, three, four, five meters or even farther away from you, yet it should stand by and wake at any time like a person, achieve far-field recognition, and interact using the language people use with each other. Not only hearing you, but understanding you; not only understanding, but delivering what you have in mind. This is the key point in delivering customer value.

The ideal is rich, but the reality is thin: the relevant technologies are hard to implement, and products that satisfy users are rare. When we got an Echo, we found it rather rigid; reality is still a long way from our expectations. A while ago there was an article online titled "Ten Steps: Smart Speakers from Getting Started to Giving Up."

When we decided to try products like Alexa or Duer (度秘), the first step was fine, but by the second and third step it proved too hard. For many companies this is impossible, so they give up.

Many people already use voice input methods today, so isn't a smart speaker just a speaker plus an SDK? No. It involves far too many technical links: echo cancellation, noise reduction, voice wake-up, speech recognition (both cloud-side recognition and low-power local recognition), plus user preferences, user profiles, knowledge graphs, recommendation engines, the entire dialogue logic, and finally feedback to the user through highly expressive, natural speech synthesis. Each of these points could be the subject of a great doctoral dissertation. For a company, assembling a few such PhD-level teams is not so simple.
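The chain of stages described above can be sketched as a minimal pipeline. The stage names, interfaces, and placeholder behaviors below are illustrative assumptions, not any vendor's actual API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AudioFrame:
    samples: list  # raw PCM samples from the microphone array

# Each function stands for one link in the far-field voice chain named in
# the text: echo cancellation -> noise reduction -> wake word -> ASR ->
# understanding/dialogue -> response.

def echo_cancel(frame: AudioFrame) -> AudioFrame:
    # Placeholder: subtract the speaker's own playback signal.
    return frame

def denoise(frame: AudioFrame) -> AudioFrame:
    # Placeholder: suppress background noise (e.g. a TV in the room).
    return frame

def wake_word_detected(frame: AudioFrame) -> bool:
    # Placeholder: low-power local keyword spotting.
    return True

def recognize(frame: AudioFrame) -> str:
    # Placeholder: cloud-side or local speech recognition.
    return "turn on the air conditioner"

def understand_and_respond(text: str) -> str:
    # Placeholder: dialogue logic backed by a user profile, knowledge
    # graph, and recommendation engine would live here.
    return f"OK: {text}"

def pipeline(frame: AudioFrame) -> Optional[str]:
    frame = denoise(echo_cancel(frame))
    if not wake_word_detected(frame):
        return None  # stay asleep until the wake word is heard
    return understand_and_respond(recognize(frame))
```

The point of the sketch is the coupling: each stage consumes what the previous one produced, which is why swapping one link for an isolated third-party module degrades the whole chain.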

Technology must be integrated end to end; we propose the concept of AI integration. Many technologies are not isolated, and the interfaces between them must not be black boxes. Only by deeply integrating them can you reach a good final experience.

Some in the industry have pointed out that in today's mobile internet era, the AI product manager is completely different from the product managers of the past. Today's AI product manager must be proficient in the technology and know the strengths and weaknesses of each component. Algorithms alone are not enough; you also need microphone-array technology and deep integration with smart-home companies. Just as we were all once used to GUIs: how is the interface designed, how does it interact with the device, how is the logic designed? It also means connecting a large number of third-party resources — songs, music, weather, stocks, and so on. Every link is hard to do, and every link matters. A smart speaker is not done by just plugging in an iFLYTEK or other speech SDK.

Today many people think: take speech recognition from one vendor, a microphone array from another, the remaining pieces from a third, string them together, and you are done. That idea is unrealistic. We tested in a quiet environment and with the TV on at home, at distances of 1, 3, and 5 meters. When different vendors' microphone arrays were paired with BAT's own recognition engines, Conexant's metrics were indeed very stable regardless of quiet or noise, regardless of 1, 3, or 5 meters — stably bad, at only sixty-some percent, as anyone in the field can see. But Conexant's technology was built for call quality on laptops, not for recognition, and such an isolated module simply does not work. Some domestic companies built their own microphone arrays and paired them with BAT's recognition engines; the results were just as poor, or even worse than Conexant's.

Now look at what iFLYTEK or Unisound has done: by thoroughly integrating the microphone array and the AI technology end to end, the performance metrics crushingly surpass those stitched-together solutions. So AI technology must be integrated down to the chip. Not long ago a domestic vendor released a speaker built with traditional internet-product-manager thinking: microphone array from vendor A, speech recognition from vendor B, natural language understanding from vendor C. The final product is a joke; it will drive you to despair. As soon as the speaker plays music, you have to shout from two meters away; with music playing, within one and a half meters it basically cannot be woken up, or the wake-up rate is at most 5% or 10%. That is what happens when you glue together technologies from vendors A, B, and C.

We propose the central-control solution Pandora, hoping to resolve the difficulties just described by integrating microphone-array technology, AI technology, and everything else end to end. We integrated 4-mic array noise reduction and 5-meter far-field speech recognition, inheriting the characteristics of speakers such as Echo, Google Home, and HomePod while adding many features they lack. Besides intelligent services, there is one very important skill: when connecting and controlling all the devices in the home, the most basic requirement we place on these devices is speed. Imagine an air-conditioner remote at home: you press a button and it takes one or two seconds to respond. You don't know whether the press registered, so perhaps you press three times in a row, and in the end you don't know whether the unit is on or off. A robot may have very powerful cloud intelligence, but if it reacts sluggishly, it will drive users to despair. So the first thing Pandora must have: all of its systems provide cognition and intelligent services through the cloud, while also supporting AI interaction on the terminal, plus on-chip terminal perception and local intelligence.

The first technology supporting Pandora is speed. How fast? Lightning fast. Pandora achieves a wake-up time under 0.3 seconds and a cloud response time under 1 second. This covers not just recognition response speed but a whole series of links, including cloud recognition, understanding, the knowledge graph, and service recall. You really have to polish the technology from an internet product manager's perspective.
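As a rough illustration of how such an end-to-end budget can be checked (the stage names and the sample latencies below are hypothetical, not Unisound's actual figures), one can sum per-stage latencies against the two targets quoted above:

```python
# Hypothetical per-stage latencies for one voice request, in seconds.
STAGE_LATENCY = {
    "wake_word": 0.25,       # local keyword spotting
    "asr": 0.40,             # cloud speech recognition
    "nlu": 0.15,             # understanding + knowledge-graph lookup
    "service_recall": 0.20,  # fetching the answer or service
}

def within_budget(latencies: dict, wake_budget: float = 0.3,
                  cloud_budget: float = 1.0) -> bool:
    """Check the targets quoted in the text: wake-up under 0.3 s,
    total cloud-side response under 1 s."""
    cloud_total = sum(v for k, v in latencies.items() if k != "wake_word")
    return latencies["wake_word"] < wake_budget and cloud_total < cloud_budget

print(within_budget(STAGE_LATENCY))  # True with these sample numbers
```

The check makes the point of the paragraph concrete: the 1-second budget is spent across the whole chain, so no single link can be optimized in isolation.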

Second, accuracy. It must understand very precisely what the user is saying. To this day, an important reason many products fail to land is that many technical metrics hold only in laboratory conditions: a system may reach 97% or 98% accuracy on a standard database yet be worthless in an industrial environment.

Besides distance, accent is a big challenge. With its microphone array and recognition technology, Unisound is today the only vendor in the domestic industry shipping at mass-production scale — not one among a few, the only one. Against any domestic vendor, we alone can do this: bring in a test subject with an accent, with no training and no coaching on what to say, because real users don't know what to say; second, the subject will simply ask, by voice, to turn the air conditioner's fan to maximum; third, the subject will speak straight dialect. Product development hits many difficulties, and if a product is to reach mass production, dialects must be solved.

Another important metric for industrial mass production is frugality. Many teams build PR products with decent performance but require a 4-core CPU and gigabytes of memory, which makes mass production impossible. Unisound's position is that the solution must be frugal: a minimum main frequency under 100 MHz, and memory under 100 KB.

Another important thing is stability. If you fall asleep at home and the speaker suddenly starts telling you ghost stories, such a product is absolutely unacceptable. It must answer whenever you call it, and never answer when you don't.

Users can also hold multi-turn dialogues with our device and interrupt at any point in the interaction; the device responds flexibly, achieving streaming interaction as smooth as water. Beyond multi-turn dialogue, this year the system has added encyclopedic knowledge, so the robot is not only an assistant but an expert, with a deeper understanding and grasp of the user. It keeps learning about you as you use it. Our devices also offer male, female, and child voices; even 10 minutes of data is enough to generate a highly expressive voice.
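A minimal sketch of such a barge-in-capable dialogue loop might look like the following. The class, event names, and placeholder logic are illustrative assumptions, not the product's actual design:

```python
from collections import deque

class StreamingDialogue:
    """Toy multi-turn dialogue loop: the user may interrupt (barge in)
    while the device is still speaking, and any pending reply is
    dropped in favor of answering the new utterance."""

    def __init__(self):
        self.history = []       # past (user, reply) turns kept for context
        self.pending = deque()  # replies queued for playback

    def reply_to(self, utterance: str) -> str:
        # Placeholder: a real system would consult dialogue state,
        # a knowledge graph, user profile, etc.
        return f"reply to: {utterance}"

    def on_user_speech(self, utterance: str) -> str:
        # Barge-in: discard whatever we were about to say.
        self.pending.clear()
        reply = self.reply_to(utterance)
        self.history.append((utterance, reply))
        self.pending.append(reply)
        return reply

dlg = StreamingDialogue()
dlg.on_user_speech("play some music")
dlg.on_user_speech("stop, what's the weather?")  # interrupts the first reply
print(len(dlg.pending))  # only the latest reply remains queued
```

The design choice worth noting is that interruption clears the playback queue but not the history, so context survives across barge-ins while stale audio is never played.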

Through the central-control solution, even if the control device itself has no screen, every screen in the home can be put to use, with streaming dialogue so that user behavior flows seamlessly across devices. Our solution compresses all partners' development cycles to under six months, and every device can use it.

Ding Yi: Many robot products over-inflate their boundaries, which will not bring real sales or word-of-mouth change

Ding Yi, co-founder of HiWiFi (极路由) and Dajie.com, has rich experience and insight in brand marketing and channel sales. He currently leads Ling Technology's (物灵) overall marketing, sales, and operations, striving to build a world-class "spirited" brand and bring Ling's products to the world.

Ling Technology (物灵科技) is a newly founded artificial intelligence company, backed by the listed company NetPosa (东方网力). Our positioning is very clear: build a consumer brand and consumer products, mainly AI robot products.


We attach great importance to product definition and to getting interaction and experience right; this matters enormously to us. Right now every consumer smart product except the Echo sells poorly. We refine product definitions according to scenarios and needs, building products with real value. Many robot products on the market over-inflate their boundaries; for consumers this raises expectations, and the letdown once the product is in hand is enormous, bringing no real sales or word-of-mouth change. So we believe that defining the product itself and managing consumer expectations are crucial.

Today people access information flows and service flows through machines; this happens through interaction between people and devices, and our core technology focuses on exactly that.

Ultimate intelligence comes from the symbiosis and co-evolution of humans and machines. Interaction has moved from the keyboard-and-mouse GUI era to the touch era, and next to the still-unsettled BCI. Most people today are immersed in touch terminals, with all attention on the screen, whereas the devices we build are silent and ambient, a form of intelligence and computing power that is everywhere.

For a robot serving the home, the two groups in a family — adults and minors — have completely different modes of cognition and language systems.

We split our lineup into products for adults and products for children. We impose hard limits on each product's concrete usage scenarios and concrete functions, so that the current boundaries of AI technology are never exceeded. We are currently recruiting angel users for our first product, Luka, a children's reading-companion robot that uses computer vision to read off-the-shelf picture books; it is open for pre-order on JD.com if you would like a look.

In addition, together with three domestic listed companies, one fund, and three AI startups, we founded the Wanxiang AI Research Institute, hoping to map underlying algorithms and technology directly to industry, so that researchers know from the start who will use their work and how. The institute runs on a fund model, operates globally, and is tightly connected to industry.

Ling Technology's new office is on the top floor of the Pohang Center in Wangjing, with a professional café that can host launch events of about a hundred people and an excellent view. We hope to make it an experience hall, showroom, and gathering place for consumer AI brands; you are welcome to hold the next Xinzhiyuan 100-member meeting at our new office.

Panel: Analysis of existing smart speaker products and outlook for the domestic smart speaker market

Yang Jing: The theme of today's panel is "Analysis of existing smart speaker products and the outlook for the domestic smart speaker market." At Apple's developer conference in early June, the smart speaker Apple HomePod debuted, becoming a strong rival to the Amazon Echo and Google Home. Several domestic vendors, including Haier, have released or will soon release their own smart speakers, and BAT intend to enter the field or already have. Behind the battle are the driving force of human-computer interaction technology and the market's real demand for a new generation of interaction interfaces. More natural human-computer interaction is one of the important hallmarks of the intelligent era. A new generation of interfaces built on speech recognition, semantic analysis, visual capture, context awareness, VR, and other technologies will become the key module that directly determines user experience in terminal-intelligence scenarios such as smart homes and autonomous driving. I'd like to ask the experts here: where are the technical challenges of smart speakers? Is there any Chinese smart speaker that comes close to the Echo's level? Can China soon launch a speaker that dominates the market, or at least one widely accepted by consumers? Let's first invite Director Zhang Baofeng to share.

Zhang Baofeng: There will undoubtedly be an intelligent entry point in the home of the future. Facebook's newly built Jarvis has the same effect; whether the interaction takes the form of a speaker is not certain. The crucial thing is constraining the scenario. I have seen Echo surveys: there are a few top applications, such as playing music and setting alarms, while some applications see very low usage. Should we go broad and comprehensive, or deliver genuinely specific value? That is a question well worth thinking about.

Yang Jing: Mr. Zhao Feng, are there any products you are bullish on?

Dr. Zhao Feng is Vice President and CTO of Haier Home Appliance Industry Group. He previously served as Deputy Managing Director of Microsoft Research Asia, responsible for R&D in IoT, big data, computer systems, and networking. He received his PhD from MIT's computer science department and AI lab, served as a principal scientist at Xerox PARC in Silicon Valley, where he founded its sensor network research, and taught at Ohio State University and Stanford University. Dr. Zhao is an IEEE Fellow and wrote the first monograph in the IoT field, "Wireless Sensor Networks," adopted as a textbook by many American universities.

Zhao Feng: For smart speakers, what actually matters more is the voice assistant behind them. What everyone should watch next is the entire voice-service ecosystem behind the smart speaker. Its hardware embodiment may be a speaker, but I believe any smart hardware in the home can serve as the entry point; the concept of the smart speaker needs to be generalized beyond the simple speaker form.


What worries me greatly is that expectations for artificial intelligence now far exceed what the technology can achieve and deliver to users. I read Ms. Yang Jing's 2016 AI survey report, which spoke of AI's three rises and three falls; I hope this third rise does not fall again. So far AI has had three rises and two falls: the first wave was symbolic computation and expert systems; the second was neural networks, which lacked the support of large-scale computation and data, creating a gap between expectations and reality. This third wave is different. Based on deep learning, big data, and large-scale computing, speech recognition and image recognition have in some domains reached an experience threshold users can accept. Below that threshold, if the computer got three sentences wrong out of ten, the experience felt terrible. In generalized AI, especially unconstrained dialogue, interacting as fluently and naturally as a human is still out of reach. We should focus on a few vertical domains and perfect the experience: for example, in the home scenario, truly polish the user experience so the dialogue flows, whether in continuous speech, multi-turn dialogue, or the knowledge base behind it, so users actually get the services they need. If you try now to build a humanoid robot that understands everything, with knowledge as broad as a human's, the technology is not there yet, and it will lead AI into a dead end; I do not want to see that third "fall." The right expectation is to do the smart speaker experience well at this stage, stay more focused, and meet users' core needs. Listening to music is a core need; in smart living and the smart home, the speaker as the user-interaction entry point and central control, interconnecting with all kinds of smart hardware and obtaining services through interaction, is also a core need. But if expectations run unboundedly high in the short term, it is actually negative for the industry's sustained development.

I will make just these two points: first, what matters more for the smart speaker is the voice assistant behind it, which can surface on the speaker, the refrigerator, or the television; second, at the current stage we need to improve the user experience.

Yang Jing: Progress may look a bit slow, but smart speakers are indeed improving. Professor Sun Fuchun, could you predict which smart speaker you favor?

Sun Fuchun: The cognitive era cannot do without voice interaction, and whenever voice interaction comes up, so do smart speakers; there is a very clear understanding that the future market is very large. One of the biggest changes in education is that the book of the future is no longer the e-book in our hands or on our computers; the future e-book will be multimedia, stimulating the brain's various perceptual cortices to form a shared effect, and in such a multimedia environment learning efficiency will rise dramatically. On another front, good music played through a television is already distorted; if the experience could be perfectly combined with the music, that would be wonderful. A poetry recital played on a phone loses a great deal; with an excellent speaker to present it, the experience becomes far more beautiful. To borrow a phrase: artificial intelligence makes our future better and our lives better, and the speaker is indispensable here, including the speakers in laptops. I hope they become ever more lifelike, ever better, with a sense of space. Distributed speakers will not sit on one side only but on several sides in three dimensions, and beautiful sound can leave a deep impression and fill our days with sunshine.

Yang Jing: A group member raised a question that cuts to the heart of the matter. Each giant has now launched its own ecosystem platform, making it hard for startups to pick sides. Developing algorithms on one platform one moment and another the next wastes effort, and it also touches upstream and downstream hardware, sales, and so on. Everyone is wondering: which one is better to join?

Sun Fuchun: A speaker is hardware that needs software support; sound processed by software will be more beautiful. Like bike sharing, it may need the backing of Tencent or Alibaba. The big domestic AI companies are BAT; abroad they are Google, Microsoft, and Facebook. These big domestic companies play a very important role in AI applications.

These giants are roughly evenly matched; whoever brings us the most wonderful experience next is the one we will support.

Yang Jing: Huang Wei, can you be sharper and more decisive and say which one is better? You also run a startup; if you had to choose a platform in the future, which do you think is more reliable?

Huang Wei: This is a choice startups may well face, and it is not really a two-way choice. I think each of BAT has a big opportunity; I care more about who has the data, since technology is not the most important factor in final success. BAT certainly have the resources to attract first-rate talent. When I left a traditional IT company for an internet company in 2009 (Yu Kai went to Baidu even later), you could hardly find more than a handful of PhDs at these BAT giants — single digits. Later many people asked why I didn't join BAT back then. There was no "BAT" then; it was "SBAT," and Shanda was the strongest, so I joined Shanda. Shanda may not have had five PhDs in the whole company; internet companies in those days emphasized operations. Today you can see many first-rate scientists inside internet companies.
