Analyst Problems: Should I Learn to Code?
By Gemma Joyce
Published August 23rd 2016
The blend of human and computer intelligence is happening right before us, shaping the society that we live in.
Huge computing power has been unleashed, but the role of the human is still incredibly important. At our Now You Know Conference in Chicago, we wanted to learn more, so we asked some very clever people.
In this article, you’ll read how data science, AI and machines are advancing – in the words of experts in the field.
In a former life Claude was an astrophysicist. This panel has got serious credibility. Let’s get going.
— angela berger (@abergs) May 11, 2016
NATE: I think that there’s a ridiculous amount of opportunity with increases in computing power and some of the new algorithms that are out there. I’m cautiously optimistic.
BECKY: We marketers are getting involved in rebranding terms that people have been used to for a very long time. Today the terms are different.
Neural networks are now being rebranded as deep learning. I feel like the language of it is super-important.
CLAUDE: I think when we’re looking at fields of practice where there’s hybridization, you end up with like these massive differences, and some people say ‘a bell curve’ and other people call that a ‘Gaussian.’
They’re the same thing. I think it’s getting extremely confusing for non-practitioners.
I’m from Canada, the first person in my family since emigrating from France to graduate from high school. I had to explain to my mom how I was doing black hole research – and my mom was very religious. When you really, really have to make someone else understand something, you have to come up with ways of explaining it that get beyond the lingo.
Don’t be intimidated by the lingo. The best thing to do is just ask somebody who’s friendly and they’ll explain it to you.
BECKY: Yes. We hear about chatbots, but what does that really mean? Everyone’s like, ‘Oh, this is artificial intelligence,’ but there are the service-based chatbots where you give very, very specific requests and orders to it.
CLAUDE: Focusing on the data is so, so, so, so important. As we saw with the Microsoft chatbot, Twitter is one of the dirtiest, most horrible data sets you could possibly work with. I’m not joking here, that’s what I do for a living.
Neural networks are only as good as the data you train them on. Remember what happened with the Microsoft AI bot? #NYKCONF
— Sarah Tyson Alfano (@sarahmtyson) May 11, 2016
My software basically identifies robots – in online gaming we see around 75% robots, barely any humans.
Way back when, we saw that 97% of Newt Gingrich’s followers were not human – and that was six years ago. In 2013 the amount of spam on Facebook and Twitter went up by 350%.
NATE: You need to start with good data. I just wanted to make a comment about the order of magnitude we’re talking about when we talk good data. We’re looking at tens or hundreds of millions, or hopefully billions, of inputs.
It’s not like you can train a classifier very well to do a complex task on a thousand posts, or even a hundred thousand posts – if we’re getting down to the practicals of developing this stuff, it’s on the order of months or years, right?
On the one hand there’s this disconnect between our impression of what we should be able to do and our understanding of what actually can be done – but I think it’s really interesting, too.
It’s very playful. We’re in a really playful space where we can develop models and we can look for places where the data already exists.
So for example, in Brandwatch, someone told me recently that there are 500,000 rules that have been written by customers.
That’s getting there. That’s a data set you might be able to ask questions of – for example, what makes a good Boolean query? Could you train a system to recognize a good-quality Boolean query and tell the user if theirs isn’t? Stuff like that is starting to get to the cusp of the kinds of questions we’ve got to ask.
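As a toy illustration of Claude’s idea, quality checks on a Boolean query could start as simple heuristics long before any training happens. The rules below (balanced parentheses, no dangling operators, a minimum of specificity) are illustrative assumptions, not Brandwatch’s actual query syntax:

```python
# Toy sketch: heuristic "quality" checks for a Boolean-style query string.
# The rules are illustrative assumptions, not a real query-language spec.

def lint_boolean_query(query: str) -> list[str]:
    """Return a list of warnings for a simple Boolean query string."""
    warnings = []
    tokens = query.split()

    # Parentheses must be balanced
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                break
    if depth != 0:
        warnings.append("unbalanced parentheses")

    # Operators need operands on both sides
    ops = {"AND", "OR", "NOT"}
    if tokens and (tokens[0] in {"AND", "OR"} or tokens[-1] in ops):
        warnings.append("dangling operator")

    # A single bare term is usually too broad to be a useful query
    if len([t for t in tokens if t not in ops]) < 2:
        warnings.append("query may be too broad")

    return warnings
```

A linter like this could also become labeled training data: queries customers keep refining are probably “bad”, ones they keep reusing are probably “good”.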
I think that the personal assistant analogy has been treated a bit unfairly maybe, like Clippy from Word? That poor thing!
But at the same time, these recommendation engines are essentially doing this. We don’t recognize it to be the same sort of attempt, because the results are so much better.
At Brandwatch at least we have a slew of not just social data but also human interaction with that social data, and somewhere in there are features and patterns that computers may be able to find to make the next analyst somebody who can do something twice as fast, or five times as fast.
BECKY: I think an important piece of the discussion is how you connect the stuff you’re reading in Wired – the fact that Google’s TensorFlow has been released, or Amazon’s Alexa – to the everyday tasks we have to do, whether they’re analytical in nature or about insight generation.
I feel like the difference you’re going to start to see is that services like IBM Watson will become generalist. I was just talking about it with my business partner. He’s worked with Watson on the Watson brand, and he said it has a great ability to identify, but a terrible ability to classify. That’s because it’s trained to do so many things, versus something like Clarifai for image recognition.
Clarifai can actually identify an image as abstract art – and if you think about abstract art, it isn’t just blobs of paint. It has composition; it has style; it has color choices.
As human beings it takes us six years to develop our musculature in order to properly focus and do things.
So if you have all of that input over six years, it contextualizes what we’re expecting artificial intelligence to do.
CLAUDE: One of the things that we do is automatic clustering and signal detection.
We have a prime minister called Justin Trudeau. When he did the equivalent of the primaries, we analyzed the entire country in French and English. We’re talking a data set of half a million tweets, crunched in real time.
What’s really important is having a system. You cannot configure a search for the things you do not know.
For example, a while ago we analyzed the market of construction shoes – the boots workers wear for construction. When we looked at everything in the entire discussion, again, it was over 90% irrelevant data.
It was people selling stuff on Amazon. You get rid of that. When you look at the weak signals that are left, we found two new markets for a brand. They thought that people who are going to buy these boots were hardcore construction workers and farmers.
We found two other markets they weren’t even thinking of.
So how can they configure their brands, how can they calculate or even have an idea of where to sell their shoes when they don’t know?
One of the new markets was for use on all-terrain vehicles. Those guys love their boots. And the second – I’m not joking – was the BDSM community. People fetishized the boots.
We just said, take the same boot, take it out of Home Depot, put it into the fetish stores and you’ve got a brand new market there.
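The filter-then-surface workflow Claude describes can be sketched in a few lines: drop the obviously irrelevant mentions first, then count what remains to let the weak signals show. The spam markers and example mentions below are made up for illustration:

```python
# Toy sketch of the filter-then-surface workflow: drop obviously
# irrelevant mentions (here, marketplace listings), then count what
# remains to surface unexpected contexts. All data here is invented.
from collections import Counter

SPAM_MARKERS = {"buy now", "free shipping", "amazon.com"}  # assumed heuristics

mentions = [
    "Buy now! Steel-toe boots, free shipping",
    "Wore my work boots on the ATV trail all weekend",
    "These boots complete the look for the club tonight",
    "buy now cheap boots amazon.com",
    "ATV riders: which boots hold up best in mud?",
]

def is_relevant(text: str) -> bool:
    """Keep a mention only if it matches none of the spam markers."""
    t = text.lower()
    return not any(marker in t for marker in SPAM_MARKERS)

signal = [m for m in mentions if is_relevant(m)]
word_counts = Counter(w.lower().strip("?!,.:") for m in signal for w in m.split())
# Frequent terms in the *cleaned* data hint at the unexpected markets
print(word_counts.most_common(3))
```

In a real system the spam filter and the signal detection would both be learned rather than hard-coded, but the shape of the pipeline is the same.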
NATE: A while ago I tried to build a genetic algorithm. I bolted on everything I could think of – all the different selection mechanisms – and then gave it a really simple problem.
Maybe it’s not a simple problem, because it failed. There’s this fun game called the Prisoner’s Dilemma. The traditional model is: if you put two people who committed a crime in different rooms, can you get them to betray each other and confess for a slightly better outcome for themselves?
What you do is you try to get the computer to play it. The strategy that most computers will go to is the Nash Equilibrium – everyone betrays everyone else constantly. It’s just miserable.
So you try to get the machine to learn what’s known as the Pareto optimal strategy, which is the strategy where everyone has the best possible outcome. But to get there, you both need to refuse to give up the other one, which is a very dangerous game to play, right?
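The tension Nate describes falls straight out of the payoff matrix. A minimal sketch, using standard textbook payoff values (not numbers from the talk), shows why betrayal is the Nash equilibrium even though mutual cooperation pays both players more:

```python
# One-shot Prisoner's Dilemma payoffs as utilities (higher is better).
# Standard textbook values, not from the talk.
# (my_move, their_move) -> my payoff; C = cooperate/stay silent, D = defect
PAYOFF = {
    ("C", "C"): 3,  # both stay silent
    ("C", "D"): 0,  # I stay silent, they betray me
    ("D", "C"): 5,  # I betray, they stay silent
    ("D", "D"): 1,  # mutual betrayal
}

def best_response(their_move: str) -> str:
    """My payoff-maximizing move given the opponent's move."""
    return max(["C", "D"], key=lambda my: PAYOFF[(my, their_move)])

# Defection is the best response to EITHER move, so (D, D) is the Nash
# equilibrium -- even though (C, C) gives both players a higher payoff
# (the Pareto-optimal outcome the panel mentions).
print(best_response("C"), best_response("D"))  # → D D
```

Because `best_response` returns `"D"` no matter what the opponent does, a purely self-interested learner converges on mutual defection, exactly the miserable outcome Nate describes.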
You get the computer to play against itself over and over and over again. Here’s one thing that I love about machines sometimes, getting them to balance exploration with optimization. So okay, you find a good strategy. Do you explore and maybe find a worse strategy and a slightly worse outcome overall, or do you continue to just like hammer that good strategy?
But it just never learnt. These are task-based machines. They really, really, really want to do one thing really well and they’re trying super-hard. But with not a lot of help.
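The exploration-versus-optimization balance Nate raises is commonly handled with an epsilon-greedy rule: mostly play the best-known strategy, occasionally try another. A minimal sketch, with made-up strategy names and payoff estimates:

```python
# Epsilon-greedy sketch of the exploration/optimization trade-off.
# Strategy names and reward estimates are illustrative, not from the talk.
import random

# Average payoff estimates learned so far for each candidate strategy
rewards = {"always_defect": 1.0, "tit_for_tat": 3.0}

def choose_strategy(epsilon: float = 0.1) -> str:
    """With probability epsilon explore a random strategy; else exploit."""
    if random.random() < epsilon:                 # explore
        return random.choice(list(rewards))
    return max(rewards, key=rewards.get)          # exploit best so far
```

With `epsilon = 0` the learner hammers its current best strategy forever; a small positive epsilon keeps it occasionally probing for something better, which is the balance Nate is pointing at.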
CLAUDE: Right now big data is in a massive trough of disillusionment. People went out, they bought more bandwidth, they bought Hadoop clusters, and they bought all kinds of stuff. They’re crunching all the data. Great, what do you do with it? Why isn’t there any ROI? Because there was nobody on the other side interpreting the data and making it fit into what humans understand.
The other thing McKinsey said, way back in 2013, is that by 2018 the US economy will be short 1.5 million people who even know how to work with data.
And this is, I think, the big value-add of companies like Brandwatch. They’re providing a layer where they’re going to take care of the Hadoop clusters.
They’re going to take care of the massive processing farms and everything else so that humans, who are really good at understanding humans, unlike computers, will be able to do the value-add.
The big value right now in big data is actually the fact that you no longer have to be a data scientist to take advantage of it. I think in the age of data science, we’re going to end up being the plumbers of the systems that you guys are going to use.
NATE: I feel like there is a gap in the expectation that a machine learning algorithm or some other advanced technique is going to give you answers to anything. I don’t think we’re there yet. But if you craft a good question, the machines really, really want to answer it for you and they might just be able to do it.
That’s the part where we get the little thrill – where what used to take ten hours of manual analysis takes two seconds. Because you asked the right question, the computer’s like, ‘I got this,’ which is super cool.
BECKY: Yesterday we were talking about how Viv, the new personal assistant from Siri’s creators, has been released. What they’ve done is they’ve taken one specific task that they do really well and then combined it with another very articulated task, which is the commerce piece of it, connecting it to third parties.
That could be like, ‘Oh, you asked about this song? Let me give you the ability to play it,’ which is why I am particularly obsessed with Amazon’s Alexa. Not only do we have to pick the right shape of problem and the right algorithm with which to tackle it, I think we also have to think about how we start getting data that’s really useful that isn’t just Twitter, Facebook, Instagram.
There are now data sets outside of social. One of the big topics this year is chatbots, as well as what we’re calling conversational UI now. We’re typing in a question that isn’t just, ‘Give me a list of search results,’ and the response back is very, very specific.
So as a data scientist, how do we train our marketing executives to ask the appropriate question or the right question? What is that format that will give the machines the ability to answer for me?
NATE: I guess to sort of answer it from a technical perspective I think we’re getting there, certainly in terms of computing power. But we may just need tools so that human beings can use their amazing internal neural networks to do the discovery and then start to ask specific questions of the data after that.
CLAUDE: Really good UI and UX should be able to frame those questions for you. It’s designed to make the person ask better questions.
I really think one of the reasons I love Brandwatch is that they’ve nailed the UI. So to answer your question, the perfect software will keep you from asking stupid questions.
A massive thanks to our panelists. If you’re interested in discussions like this one, come and join us at the Now You Know Europe Conference this October, in London. Find out more here.