Dr. Somdeb Majumdar is the Director of AI Lab – a research organization within Intel Labs.
He received his Ph.D. from the University of California, Los Angeles and spent several years developing ultra-low-power communication systems, wearable medical devices and deep learning systems.
He has published at top-tier journals and conferences and holds 27 US patents. At Intel Labs, he leads a multi-disciplinary team developing foundational AI algorithms, scalable open-source software tools and disruptive applications in Computer Vision, Chip Design, Graph Learning, Scientific Computing and other emerging areas.
Host:
Today on the RH podcast on AI, we have Dr. Somdeb Majumdar. He's the director of AI Lab, a research organization within Intel Labs.
Somdeb Majumdar:
My advice would be, if you're a high school student or in that sort of period, just focus on math, like algebra and matrices. It's funny, my high school teacher back in, I think, 10th grade, actually told me… I used to ask this question, "Why are we doing all this mathematics stuff? Where am I going to use it in my life?" and she said, you know, I still remember this true story. She said, "The one thing you should take away is matrices."
Host:
How's it going? We've started recording now, by the way.
Somdeb Majumdar:
Pretty good.
Host:
Yeah, I wanted to talk, there’s so much to talk about with the things that you do. I’m super, super interested. Um, it’s hard to know where to start, but I just would love for you to talk a little bit about Intel Labs and what you guys do. Just, you know, just go ahead.
Somdeb Majumdar:
Intel Labs is, I would say, quite a special place in the middle of a gigantic company. Intel, as you know, has 100,000-plus employees; it's a design and manufacturing company, one of the stalwarts of computer architecture, from the days of the 8086 all the way to today, where we have a whole plethora of heterogeneous hardware products. Labs is a very interesting place. It's a few hundred researchers who are tasked with finding the next wave of technology. Obviously, AI is a big part of it. Many, many teams work on things that are maybe not traditionally AI: circuit design, software, and things like that. But there's no denying that AI has pretty much permeated and made its presence felt in each and every division within Intel Labs. My group within Labs is AI-specific. I run a research lab called AI Lab, where we focus on algorithmic developments, looking ahead to the next two to five years to see what kind of new techniques and technologies need to be developed to address the upcoming needs of the community.
Host:
Okay, got it. So, when did you start specifically with AI Lab? When did you start running that?
Somdeb Majumdar:
This November, I think, I'll complete three years of running the lab. But I've been at Intel for about, I want to say, six or seven years now. So I've sort of seen the transition between different divisions and things like that. But right now, it's great to be in Labs because it's truly a research organization, and you can all be aligned on the expectations that come with that.
Host:
What do you think was the most interesting thing you had to learn when running a lab? Because I would assume you had experience running things before. And when we're talking about running things for the sake of completing something on the business side, there's, I would say, quite a clear set of goals. But with a research lab, you know, how do you think about expectations, goal-setting, and leadership?
Somdeb Majumdar:
That's a great question. I think the first lesson I learned is that being a researcher and running a research lab are two completely different things. I went through that transition myself. I've been in corporate research for the entirety of my career, but for the vast majority of it I've been an individual contributor, building things, developing solutions and so forth, leading a few small teams here and there. When I took over the lab, I had to take on a whole different set of dimensions, mainly people, which I hadn't had to deal with before. With that comes a lot of complexity in terms of expectations. I think researchers are a very decent breed. A lot of them come from a very open mindset, and they expect to see that kind of open mindset in the corporate arena where they're working. I think the biggest challenge, and almost the challenge I enjoy, is steering through that balance: how do you make the lab relevant for the company (because at the end of the day, this is not a university) while at the same time keeping things intellectually stimulating for the people who work in it?
Host:
So, where have you found that line? I mean, what’s your ethos on drawing that line?
Somdeb Majumdar:
So, my ethos is that I start from the assumption that there are many types of research you can do. Obviously, the type of research you would conduct at a university might take five to six years. Clearly, that won't stick in a corporate environment, because no one is going to be supported for that long. So what I've realized is that the job of a research director or a lab director is to pick those problems and position them, scope them out to maybe a year out, which I think is about the right amount of time to give people the breathing room to innovate without getting caught up in product cycles. But the effectiveness of a lab leader is really in seeing where a thing lands in about a year, and whether there is a path to transfer the technology to some team within the company or into the open-source ecosystem, where maybe developers and other researchers find it helpful. That way we can pick the specific subset of problems that are still open-ended, that still don't have solutions, that still have a lot of open questions around efficiency or functionality. But again, it's the director's job to make that connection: okay, you can work on this problem for a year, but I know where this is heading and how this will exit into something the company can use or, like I said, the developer community.
Host:
So let's talk a little bit about AI technology, which I think is the primary concern of your lab. A lot of things have happened recently, but LLMs, large language models, are what ChatGPT is built on, and how people tend to interact with AI is primarily through ChatGPT, this very human-language-friendly interface. That's been really impressive. So I'd love to get your initial thoughts on that technology and how you see it affecting the market, or companies in general, and us, and specifically also from a researcher's point of view, because you would have seen all the building blocks coming up in that pipeline. You would have seen that first, so it'd be great to hear your two cents on it.
Somdeb Majumdar:
Yeah, sure. So my take on ChatGPT is probably very different from most, even users of the technology or people who have a business use for it. I think ChatGPT is great: it got the whole world talking about AI, which means there's new attention on the area. It means people have started talking about where regulation makes sense and where it doesn't, and questions about responsible AI have started coming up. Because obviously, once the initial euphoria of writing cool dad jokes wore off and we got through that honeymoon phase of ChatGPT, people started to use ChatGPT in actual enterprise applications, and they're starting to question the accuracy of the responses and things like that. So when you look at it from that perspective, I think it's great that it has brought attention back to some of the most important questions on bias and responsible AI, even around transparency: how are these models trained, what data sets are they trained on? It has unlocked a whole Pandora's box of legal questions, as you might have seen, in terms of the discussion among authors and content creators who are starting to make the point that their work might have been used. So from that perspective, it's fascinating to see this much public interest in the field.
Host:
Yeah, yeah, go ahead.
One of the interesting points that came up in my mind is, with regard to authors, this concept of authenticity, the authenticity of, I guess, creativity. Not necessarily of thought in this case, because whether it's conscious or not is more of a fun exercise than a fruitful one. But as someone who has been playing with this for a while and understands it quite deeply: how valid is that concern that this is something that has been ingested from a human, it gets the voice of the author, but it isn't the author? So, do you consider that a kind of plagiarism? You know, sort of a metaphysical, abstract plagiarism, in a way, because it will get the voice right. Or it may get the voice. It's an interesting idea, but since you know the math, you know it a bit more deeply: what do you think?
Somdeb Majumdar:
Sure. So my take on this, and I'm not a policy expert by any means, is that the way I would think about the question of plagiarism, and not just plagiarism but this whole idea of provenance, is to think about it the same way we think about these concepts when humans produce output. For example, humans are language models. We have models inside our heads that have been developed. Of course, there's a genetic mechanism that we are born with; we're predisposed to understand certain concepts and things, which justifies the idea that it is a model. In fact, it is a model; there's no question about it. Of course, we don't know the mechanics of it, but it is for sure a model. It's a black-box model, and then it gets fine-tuned, and you learn based on various inputs. So, what do we call things as humans? You could argue that everything we write is a result of us ingesting ideas and specific ways of expressing ourselves, in literature for example, through our experience of reading books, poems, music, and so on. But in the current world, we don't call that plagiarism. Plagiarism is a very well-defined thing: you're basically regurgitating what you've seen somewhere else without attribution to the source. I don't think this is a technology question; I think it's a policy question, in terms of thinking of the model as a black box, just like we think of outputs from a human as coming from a black box, and applying the same standards to it.
Host:
Fair enough. So, I have a question about your work. I'd love to hear some interesting things that you’ve come up with or stumbled upon in your research. Is there anything that really comes to mind?
Somdeb Majumdar:
I think one of the coolest things that comes to mind (of course, this is super academic work) is from back in the day, a few years ago, when we were trying to build new types of learning algorithms. A new type of algorithm we were looking at was in the context of reinforcement learning, which is a specific type of training used especially in the field of robotics and things like that. And we were working on these benchmarks that involved a team of players, virtual players, trying to learn a game from scratch, where they don't have any understanding of the rules, and they simply learn the rules of the game by taking random actions, getting a reward from the environment, and then figuring it out. It's very human-like in nature, the way we learn things. This is also sort of the technology behind AlphaGo and AlphaZero, which you may have heard of. Long story short, what blew my mind were some of the emergent behaviors, which almost spooked us. This was in the context of a virtual soccer game, and it was very interesting that the team we were training, a team of virtual soccer players, eventually learned to coordinate among themselves, making all kinds of moves that were maybe not beneficial to the individual player but beneficial to the team. They even came up with team tactics, like the offside play, and discovered these as they went. And I think that was almost my moment where I was like, "Oh wow, the AI is doing things that I don't know how the heck it's doing." Like, I know I wrote the training mechanism, but it's discovering all these emergent properties and things like that. So that's one example among many where I've been surprised by what these things can do.
Host:
So, what was surprising about that, in a way? Because maybe it's easier to say now than it used to be, but one thing we could probably say about any kind of rule-based space is that, if you search the space long enough, you'll be able to figure out every inch of that space. One of those inches would be the offside trap, for example, as long as you keep churning through the logic until it gets there, which is easy to say and hard to do. So, I understand that. But what was really surprising? Was it that it happened, or that you didn't expect it to happen? Or that you didn't think it was possible it could happen, based on your knowledge of the field?
Somdeb Majumdar:
Yeah, that's a great question. I think I would not have been surprised if it went down the road of figuring out these incredibly complex strategies (and the offside trap is a pretty complex strategy) if this had been built with traditional rule-based AI, using traditional rule-based algorithms. In that case, I could go back and check where, during the evolution of that system, these kinds of strategies began to emerge, and relate it back to the specific rules of the game that I had encoded. I think what was surprising is that we did not do any of that explicitly. We essentially just gave it an abstract representation of the system, the actions you can take (pass, dribble, and things like that), and let it run. What was more surprising, if you knew the technology under the hood, is that these are deep neural networks. Forget rule-based systems; these are matrices that transform data from one domain into what we call an abstract embedding domain, and then the system is making all kinds of decisions and learning about the environment, about what is good and what is bad. But what's really happening under the hood is a bunch of matrix multiplications, and I did not think matrices had the power to get a system to behave intelligently, to the point where it did not need explicitly encoded rules, explicitly encoded if-then statements, to decide how to make a particular move. So it was truly fascinating to see that deep learning really worked. I mean, this was obviously a deep reinforcement learning implementation.
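To make the take-an-action, get-a-reward, update-your-estimate loop described above concrete, here is a minimal, illustrative sketch of tabular Q-learning on a toy corridor problem. This is not the multi-agent deep RL soccer system discussed in the interview; the environment, reward, and parameters are invented purely to show the shape of the learning loop.

```python
# A toy reinforcement learning loop: the agent knows nothing about the "rules",
# takes (sometimes random) actions, observes a reward, and updates its value estimates.
# Toy problem: walk right along a 1-D corridor of 5 cells to reach the goal cell.
import random

N_STATES = 5          # cells 0..4, goal is cell 4
ACTIONS = [-1, +1]    # step left or step right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

alpha, gamma, epsilon = 0.1, 0.9, 0.2

for episode in range(2000):
    state = 0
    while state != N_STATES - 1:
        # Explore randomly sometimes, otherwise exploit what has been learned so far.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state

# After training, the greedy policy should be "step right" from every non-goal cell.
print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)})
```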
Host:
And we also talked, you know, earlier about this project you worked on with proteins that you worked on in your lab. Could you expand on that a little bit?
Somdeb Majumdar:
Yeah, yeah. So, to get at how we think about proteins and things like that, there's a very interesting connection back to GPT, and not just GPT but language models in general. When researchers like ourselves look at a technology like GPT, we don't necessarily think of language, even though the largest use case of something like GPT is language. The "T" in GPT is a Transformer module, and the goal and function of the Transformer is really to make correlations between different parts of a sequence. That's the way to think about it. A sentence is made up of a sequence of words. A longer text is made up of a sequence of sentences, and you can go up the whole hierarchy of a particular textual representation of logic, right? And what the Transformer in GPT is doing is creating these internal models of how any point in the sequence (a word, in the case of a text representation) correlates with all the other tokens, all the other words in this particular case, in that piece of text. But you can abstract this out and say it's not just about text. Anything that is sequential in nature, and whose function depends on these sorts of correlations between different parts of the sequence, could benefit from this kind of sequence-representation technology, like the Transformer.
So this is where applications in science, for example protein design, or drug discovery, or discovering new types of molecules, really come into the picture. Proteins are very interesting. All proteins are actually made up of 20 amino acids. So you can think of a protein as made up of anywhere from a hundred to thousands of blocks, let's call them tokens for now, where each token is really one of 20 possible types. So think of it as a dictionary that has 20 letters, and you're trying to create a long piece of text that is maybe a thousand words long, but there are only 20 words to pick from. If you think of it from that perspective, it's very similar to a text-based task. Text has various kinds of function: maybe it's trying to represent a particular emotion, or describe a certain situation; there are all kinds of functions associated with what the text represents. Proteins have a different type of function. Their function might be that a particular sequence of these tokens represents very specific properties of the protein. This might be a specific drug that needs to dock onto another molecule, a target protein, and the sequence of amino acids determines that.
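As a rough illustration of the text analogy above, here is a tiny sketch of tokenizing a protein sequence over the 20-letter amino-acid alphabet. The sequence is made up, and real protein language models use more elaborate tokenizers and vocabularies; this only shows why a protein can be fed to the same machinery as a sentence.

```python
# A protein is a sequence drawn from a 20-letter amino-acid "alphabet",
# so it can be tokenized exactly like text before going into a language model.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"          # the 20 standard amino acids (one-letter codes)
token_id = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

protein = "MKTAYIAKQR"                        # a short, made-up sequence
tokens = [token_id[aa] for aa in protein]     # integer tokens, ready for a sequence model
print(tokens)
```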
So the problem that we were working on, and actually continue to work on, is developing what we call language models for proteins. We call these protein language models, and we try to create these language representations of proteins so you can do all sorts of things: you can generate new proteins, you can classify proteins, you can do all kinds of analytics on proteins. And why proteins? Because proteins are at the heart of drugs; they're at the heart of many biological questions, exactly where life sciences lives, right? It's very important, and this analogy of 20 letters to the space of all proteins fits the problem very well.
Yeah, and in fact, you would be surprised how many nails this GPT hammer has found. There's an interesting thing here: I joke that in the early days (and by the way, "early days" is 10 years ago in AI) a lot of the advances and research were driven by computer vision. Most of the solutions started with ImageNet, for example, where people were trying to classify cats versus dogs and things like that, and it's quite interesting that—
Host:
Yeah, exactly.
Somdeb Majumdar:
That's right, or, you know, the dress question: is it blue or is it gray? But then, over the years, and in fact in the last couple of years, especially with the rise of Transformers and these kinds of building blocks of today's AI systems, NLP (natural language processing, or natural language understanding), the field of language, has driven a lot of the underlying technology.
But we see parallels back to computer vision. There are these things called Vision Transformers, for example, which treat an image as a language: they take sequences of pixels, and you can say that the pixels representing this dog at the top left of the image have a relationship with the pixels representing the person holding onto the dog, and the model can build all kinds of contextual understanding of an image based on this technology that came from language.
So I think that's really the fascinating part of this space, where you can take solutions that came from one domain and apply them to something completely different.
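A minimal sketch of the patch-as-token idea behind the Vision Transformers mentioned above. The image, patch size, and shapes are illustrative only; a real ViT adds a learned linear projection and positional embeddings on top of this, but the point is simply that an image becomes a sequence the same attention machinery can consume.

```python
# Treat an image as a "sentence" of patch tokens so text-style attention applies.
import numpy as np

image = np.random.rand(224, 224, 3)      # a dummy RGB image
patch = 16                               # 16x16 pixel patches

patches = [
    image[y:y + patch, x:x + patch].reshape(-1)   # flatten each patch to a vector
    for y in range(0, 224, patch)
    for x in range(0, 224, patch)
]
tokens = np.stack(patches)               # (196, 768): 196 patch tokens of length 16*16*3
print(tokens.shape)
```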
Host:
Yeah, it seems like the best way to treat this is: whatever your problem set is, take the space, turn it into some sort of encoding, and make it as Gödelian as possible, in a way. You just have a bunch of symbols that you can shunt around with operators of some sort, which kind of makes sense, because it is a computer. And then you have the AI system infer the relationships between the structures based on some feedback on results. And then we have all these tools, or structures, to make that happen, Transformers being one of them.
I want to soften it a bit and talk a little bit about you and how you got into all this in the first place, how you got into AI. I know you really actually started off in a way in a space that really makes sense, which is, uh, signal processing. So could you talk about that and how you got in, how you were, you know, tricked into playing in this space?
Somdeb Majumdar:
I was supremely lucky to be at the right place at the right time and, thankfully, with the right skill set to make the transition. It's funny: I run AI Lab, but I had no formal graduate-level training in AI. My PhD involved signal processing, like you mentioned, control theory, and operator theory, which is a specific field of mathematics.
And I spent a long time developing systems based on signal processing and things like that, what we call codecs, in the area of low-power systems. We would build these representation systems that went into all kinds of body-worn devices; in fact, several years ago we were building solutions for extremely low-power headphones. That then moved into a slightly different field, and we started developing prototypes of body-worn sensors for biomedical and fitness use cases.
So I think that part of my training and career really grounded me in thinking of technology in very rigorous terms. Rigor is a very important part of these traditional fields like signal processing or control theory. Compared to today's AI, things have to be almost closed-form in nature in terms of the solutions you're talking about. They have to be analytical, they have to be debuggable, so there's a lot more rigor involved in a lot of these classical engineering fields. I'm really glad I went through that because, right around, I want to say, 2012 or 2013, I reached a point in my career where I understood the strengths and weaknesses of these traditional systems. The strength, obviously, like I said, is the closed-form nature of the solutions: they're very analytical in nature and things like that. But that was also their weakness, because a lot of that closed-form nature, the ability to analyze the solutions, really came from the fact that humans designed the solutions. For example, humans designed filters in signal processing, or wavelets, which were a big deal back in the day. Humans went and thought, "Oh, let's say in image processing, I need to look at an image and figure out where the edges are. I need to disambiguate between the boundary of a person and the boundary of a cat."
Today we can do it with deep learning, but back in the day, people spent entire PhDs designing the filters that would do that job for you. The idea that people spent their PhDs deciding where the line is between a cat and a human is... I mean, they took that much time because computers don't see images the way humans do. It takes real depth of thought to work with encodings of pixels and figure out what you can do with them so that the machine interprets an image the way humans do. It was a big deal back then, but we also found that this was a weakness, because there's only so much you can plan for when you're designing these systems by hand. And that's why I got into AI: because deep learning was taking off at that time.
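For a flavor of the hand-designed filters being described, here is a toy Sobel edge filter convolved over a synthetic image. The image and the naive convolution routine are purely illustrative; this is the kind of operator people once crafted by hand and that deep networks now learn from data.

```python
# A classic hand-designed filter: a Sobel kernel that responds to vertical edges.
import numpy as np

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

def convolve2d(img, kernel):
    """Naive valid-mode 2-D convolution, enough to show the idea."""
    h, w = img.shape
    k = kernel.shape[0]
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + k, j:j + k] * kernel)
    return out

img = np.zeros((8, 8))
img[:, 4:] = 1.0                    # a vertical edge down the middle of a toy image
edges = convolve2d(img, sobel_x)    # strong response along the edge, zero elsewhere
print(edges)
```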
I mean, the fact that I could build a deep learning system that could do classification on a thousand-class dataset like ImageNet, which is a very important dataset in AI, where you're trying to classify between cats, dogs, and obviously much more than that, thousands of different classes, without knowing anything about image representation blew my mind. Because now, a random person could come in off the street and... I think in 2012 or 2013, the ImageNet deep learning paper significantly improved the state of the art in classification accuracy on that standard dataset, and that blew my mind, because those authors were not traditional computer vision scientists. They had not done their PhDs on how to design the filters, and suddenly they were beating all of the existing methods.
That's where I was like, "Okay, I got to shift, I got to understand how this whole deep learning technology works." And I think that’s why I changed my career trajectory for the better.
Host:
A theme that has emerged from this conversation is the idea of a specialized skill set, or a specialized space of understanding, interacting with a language model to some degree. It may not be an outright structure, but it's something. When we talked about chip design, I'm assuming that interaction, maybe because it's popular or simply because it works, is also going to be friendliest through a language model where you're communicating in human language, or something analogous to it. Because language models even write code; it's really just, can you have something alphanumeric that makes sense? So I'd love to hear what you think about that. What's the underlying idea behind having a language model, a special space of expertise, and putting it all together to make something useful for people? Your first example was protein design, and the second one is more pertinent to Intel, which is chip design.
Somdeb Majumdar:
Yeah, so I think researchers tend to start with the representation problem. In AI research, we tend to think: okay, we have language, right? We have sentences, words... but machines don't see words or sentences, so you have to have some kind of encoding. You referred to this earlier. So question number one is: how do we build an encoding? Back in the day, you could just define the encoding; you could define a hash table, you could do all kinds of things with probabilistic encodings and so on. But that wasn't very useful, because you couldn't derive relationships between differently encoded data, right? So even language models went through this period of really figuring out how you learn to generate new encodings.
So you have to train the machine to come up with newer and newer embeddings, which are numerical in nature. At the heart of it, you have to turn these non-numerical data structures into something that is fundamentally a numerical encoding, one that is hopefully trainable, and then you can do interesting stuff with those encodings. In fact, in the AI world we call those encodings embeddings; you might have heard the term. The whole representation game is to see how we can train the machine to come up with better embeddings.
So it's the same thing when we think about proteins, or when we think about things like chip design, where we ask this question again, and it goes back to: what's the right representation for the data? For something like chips, or for constructing logic out of logic gates, the data structure might be very different. For example, one of the data structures we work with a lot is the graph. Graphs are obviously ubiquitous. You find graphs in social networks; Facebook, or Meta, for example, would have a graph that represents people and the relationships between them, and something like Amazon might have a product recommendation graph, where there are objects that people buy, there are people, and there are edges, or connections, between these nodes (people, objects, and so on) that represent some kind of relationship between them.
On a circuit board you have something similar. You can represent it as a graph, where the nodes are the components of the circuit. This could be, for example, the memory block, the CPU block, and so on. Or you can go inside one of these blocks and go all the way down to the logic gates, right? But they are all connected, literally with wires, and the sequence of connections means something. It determines the function, and it determines a lot of things besides the function: it can determine power, it can determine whether you're meeting timing, all of these specific technical KPIs for that system.
So we tend to think about representation in each case. Sometimes it makes sense to represent things as language data; in the case of the protein sequence, that works because it's sequential and very akin to a sentence, a string of words. For something like a product recommendation system, if I'm Amazon and I'm adding new products and I'm trying to decide whether I should recommend that Chris buy that, I don't know, dog collar, that is a new decision I have to make, and the connection between Chris and this particular object doesn't exist yet, because the dog collar is a new item. So it's an interesting prediction task, where you're trying to predict a new relationship. For a problem like that, or for a problem like deciding how to arrange the components of my chip, I will go with a different representation. I might go with the graph representation, and if I take that decision, I have to build out my entire stack of what we call graph neural networks (similar in spirit but very different in implementation) on top of these kinds of representations. So we constantly have to make these kinds of decisions about what's the right way to represent the data.
Once you've made the decision and represented the data, what's the right machine learning stack to put on top of it? Which is really asking, in plain speak: what is the right machine learning stack to convert this raw data into something compact, on which you can perform things like classification or prediction or any of these different tasks? So that's literally a day in the life of a researcher. We make these decisions day in and day out: we look at new types of data, we come up with new types of representation structures, and then training methodologies, and rinse and repeat.
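A small, illustrative sketch of that representation choice: entities become nodes carrying embedding vectors, relationships become edges, and one simplified message-passing step mixes each node's embedding with its neighbours', which is the spirit of a graph neural network layer. The graph, features, and update rule are invented for illustration and are not any particular production model.

```python
# Nodes could be blocks on a chip or users/products in a recommendation graph.
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
neighbours = {n: [] for n in range(4)}
for a, b in edges:
    neighbours[a].append(b)
    neighbours[b].append(a)

features = np.random.rand(4, 8)            # an 8-dimensional embedding per node

# One simplified message-passing step: average self and neighbour embeddings.
updated = np.stack([
    np.mean([features[n]] + [features[m] for m in neighbours[n]], axis=0)
    for n in range(4)
])
print(updated.shape)   # still (4, 8): same nodes, now context-aware embeddings
```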
Host:
Got it. So, um, are there any, uh, algorithms or architectures that are really interesting to you right now, that are really working for you?
Somdeb Majumdar:
So I think the big core technology, or I would say what we call a primitive, in terms of the building blocks of all this, is clearly the Transformer. A Transformer is basically the core block that drives GPT or any sort of language model these days, and researchers have figured out all kinds of ways to make the Transformer more efficient, so that continues to be a core area of research. How do we make sure that you can aggregate context over longer and longer sequences of tokens? What this really means at the ground level, for users, is, for example: can I summarize a very long piece of text, like a very long book? Because each word in that book is a token, and I need to aggregate context over very, very long sequences. So that continues to remain an interesting area for us. We're spending quite a bit of time thinking about other primitives that are more memory-efficient and compute-efficient. I mentioned graphs, in fact.
Host:
But you would consider graphs a primitive, I thought?
Somdeb Majumdar:
No, graphs are not a primitive. I would consider a graph an alternative way of representing the data, and then you build the primitives that work on top of that data structure.
Exactly, yeah. So that means we have to come up with efficient ways to do what are called convolution operations, which are very commonly done on these computer-vision inputs, for example. So we spend a lot of time thinking about that. We spend a lot of time thinking about how we combine this idea of Transformers and graphs, because graphs are fundamentally sparse in nature. Like the example I gave where things are connected to other things: there could be a lot of things, but not everything is necessarily related to everything. So this is a very powerful representation feature. In fact, there was a paper where the authors argued that a Transformer is basically a...
Host:
That's what it sounds like to me. Yeah, because in my mind I was thinking: if you can superimpose the whole space onto a matrix where every location represents the existence of some section of the line, you can literally draw a one-to-one mapping, and then it is a sparse space. But then you would have to reduce that, because that means the size of the matrix is the size of the space, which is ridiculous. But you know what I mean.
Somdeb Majumdar:
But in many ways, that's a big part of AI research. When you look at how these models are designed, there's a lot of optimization going on under the hood.
Host:
That's a very rudimentary perspective to be honest, because there's a lot of work.
Somdeb Majumdar:
No, that's actually... I mean, you pretty much hit the nail on the head. I think the concrete way for non-researcher folks to think of it is: take a sentence, say, "The sun rises in the east." One way to represent this is as a sequence of tokens (the, sun, rises, in, the, east), and you try to create these attention heads, which is what the Transformer does. Or take a slightly more complex sentence, maybe "The sun rises in the east and it sets in the west." The word "it" is really related to the word "sun." So what the Transformer really does is create these attention models that tell you that this thing, the "sun," is related to the word "it" in the sentence. If you can query, "Where does it set?" then the response could be, "It sets in the west," or, "The sun sets in the west," because the model now has an understanding that "it" and "sun" are really related. And now you can see where the graph idea comes in right away, because you could also say the words in that sentence are all nodes on a graph, and they're connected with some kind of edges, and the edges might represent the correlation between them. In this particular scenario, the connection between "sun" and "it" would be a very strong one, and there might be no connection for generic words like "the," because "the" is not a very useful discriminator. But you can explicitly build these graphs that are very, very sparse. And what that means for the execution of these workloads on the hardware, on the GPU or the CPU, is that you're dealing with lower memory, because there are fewer things to represent, and with fewer computations, because there are fewer weights to update in the corresponding neural network that sits on top of this graph.
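A toy rendering of that example: the words of the sentence as nodes, with hand-picked edges only where two words are strongly related, compared against the number of pairs a fully dense attention pattern would consider. The edges here are chosen by hand purely for illustration, not learned.

```python
# Sparse word-graph vs. dense all-pairs attention for one sentence.
words = "the sun rises in the east and it sets in the west".split()

# (source index, target index) pairs for the few relations we care about.
edges = {
    (7, 1),   # "it"    -> "sun"
    (2, 5),   # "rises" -> "east"
    (8, 11),  # "sets"  -> "west"
}

dense_pairs = len(words) ** 2          # what full attention would score: 144 pairs
sparse_pairs = len(edges)              # what the hand-built graph keeps: 3 pairs
print(dense_pairs, sparse_pairs)
```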
So the graph neural network correspondingly benefits from the sparse nature of the graph that's lying underneath. So we play around with this a lot, and it's a very active area in our research lab right now: going deep into the representation space and asking whether this token-based, Transformer-centric representation is the right way to do it for where the world is going in the next two to five years. And I strongly believe people will definitely come up with more efficient representations, and GPT will transform into something else.
Host:
Yeah, it might become something else.
So how do we think about this from a chip-design perspective? I think it's a very interesting perspective because most people are thinking primarily at the software level, at that tier, but you gave a great analogy at the chip level. How do you think about it: okay, now we have this sparse space, and in this example a graph is the fundamental representation. How does the chip take advantage of that, apart from the fact that it's sparse? Because you can make the argument that if it's sparse, it's sparse: whatever abstract representation you have will be able to take advantage of it, because it's just less data, plus some relationship-based metadata on your basic structure, right? So why does it excite you that the chip itself can take advantage of it? Is it because of the nature of chips, or is it just that, in general, any win at the chip level cascades orders of magnitude up because it is the chip? Does that make sense? I wish I could ask that better.
Somdeb Majumdar:
No, I think I get the gist of the question. I'll try to explain it in a slightly more understandable way. So the first thing to understand is that the chip doesn't know it's sparse, you know, or what sparsity is or things like that. So we're trying to fundamentally solve the designer's problem where the designer is looking at this bag of logic gates and sort of CPU components and memory components and power components and trying to basically arrange them literally on a rectangle to say, "What is the smallest rectangle inside which I can pack these millions of things?"
Now, you can look at it at different hierarchies. You can look at it at tens of millions of logic gates, or you can look at it at the hierarchy of macro functions that are comprised of these gates, or you could look at them at the level of chip-level blocks. So for example, the memory block or the CPU block. But it doesn't matter what hierarchy you look at. The job of a designer, someone who's sitting at Intel or other semiconductor design companies, is really to ask the question, "How the heck do I pack all these things into the smallest space possible?" And people would give their right arm for a 5% improvement in the area or the power profile of the design that they build, just because of the scale of the product. Millions and millions of chips are sold, and if you can get the size down, that means the device inside which it's sitting becomes smaller. If it's a cell phone, if it's a laptop, if it's your car, power consumption is a big deal. So people in this industry spend a lot of time thinking about even incremental improvements to power, performance, area—all of these things.
So we come into this picture. Even though I work at Intel, I'm not a computer architect, I'm not a design engineer. I look at it from the perspective of, "Oh, this looks to me like a bin-packing problem." And a bin-packing problem is a very classic AI research combinatorial optimization problem where you're trying to ask the question, "What is the smallest bin inside which I can pack a number of different labeled blocks?" So it's very similar—it's actually a geometric question.
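As a rough sketch of that bin-packing framing, here is the classic first-fit-decreasing heuristic on made-up block sizes. Real placement is a much richer 2-D problem with wirelength, timing, and power objectives; this only shows the flavor of the combinatorial question.

```python
# First-fit decreasing: a classic heuristic for the bin-packing problem.
def first_fit_decreasing(block_sizes, bin_capacity):
    bins = []  # each bin is a list of block sizes
    for size in sorted(block_sizes, reverse=True):
        for b in bins:
            if sum(b) + size <= bin_capacity:
                b.append(size)
                break
        else:
            bins.append([size])        # no existing bin fits: open a new one
    return bins

blocks = [7, 5, 4, 4, 3, 2, 2, 1]      # made-up block "areas"
print(first_fit_decreasing(blocks, bin_capacity=10))
```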
Now, back to your question about graphs and sparsity and things like that. The reason it matters is that when you're dealing with tens of millions of logic gates, it does not make sense to start with a representation where I need to compute attention, for example, using a Transformer-centric language, over all 10 million tokens. That would not work very well, and I know it's not even necessary, because different parts of the chip are sort of isolated. So you can solve this problem in a hierarchical fashion, where there are sparse connections between blocks but dense connections within blocks. So I can have a representation with different densities of connections depending on where, spatially, you're looking at that layer.
So it very naturally falls into a graph representation, and then you can build all sorts of machine learning stacks on top of that.
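An illustrative sketch of that structure: dense connectivity inside each block and only a few edges between blocks, compared with an all-pairs representation. The block names, sizes, and cross-block edges are invented; the point is only how much smaller the hierarchical, sparse graph is.

```python
# Dense edges inside blocks, sparse edges between blocks, vs. a fully connected graph.
import itertools

blocks = {"cpu": range(0, 50), "mem": range(50, 100), "io": range(100, 120)}

edges = set()
for nodes in blocks.values():                          # dense edges within each block
    edges.update(itertools.combinations(nodes, 2))
edges.update([(0, 50), (50, 100), (0, 100)])           # a few sparse edges between blocks

total_nodes = 120
all_pairs = total_nodes * (total_nodes - 1) // 2
print(len(edges), "edges kept vs", all_pairs, "in a fully connected graph")
```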
Host:
That is actually really... the way you explained it was very good, and it's just very powerful. I mean, just the idea of having to reduce power, in terms of power requirements. Right now, all machine learning has to happen on very large systems, and anything that has any utility is happening on the largest possible systems. So there's definitely a need to reduce the amount of power this technology uses in order for it to continue to climb down that... what would you call it? Sort of, I guess, the power gradient. I forget the name of the guy; every 18 months the power of chips increases.
Somdeb Majumdar:
That was us. That's Moore's law; that was one of our founders.
Host:
I mean, I don't know the guy, but yeah, I think I asked the right person there. So, you know, there's a lot in that direction. One thing I did want to talk about before we finish this off is this idea of proprietary information versus open-source information, and how that plays out. This goes back to what we talked about earlier: is the voice of a person, from an abstract point of view, something that the person owns, and owns in perpetuity, or is any derivative of it valid? Is the voice they speak in just an emergent property of having all these words, even if the person has a tendency toward them? There are a lot of fun, abstract questions you can ask about that. But when we come to company decision-making and policy decision-making, it becomes a bit complex, right? The concept of ownership is quite important. So for someone who specifically generates new things in a company setting and then has to draw that line themselves, between what is the company's and what is owned by other people, how do you do that? And then there are other companies that want to use this, that want to make their own stuff or outsource AI pieces. A lot of big companies, I'm sure, want to use this technology, and the faster they use it, the better for the people who are making it, but they have concerns over their information, what's theirs and what's others'. So I'd love to know what you think about this, because you are the initial point of that: you're making the thing, and then we decide how we use it, but you are making the thing. So how do you think about that as you're making this happen?
Somdeb Majumdar:
Yeah, I mean, there are just so many layers to that question. Like I said, a lot of us researchers who work in the corporate world wear two hats. One is the hat of the open-ecosystem researcher, and this is the hat that believes in open source, believes in publishing publicly and being very transparent about publications, all the way down to what data sets are used, what methodologies are used, how the comparison with different baselines is done, and even down to how compute-intensive the approach is and whether it's going to be sustainable in terms of compute. And I've got to say, I've seen at Intel a very, very strong emphasis on these questions early on. In fact, there's a whole group of people who think about responsible AI very deeply. We are very open-source friendly; I would say 80-plus percent of the things we develop, we open source, in my lab particularly and in many other labs that work here. But that also means we are very aware of the responsibility that comes with releasing something to the public. Open source already takes away a lot of the friction, because you can clearly go inside the code, but we are also very careful about what data is used to train our machine learning models. We make the compute requirements, the memory requirements and such very transparent. There will always be a niche set of things that don't make sense to open source, because they might be very valuable company IP that is directly responsible for generating massive amounts of revenue, and we have to make that call project by project. But I would say the vast majority of what we do is open source. There's also another way of enabling the open ecosystem, which may not be open source but is more like the GPT model, where you still provide APIs to do fine-tuning on corporate data and things like that. There, things are a little bit of a slippery slope, because transparency is a very important thing, and you're already seeing this: to the point of derivatives and the question you asked, people are probing these models, and that's how they're finding out that their data has been used. So going forward, I think products like GPT, or derivatives of GPT, or whatever comes in the future, really need to be open at least about the data sources they have used, because this directly determines whether you pay a royalty to an author or a creator, right? That is extremely important, according to me, and something I'm glad GPT brought attention back to: data ownership. And there's, of course, model ownership, which is a whole different question: whether you make the model open source or closed source. At the very least, I can understand a lot of companies' desire to make the model closed source, because you want to control revenue and things like that. But I feel there's a point where things like data provenance are non-negotiable, because otherwise, from the company's perspective, you open yourself up to legal risk, and from the consumer's perspective, you don't know where on the plagiarism spectrum the content you're consuming lies.
You cannot even make a definitive decision about that if you didn't know what data was consumed to begin with. So I think these are very, very important policy questions that will, I'm sure, keep us busy for many, many years.
Host:
I ask this question for the people who just stumble upon this, you know, who are trying to figure out what they're going to do in their careers, and see something like this and think, "Okay, you know what, I want to be in the position of a researcher in this very particular niche, at Intel or at a new company..." So I'll ask a very basic question, which is: what do you think is the best way to get started in this field now? Where do you think people should really be focusing as they complete high school, or maybe try to get into a master's, trying to get to that next tier of accreditation or interest?
Somdeb Majumdar:
That's a great question. And yeah, I think the answer depends on where people are in their career path or their academic path. If you're talking about students who are maybe in high school (and these days I've seen posts from 8th and 9th graders writing about deep learning, so I'm not surprised anymore, and it's great that they're thinking about it), then at that level my advice is: forget about the buzzwords. My biggest advice would be to stop reading social media posts on AI research, stop reading all the books at either end of the spectrum, the "doom and gloom" or the "best thing since sliced bread." Tune that out. At the end of the day, the people who build these systems are fundamentally mathematicians, logicians, computer scientists. So the focus on the fundamental sciences and mathematics is what is going to drive this field forward. And for the generation that is currently in high school, or just graduating into college, that is the perfect time to put your head down and just do the math, so to speak. If you're an early-stage career person who is just graduating from a four-year undergrad course or something like that, I would strongly suggest going to grad school, even if it's for a couple of years, taking a few courses, and maybe even doing a thesis on one of these kinds of projects. If you enter the corporate world right away as a corporate researcher, there are a lot more expectations from the industry to start producing products, and I think that takes away from the focus on really developing the technical skills you need. So my advice would be, if you're a high school student or in that sort of period, just do math: linear algebra, matrices. It's funny, my high school teacher back in, I think, 10th grade, actually told me... I used to ask this question: "Why are we doing all this mathematics stuff? Where am I going to use it in my life?" And she said (you know, I still remember, this is a true story), "The one thing you should take away is matrices." I don't think she even knew what role matrices would play, and it literally pays my salary now. So, I don't know what the equivalent of matrices is for today's kids, but I would say: just do math. And then, if you've gone through that rigorous training in mathematics in high school or first-year undergrad, and then go into a deeper study, maybe a two-year Master's program, you are really set to be a solid, respected AI research professional who can actually deep-dive and disambiguate between all these flashy terms like GANs and general intelligence. Take that out of your head; just focus on the math, because math is going to move the field forward.
Host:
Well, you know, if people wanted to move forward and find you, where would they be able to do so?
Somdeb Majumdar:
Oh, just, I'm not much of a social media person. I think I'm just on LinkedIn. So they can find me on LinkedIn, send me a message, be happy to chat with everyone and carry on a conversation.
Host:
Well, thank you so much for, you know, being on the program and being on the podcast and basically just, like, teaching us about, you know, what you do and all the research you're doing. I'm sure there's lots of interesting things that are going to come out of your lab, and I'm looking forward to every single one of those things. And, you know, just thank you for teaching us about how chips and AI can be improved and some of the protein stuff you're doing. I think those things are really, really cool. They're really, really interesting, and I'm looking forward to seeing them in the market as you're working towards that.
Somdeb Majumdar:
Absolutely, thank you, Chris. This was super fun for me.
Host:
Good, good. Thank you.