My name is Thomas Hsiao and I am a Statistics major studying at Rice University. I'm interested in applying machine learning to epidemiology and public health.

Week 1: Dr. Metsis told us that we would be working on a multimodal approach to recognizing emotion, using biosignals like EEG, EOG, and ECG. Most of the currently reliable emotion recognition algorithms use audiovisual cues, but the reasons for pursuing a biosignal approach are understandable: it would be much less invasive, would depend less on the environment in terms of noise and lighting, and could potentially identify the emotions we hide with our facial expressions.

Week 2: After hearing a little bit about the other REU students' projects, I realize our project is pretty unique in terms of the stage of research where we start. Most groups already have all their data, but we're actually getting our hands wet (as wet as they can get in a computer science lab): we have to start from the very beginning and collect our own data, which requires a solid knowledge of the hardware we're using.

I've mostly been reading the literature on emotion recognition. The most interesting/challenging problem that continues to come up is the induction of emotion. How do you know if your subject is angry when you want them to be angry? 

Week 7: While looking through the past data files, I realized our last experiment (the activity recognition one) was exactly one month ago! It appears we've been spending far too much time analyzing data with no new results, and I'm now feeling the pressure of a deadline that's quickly approaching. We tried to replicate an SSVEP experiment, but we failed to find the desired frequencies in our signal. In addition, the EEG signals we're collecting from the BioRadio take around 35 seconds to settle, producing a huge sinusoidal drift before they stabilize. These results aren't encouraging, but there are various other scenarios we have yet to try. Considering how little time we have left, it seems like we'll have to stick with the BioRadio for EEG data collection. The Emotiv requires a separate Research Edition SDK purchase to gain access to the raw EEG data, and no other EEG headset company has gotten back to us.
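For context, a minimal sketch of the kind of frequency check involved, assuming a Welch PSD approach; the BioRadio sampling rate and the stimulus frequency below are made-up placeholders, not our actual settings:

```python
import numpy as np
from scipy.signal import welch

FS = 250          # BioRadio sampling rate in Hz (an assumption)
STIM_FREQ = 12.0  # flicker frequency of the SSVEP stimulus in Hz (an assumption)

def ssvep_peak_strength(eeg, fs=FS, stim_freq=STIM_FREQ, band=1.0):
    """Ratio of PSD power at the stimulus frequency to the nearby background."""
    freqs, psd = welch(eeg, fs=fs, nperseg=int(fs * 4))  # ~0.25 Hz resolution
    target = np.abs(freqs - stim_freq) <= band / 2
    neighbors = (np.abs(freqs - stim_freq) <= 3.0) & ~target
    return psd[target].mean() / psd[neighbors].mean()

# Quick sanity check on synthetic data: a weak 12 Hz component buried in noise.
t = np.arange(0, 30, 1 / FS)
fake_eeg = 0.5 * np.sin(2 * np.pi * STIM_FREQ * t) + np.random.randn(t.size)
print(ssvep_peak_strength(fake_eeg))  # values well above 1 suggest an SSVEP response
```

In a real recording we'd also want to drop the first ~35 seconds of drift before looking at the spectrum.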

Week 8: Finally got around to looking at the EEG and EOG data collected last week. It's pretty clear that eye blink artifacts are present in the EEG, verified by the large blink peaks in the Vertical EOG signal that align with the peaks in Fp1 and Fp2. We're currently looking into a way to remove eye blink artifacts from EEG signals without having to collect EOG, which would free up a differential channel for other biosignals. With the BioRadio, we're limited to four biopotential signals, which currently consist of Fp1, Fp2, ECG/EKG, and Vertical EOG. If we can eliminate the need for EOG, we can swap in a more informative biosignal for emotion recognition, such as GSR.
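One possible first step toward EOG-free blink handling, just a sketch rather than anything we've settled on: since blinks dominate the frontal channels, flag them directly in Fp1 with a simple amplitude threshold. The sampling rate, threshold, and minimum gap below are all guesses:

```python
import numpy as np
from scipy.signal import find_peaks

FS = 250  # BioRadio sampling rate in Hz (an assumption)

def detect_blinks(frontal_eeg, fs=FS):
    """Flag candidate eye blinks in a frontal EEG channel such as Fp1.

    Blinks show up as large, brief deflections in Fp1/Fp2, so threshold on
    amplitude relative to a robust baseline and enforce a minimum spacing
    between detections.
    """
    x = frontal_eeg - np.median(frontal_eeg)
    thresh = 4 * np.median(np.abs(x))  # robust stand-in for a std-based threshold
    peaks, _ = find_peaks(np.abs(x), height=thresh, distance=int(0.3 * fs))
    return peaks  # sample indices of suspected blink artifacts

# Verifying against EOG would just mean checking that these indices line up
# with peaks found the same way in the Vertical EOG channel.
```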

Week 9: With the BioRadio constraints, we decided to settle on GSR, ECG, F4 (one EEG channel), and EOG for our emotion recognition experiment. We had a pretty good plan for this experiment, and managed to put all of the stimuli (music, videos) into one continuous video to reduce waiting time, so everything moved along pretty smoothly. On a side note, Lee got us some Taco Bell while we were experimenting and I had my very first Crunchwrap. Don't know why Taco Bell feels like such a guilty pleasure, but man, it was tasty.

Week 10: A great experiment we could have done was to record our own emotions throughout the length of the program, and analyze/classify the data in the last couple of weeks. Most experiments I've read about rely on short sessions with multiple subjects, but very rarely do they compare emotion across a significant length of time. It's common knowledge that emotion changes day to day, so it would be interesting to see if our learning algorithm could still identify the correct emotion regardless of when the experiment was run.

We had a pretty solid final presentation, and I finally understood what everyone had been working on for the past 8 weeks! I think that because all the groups had some semblance of results this time, it gave a more complete picture of the overall project, and the methods and research questions made a lot more sense, particularly for the KNN Clustering group and the large gene network analysis.

Though our final presentation is over, we still have a couple more experiments to run. First, we're almost positive we're overfitting the emotion recognition data we collected last week. We're achieving around 90% precision, but our cross-validation scheme trains on data from the same subject as our testing set. We decided to go with leave-one-subject-out cross-validation, where we train on four of the five subjects and test on the fifth. We could also just perform the experiment on a single subject, training and testing within that subject's data, though since there are no repeated runs or experiments performed over time, I don't know if I would trust the validity of those results. There are other experiments to run as well, but that remains the priority before any further feature selection and refinement.
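For what it's worth, a minimal sketch of what leave-one-subject-out cross-validation looks like with scikit-learn's LeaveOneGroupOut; the classifier and the stand-in arrays below are placeholders, not our actual pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Stand-in data: X is the feature matrix, y the emotion labels, and
# `subjects` records which of the five subjects each sample came from.
rng = np.random.default_rng(0)
X = rng.random((500, 20))
y = rng.integers(0, 4, size=500)
subjects = np.repeat(np.arange(5), 100)

# Leave-one-subject-out: each fold trains on four subjects and tests on the
# held-out fifth, so the score reflects generalization to an unseen subject.
logo = LeaveOneGroupOut()
clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, groups=subjects, cv=logo)
print(scores, scores.mean())
```

The gap between these per-subject scores and the ~90% we got when training and testing on the same subject should give us a sense of how badly we were overfitting.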