Regular and non-invasive assessments of cardiovascular function are important in surveillance for cardiovascular catastrophes and treatment therapies of chronic diseases. Resting heart rate, one of the simplest cardiovascular parameters, has been identified as an independent risk factor (comparable with smoking, dyslipidemia or hypertension) for cardiovascular disease. Currently, the gold standard techniques for measurement of the cardiac pulse such as the electrocardiogram (ECG) require patients to wear adhesive gel patches or chest straps that can cause skin irritation and discomfort. Commercial pulse oximetry sensors that attach to the fingertips or earlobes are also inconvenient for patients and the spring-loaded clips can cause pain if worn over a long period.
The ability to monitor a patient’s physiological signals by a remote, non-contact means is a tantalizing prospect that would enhance the delivery of primary healthcare. For example, the idea of performing physiological measurements on the face was first postulated by Pavlidis and associates and later demonstrated through analysis of facial thermal videos. Although non-contact methods may not be able to provide details concerning cardiac electrical conduction that ECG offers, these methods can now enable long-term monitoring of other physiological signals such as heart rate or respiratory rate by acquiring them continuously in an unobtrusive and comfortable manner. Beyond that, such a technology would also minimize the amount of cabling and clutter associated with neonatal ICU monitoring, long-term epilepsy monitoring, burn or trauma patient monitoring, sleep studies, and other cases where a continuous measure of heart rate is important.
The use of photoplethysmography (PPG), a low cost and non-invasive means of sensing the cardiovascular pulse wave (also called the blood volume pulse) through variations in transmitted or reflected light, for non-contact physiological measurements has been investigated recently. This electro-optic technique can provide valuable information about the cardiovascular system such as heart rate, arterial blood oxygen saturation, blood pressure, cardiac output and autonomic function.
PPG has typically been implemented using dedicated light sources (e.g. red and/or infrared wavelengths), but recent work has shown that pulse measurements can be acquired using digital camcorders/cameras with normal ambient light as the illumination source. However, all these previous efforts lacked rigorous physiological and mathematical models amenable to computation; they relied instead on manual segmentation and heuristic interpretation of raw images with minimal validation of performance characteristics. Furthermore, PPG is known to be susceptible to motion-induced signal corruption, and overcoming motion artifacts is one of the most challenging problems. In most cases, the noise falls within the same frequency band as the physiological signal of interest, thus rendering linear filtering with fixed cut-off frequencies ineffective. In order to develop a clinically useful technology, there is a need for ancillary functionality such as motion artifact reduction through efficient and robust image analysis.
One technique for noise removal from physiological signals is blind source separation (BSS). BSS refers to the recovery of unobserved signals or “sources” from a set of observed mixtures with no prior information about the mixing process. Typically, the observations are acquired from the output of a set of sensors, where each sensor receives a different combination of the source signals. There are several methods of BSS; in this paper, we focus on BSS by Independent Component Analysis (ICA). ICA is a technique for uncovering the independent source signals from a set of observations that are composed of linear mixtures of the underlying sources.
The use of this new technique in biomedical signal analysis is rapidly expanding, e.g. in noise removal from electrocardiogram (ECG) and electroencephalogram (EEG) recordings, separation of fetal and maternal ECGs recorded simultaneously, as well as detection of event-related regions of activity in functional magnetic resonance imaging (fMRI) experiments. ICA has also been applied to reduce motion artifacts in PPG measurements.
In this paper, we present a novel methodology for non-contact, automated, and motion-tolerant cardiac pulse measurements from video images based on blind source separation. Firstly, we describe our approach and apply it to compute heart rate measurements from video images of the human face recorded using a simple webcam.
Secondly, we demonstrate how this method can tolerate motion artifacts and validate the accuracy of this approach with an FDA-approved finger blood volume pulse (BVP) measurement device. Thirdly, we show how this method can be easily extended for simultaneous heart rate measurements of multiple persons.
STUDY DESCRIPTION AND EXPERIMENTAL SETUP
2.1 Experimental setup
We used a basic webcam embedded in a laptop (built-in iSight camera on a MacBook Pro by Apple Inc.) to record the videos for analysis. All videos were recorded in color (24-bit RGB with 3 channels × 8 bits/channel) at 15 frames per second (fps) with a pixel resolution of 640 × 480 and saved in AVI format on the laptop.
Twelve participants (10 males, 2 females) between the ages of 18 and 31 years were enrolled in this study, which was approved by the Massachusetts Institute of Technology Committee On the Use of Humans as Experimental Subjects (COUHES). Our sample featured participants of both genders, different ages and with varying skin colors (Asians, Africans and Caucasians). Informed consent was obtained from all the participants prior to the start of each study session.
2.2 Study description
For all experiments, an FDA-approved and commercially available blood volume pulse (BVP) sensor (Flexcomp Infiniti by Thought Technologies Ltd.) was used to measure the participant’s BVP signal via a finger probe at 256 Hz for validation. The experiments were conducted indoors with a varying amount of sunlight as the only source of illumination. Figure 1 shows the experimental setup. Participants were seated at a table in front of a laptop at a distance of approximately 0.5 m from the built-in webcam. Two videos, each lasting one minute, were recorded for all participants. During the first video recording, participants were asked to sit still and stare at the webcam.
For the second video recording, participants were asked to move naturally as if they were interacting with the laptop, but to avoid large or rapid motions and to keep the hand wearing the finger BVP sensor still. In addition, we recorded a single, one-minute video of three participants sitting together at rest.
2.3 Independent component analysis
In this study, the underlying source signal of interest is the cardiovascular pulse wave that propagates throughout the body. Volumetric changes in the facial blood vessels during the cardiac cycle modify the path length of the incident ambient light such that the subsequent changes in amount of reflected light indicate the timing of cardiovascular events. By recording a video of the facial region with a webcam, the RGB color sensors pick up a mixture of the reflected plethysmographic signal along with other sources of fluctuations in light due to artifacts such as motion and changes in ambient lighting conditions. Given that hemoglobin absorptivity differs across the visible and near-infrared spectral range, each color sensor records a mixture of the original source signals with slightly different weights.
These observed signals from the red, green and blue color sensors are denoted by $x_1(t)$, $x_2(t)$ and $x_3(t)$ respectively, which are amplitudes of the recorded signals (averages of all pixels in the facial region) at time point $t$.
In conventional ICA the number of recoverable sources cannot exceed the number of observations, thus we assumed three underlying source signals, represented by $s_1(t)$, $s_2(t)$ and $s_3(t)$.
The ICA model assumes that the three observed signals are linear mixtures of the sources, i.e. $x_i(t) = \sum_{j=1}^{3} a_{ij} s_j(t)$ for each $i = 1, 2, 3$. This can be represented compactly by the mixing equation
$$x(t) = A s(t) \tag{1}$$
where the column vectors $x(t) = [x_1(t), x_2(t), x_3(t)]^T$, $s(t) = [s_1(t), s_2(t), s_3(t)]^T$ and the square $3 \times 3$ matrix $A$ contains the mixture coefficients $a_{ij}$. The aim of ICA is to find a separating or demixing matrix $W$ that is an approximation of the inverse of the original mixing matrix $A$ whose output
$$\hat{s}(t) = W x(t) \tag{2}$$
is an estimate of the vector $s(t)$ containing the underlying source signals. According to the central limit theorem, a sum of independent random variables is more Gaussian than the original variables. Thus, to uncover the independent sources, $W$ must maximize the non-Gaussianity of each source. In practice, iterative methods are used to maximize or minimize a given cost function that measures non-Gaussianity such as kurtosis, negentropy or mutual information.
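The mixing and demixing relations in Eqs. (1) and (2) can be illustrated with a small numerical sketch. The three synthetic sources and the mixing matrix below are invented for illustration only, and the demixing matrix is obtained by inverting the known mixing matrix, which a real ICA algorithm cannot do because it never observes A. The sketch only shows what a good W achieves; it is not an ICA implementation.

```python
# Illustrative sketch of Eqs. (1) and (2): x(t) = A s(t), s_hat(t) = W x(t).
# ASSUMPTIONS: the sources and the matrix A are made up, and W is computed
# from the known A. Real ICA must estimate W from x(t) alone.
import math

FPS = 15.0  # frame rate reported in the text


def mat_vec(m, v):
    """Multiply a 3 x 3 matrix by a length-3 vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def inverse_3x3(m):
    """Invert a 3 x 3 matrix using the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("mixing matrix is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[value / det for value in row] for row in adj]


def sources(t):
    """Three hypothetical sources: pulse, slow motion, lighting changes."""
    pulse = math.sin(2 * math.pi * 1.2 * t)          # 1.2 Hz, i.e. 72 bpm
    motion = math.sin(2 * math.pi * 0.3 * t + 1.0)   # slow drift
    light = 1.0 if (t % 4.0) < 2.0 else -1.0         # square-wave flicker
    return [pulse, motion, light]


# Hypothetical mixing matrix: each color channel weights the sources differently.
A = [[0.6, 0.3, 0.1],
     [0.8, 0.2, 0.3],
     [0.5, 0.4, 0.2]]
W = inverse_3x3(A)

times = [n / FPS for n in range(450)]              # one 30 s window
s = [sources(t) for t in times]
x = [mat_vec(A, s_t) for s_t in s]                 # Eq. (1): observed RGB traces
s_hat = [mat_vec(W, x_t) for x_t in x]             # Eq. (2): recovered sources

max_error = max(abs(est - true)
                for est_t, true_t in zip(s_hat, s)
                for est, true in zip(est_t, true_t))
print("largest reconstruction error:", max_error)
```

In practice ICA recovers the sources only up to an arbitrary ordering and scaling, so an estimated W would match the inverse of A only up to row permutation and scale.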
2.4 Pulse measurement methodology
Post-processing and analysis of both the video and physiological recordings were done using custom software written in MATLAB (The MathWorks, Inc.). An overview of the general steps in our approach to recovering the blood volume pulse is illustrated in Fig. 2.3. First, an automated face tracker was used to detect faces within the video frames and localize the measurement region of interest (ROI) for each video frame [Fig. 2.3(a)].
We utilized a free MATLAB-compatible version of the Open Computer Vision (OpenCV) library to obtain the coordinates of the face location. The OpenCV face detection algorithm is based on work by Viola and Jones, as well as Lienhart and Maydt. A cascade of boosted classifiers uses 14 Haar-like digital image features trained with positive and negative examples. The pre-trained frontal face classifier available with OpenCV 2.0 was used.
The cascade applies a set of simple classifiers sequentially to each area of interest. At each stage, a classifier is built using a weighted vote, known as boosting. Either all stages are passed, meaning the region is likely to contain a face, or the area is rejected. The dimensions of the area of interest are changed sequentially in order to identify positive matches of different sizes. For each face detected, the algorithm returns the x- and y-coordinates along with the height and width that define a box around the face. From this output, we selected the center 60% of the width and the full height of the box as the ROI for our subsequent calculations. To prevent face segmentation errors from affecting the performance of our algorithm, the face coordinates from the previous frame were used if no faces were detected. If multiple faces were detected when only one was expected, then our algorithm selected the face coordinates that were the closest to the coordinates from the previous frame.
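The ROI rules described above (center 60% of the width, reuse of the previous box when detection fails, and nearest-box selection when several faces are returned) can be sketched as follows. This is a minimal sketch, not the MATLAB code used in the study: the function names are hypothetical, boxes are assumed to be (x, y, width, height) tuples from any face detector, and "closest" is assumed to mean Euclidean distance between top-left corners.

```python
# Minimal sketch of the ROI selection rules. ASSUMPTIONS: hypothetical function
# names, boxes given as (x, y, width, height), and closeness measured between
# the top-left corners of the boxes.

def select_face(detections, previous):
    """Choose one face box for the current frame.

    detections: list of (x, y, w, h) boxes returned by a face detector.
    previous:   box used for the previous frame, or None for the first frame.
    """
    if not detections:
        return previous               # no face found: reuse the previous box
    if previous is None or len(detections) == 1:
        return detections[0]
    px, py = previous[0], previous[1]
    # Several faces found: keep the one nearest to the previous frame's box.
    return min(detections, key=lambda b: (b[0] - px) ** 2 + (b[1] - py) ** 2)


def roi_from_face(box, width_fraction=0.6):
    """Keep the central fraction of the box width and its full height."""
    x, y, w, h = box
    roi_w = int(round(w * width_fraction))
    roi_x = x + (w - roi_w) // 2      # center the narrower ROI horizontally
    return (roi_x, y, roi_w, h)


previous_box = (100, 50, 200, 240)
found = [(300, 60, 190, 230), (105, 52, 198, 238)]
face = select_face(found, previous_box)
print(face, roi_from_face(face))
```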
The ROI was then separated into the three RGB channels [Fig. 2.3(b)] and spatially averaged over all pixels in the ROI to yield a red, green and blue measurement point for each frame and form the raw traces $x_1(t)$, $x_2(t)$ and $x_3(t)$ respectively [Fig. 2.3(c)]. Subsequent processing was performed using a 30 s moving window with 96.7% overlap (1 s increment). We normalized the raw RGB traces as follows:
$$x_i'(t) = \frac{x_i(t) - \mu_i}{\sigma_i} \tag{3}$$
for each $i = 1, 2, 3$, where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of $x_i(t)$ respectively. The normalization transforms $x_i(t)$ to $x_i'(t)$, which is zero-mean and has unit variance. The normalized raw traces are then decomposed into three independent source signals using ICA [Fig. 2.3(d)]. In this report, we used the joint approximate diagonalization of eigenmatrices (JADE) algorithm developed by Cardoso. This tensorial approach uses fourth-order cumulant tensors and involves the joint diagonalization of cumulant matrices; the solution approximates statistical independence of the sources (to the fourth order).
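The spatial averaging, moving-window and normalization steps (not the JADE decomposition itself) can be sketched in a few lines. The helper names are hypothetical, pixels are assumed to be (r, g, b) tuples, and the population standard deviation is assumed because the text does not say which estimator was used.

```python
# Sketch of spatial averaging, windowing and normalization.
# ASSUMPTIONS: hypothetical helper names, pixels as (r, g, b) tuples, and the
# population standard deviation (divide by N) in the normalization.
import math

FPS = 15        # frame rate reported in the text
WINDOW_S = 30   # moving window length in seconds
STEP_S = 1      # window increment in seconds


def spatial_mean_rgb(roi_pixels):
    """Average all (r, g, b) pixels of one ROI into a single RGB sample."""
    n = len(roi_pixels)
    return tuple(sum(p[c] for p in roi_pixels) / n for c in range(3))


def window_starts(n_frames, fps=FPS, window_s=WINDOW_S, step_s=STEP_S):
    """Start indices of every moving window that fits inside the recording."""
    size, step = window_s * fps, step_s * fps
    return list(range(0, n_frames - size + 1, step))


def normalize(trace):
    """Subtract the mean and divide by the standard deviation of one trace."""
    mu = sum(trace) / len(trace)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in trace) / len(trace))
    if sigma == 0:
        raise ValueError("trace is constant; cannot normalize")
    return [(v - mu) / sigma for v in trace]


starts = window_starts(60 * FPS)            # a one-minute recording
overlap = 1 - STEP_S / WINDOW_S             # 29/30, about 96.7%
trace = [100 + 2 * math.sin(0.5 * n) for n in range(WINDOW_S * FPS)]
normalized = normalize(trace)
print(len(starts), round(100 * overlap, 1))
```

With these settings a one-minute recording at 15 fps yields one 450-frame window per second of shift.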
Although there is no ordering of the ICA components, the second component typically contained a strong plethysmographic signal. For the sake of simplicity and automation, we always selected the second component as the desired source signal.
- The region of interest (ROI) is automatically detected using a face tracker.
- The ROI is decomposed into the RGB channels and spatially averaged to obtain the raw RGB traces.
- ICA is applied on the normalized RGB traces to recover three independent source signals.
Finally, we applied the fast Fourier transform (FFT) on the selected source signal to obtain the power spectrum. The pulse frequency was designated as the frequency that corresponded to the highest power of the spectrum within an operational frequency band. For our experiments, we set the operational range to [0.75, 4] Hz (corresponding to [45, 240] bpm) to provide a wide range of heart rate measurements. Similarly, we obtained the reference heart rate measurements from the recorded finger BVP signal using the same steps.
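The peak-picking step can be sketched with a direct discrete Fourier transform restricted to the operational band. A plain DFT is used here instead of an FFT routine only to keep the sketch dependency-free; both give the same spectrum. The function names and the synthetic test signal are assumptions for illustration.

```python
# Sketch of locating the pulse frequency as the strongest spectral peak inside
# the operational band [0.75, 4] Hz. ASSUMPTIONS: hypothetical function names,
# a synthetic input signal, and a direct DFT standing in for the FFT.
import cmath
import math


def band_power_spectrum(signal, fps, f_lo=0.75, f_hi=4.0):
    """Return [(frequency_hz, power)] for the DFT bins inside [f_lo, f_hi]."""
    n = len(signal)
    spectrum = []
    for k in range(1, n // 2 + 1):
        freq = k * fps / n
        if f_lo <= freq <= f_hi:
            coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                        for i, v in enumerate(signal))
            spectrum.append((freq, abs(coeff) ** 2))
    return spectrum


def pulse_rate_bpm(signal, fps):
    """Frequency of the highest in-band power, converted to beats per minute."""
    peak_freq, _power = max(band_power_spectrum(signal, fps),
                            key=lambda fp: fp[1])
    return 60.0 * peak_freq


fps = 15
# Synthetic 30 s trace: a 1.2 Hz pulse plus a 0.2 Hz drift outside the band.
signal = [math.sin(2 * math.pi * 1.2 * n / fps)
          + 0.5 * math.sin(2 * math.pi * 0.2 * n / fps)
          for n in range(30 * fps)]
bpm = pulse_rate_bpm(signal, fps)
print(round(bpm, 1))
```

A 30 s window gives a frequency resolution of 1/30 Hz, which corresponds to 2 bpm between adjacent bins.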
Despite the application of ICA in our proposed methodology, the pulse frequency computation may occasionally be affected by noise. To address this issue, we utilized the historical estimations of the pulse frequency to reject artifacts by fixing a threshold for the maximum change in pulse rate between successive measurements (taken 1 s apart). If the difference between the current pulse rate estimation and the last computed value exceeded the threshold (we used a threshold of 12 bpm in our experiments), the algorithm rejected it and searched the operational frequency range for the frequency corresponding to the next highest power that met this constraint. If no frequency peaks that met the criteria were located, then the algorithm retained the current pulse frequency estimation.
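The threshold rule above can be sketched as a search over candidate frequencies ranked by power. The function name and the candidate format are hypothetical. The text leaves open which value is kept when no candidate satisfies the constraint; this sketch assumes the last accepted estimate is held.

```python
# Sketch of the artifact-rejection rule. ASSUMPTIONS: hypothetical function
# name, candidates given as (bpm, power) pairs from the operational band, and
# the last accepted estimate held when no candidate passes the threshold.

def reject_artifacts(candidates, last_bpm, max_change_bpm=12.0):
    """Return the accepted pulse rate for the current window."""
    if not candidates:
        return last_bpm
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if last_bpm is None:
        return ranked[0][0]           # first window: nothing to compare with
    for bpm, _power in ranked:
        # Accept the strongest peak whose change does not exceed the threshold.
        if abs(bpm - last_bpm) <= max_change_bpm:
            return bpm
    return last_bpm                   # assumption: hold the last estimate


peaks = [(130.0, 9.0), (74.0, 5.0), (60.0, 1.0)]
print(reject_artifacts(peaks, last_bpm=72.0))
```

In this usage example the strongest peak (130 bpm) differs from the previous estimate by more than 12 bpm, so the next strongest peak at 74 bpm is selected.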
Bland-Altman plots were used for combined graphical and statistical interpretation of the two measurement techniques. The differences between estimates from ICA and the Flexcomp finger BVP sensor were plotted against the averages of both systems. The mean and standard deviation (SD) of the differences, the mean of the absolute differences, and the 95% limits of agreement (±1.96 SD) were calculated. The root mean squared error (RMSE), Pearson’s correlation coefficients and the corresponding p-values were calculated for the estimated heart rate from ICA and the finger BVP. In addition, we calculated the false positive rate as the total number of segmentations yielding more than one face over the total number of frames segmented in the single-participant experiments. The false negative rate was computed as the total number of segmentations failing to return a face over the total number of frames segmented (all frames contained one face).
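The agreement statistics listed above can be computed as in the following sketch. The heart rate values are made-up illustration data, not results from the study; the function names are hypothetical; the sample standard deviation (divide by n - 1) is an assumption; and p-values are omitted to keep the sketch within the standard library.

```python
# Sketch of the Bland-Altman and error statistics. ASSUMPTIONS: hypothetical
# function names, invented example data, and the sample standard deviation.
import math


def agreement_stats(estimates, references):
    """Mean and SD of differences, 95% limits of agreement, MAE and RMSE."""
    diffs = [e - r for e, r in zip(estimates, references)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    sd = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    return {
        "mean_diff": mean_diff,
        "sd": sd,
        "limits": (mean_diff - 1.96 * sd, mean_diff + 1.96 * sd),
        "mean_abs_diff": sum(abs(d) for d in diffs) / n,
        "rmse": math.sqrt(sum(d * d for d in diffs) / n),
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented heart rates in bpm, for illustration only.
ica_bpm = [72.0, 75.0, 80.0, 66.0]
bvp_bpm = [71.0, 76.0, 79.0, 67.0]
stats = agreement_stats(ica_bpm, bvp_bpm)
r = pearson_r(ica_bpm, bvp_bpm)
print(stats, r)
```

For a Bland-Altman plot, each difference would be plotted against the average of the paired measurements, with horizontal lines at the mean difference and at the two limits of agreement.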
(a) 30 s raw RGB traces and
(b) their respective power spectra.
(c) The independent components recovered using ICA along with the reference finger BVP signal and
(d) their respective power spectra.
(e) (Media 1) A single-frame excerpt from the webcam video recording with localized ROI (white box).
(f) Evolution of the localized ROI over 1 min.
- Low cost compared with other equipment.
- This project illustrates an innovative approach to pervasive health monitoring based on state-of-the-art technology.
- The Medical Mirror fits seamlessly into the ambient home environment, blending the data collection process into the course of daily routines.
- It is intended to provide a convenient way for people to track their daily health when they use the mirror for shaving, brushing teeth, etc.
CONCLUSION AND FUTURE SCOPE
This concept describes a novel methodology for recovering the cardiac pulse rate from video recordings of the human face and its implementation using a simple webcam with ambient daylight providing illumination. This is the first demonstration of a low-cost method for non-contact heart rate measurements that is automated and motion-tolerant. Moreover, this approach is easily scalable for simultaneous assessment of multiple people in front of a camera. Given the low cost and widespread availability of webcams, this technology is promising for extending and improving access to medical care.
Although this concept only addressed the recovery of the cardiac pulse rate, many other important physiological parameters such as respiratory rate, heart rate variability and arterial blood oxygen saturation can potentially be estimated using the proposed technique. Creating a real-time, multi-parameter physiological measurement platform based on this technology will be the subject of future work.