Objective: Identify comorbidities enriched in patients diagnosed with insomnia by interrogating an electronic medical records (EMR) database.
Materials and Methods: We studied individuals with diagnosis codes or medication prescriptions for insomnia in a population of 314,292 patients from two urban tertiary-care hospitals between 1992 and 2010. We extracted structured EMR variables (e.g., demographics, billing codes, and medications) and unstructured variables related to insomnia or comorbidities from narrative notes. We developed a case-control methodology to match insomnia patients to non-insomnia controls and calculated the enrichment of comorbidities specifically in insomnia patients. For each case with insomnia diagnosis codes, we applied 1:1 matching to identify a control patient with no insomnia diagnosis codes. Controls were matched for gender and age, which was possible because all patients in our database had properly documented genders and dates of birth. Controls were additionally matched on the total number of facts in the EMR associated with each patient (including the number of laboratory measurements, prescriptions, diagnosis/procedure codes, and notes). Our rationale was that patients with similar numbers of medical facts are likely to utilize health-care resources similarly. Beyond the absence of insomnia diagnosis codes, we further confirmed control status by selecting controls only if no sleep medications were found in their medical profiles. Only patients 18 years of age or older were eligible to be selected as cases or controls. We calculated the case-to-control enrichment (i.e., the insomnia-to-non-insomnia ratio) by dividing the ascertained values for insomnia cases by the corresponding values for non-insomnia controls in the 12 months prior to the first diagnosis code or medication prescription for insomnia. This ratio was calculated for each comorbidity and reflects the enrichment of that comorbidity in insomnia patients relative to non-insomnia controls. We compared categorical variables using the chi-square test.
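The matching and enrichment steps above can be sketched as follows. This is a minimal illustration, not the study's code. The `Patient` fields and the greedy closest-fact-count strategy are assumptions, as are the tolerances `max_age_gap` and `max_rel_fact_gap`. The abstract does not report how close a match had to be.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    pid: str
    sex: str
    age: int             # age at the index date
    n_facts: int         # labs + prescriptions + diagnosis/procedure codes + notes
    insomnia_code: bool
    sleep_med: bool


def is_eligible_control(p: Patient) -> bool:
    # Adult, no insomnia diagnosis code, and no sleep medication on record.
    return p.age >= 18 and not p.insomnia_code and not p.sleep_med


def match_1to1(cases, pool, max_age_gap=1, max_rel_fact_gap=0.2):
    """Greedy 1:1 matching on sex, age, and total EMR fact count.

    The tolerances are hypothetical; the study does not report them.
    """
    used = set()
    pairs = []
    for case in cases:
        if case.age < 18:
            continue                      # only adults can be cases
        best, best_gap = None, float("inf")
        for c in pool:
            if c.pid in used or not is_eligible_control(c) or c.sex != case.sex:
                continue
            if abs(c.age - case.age) > max_age_gap:
                continue
            gap = abs(c.n_facts - case.n_facts) / max(case.n_facts, 1)
            if gap <= max_rel_fact_gap and gap < best_gap:
                best, best_gap = c, gap   # closest fact count wins
        if best is not None:
            used.add(best.pid)
            pairs.append((case.pid, best.pid))
    return pairs


def enrichment_ratio(n_cases_with, n_controls_with):
    """Case-to-control ratio for one comorbidity in the 12-month look-back window."""
    if n_controls_with == 0:
        return float("inf")
    return n_cases_with / n_controls_with
```

Greedy matching is the simplest choice. An optimal assignment, which minimizes the total mismatch across all pairs, is a common alternative when the control pool is small.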
Continuous variables were compared using the t-test or the Wilcoxon rank sum test, as appropriate. All statistical tests were two-sided, with Bonferroni correction for multiple comparisons across 59 covariates. We also assessed clinical variables associated with insomnia using penalized logistic regression, and we used the bootstrap procedure to calculate confidence intervals. We also examined the enrichment of comorbidities in insomnia patients separately in the inpatient and outpatient settings, to better understand potentially different practice patterns.
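The following sketch shows the Bonferroni threshold for the 59 covariates and a percentile bootstrap confidence interval. In the abstract, the bootstrap is applied to penalized logistic regression. Here, for brevity, the resampled statistic is a generic function (`stat`), and the resample count and seed are arbitrary choices.

```python
import random
import statistics

ALPHA = 0.05
N_COVARIATES = 59                         # number of comparisons stated in the abstract
BONFERRONI_ALPHA = ALPHA / N_COVARIATES   # per-test threshold, roughly 0.00085


def significant_after_bonferroni(p_values, m=N_COVARIATES, alpha=ALPHA):
    """Flag each two-sided p-value that survives the Bonferroni-adjusted threshold."""
    return [p < alpha / m for p in p_values]


def bootstrap_ci(data, stat=statistics.mean, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI: resample with replacement and recompute `stat`."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    k = int(round((1 - level) / 2 * n_boot))  # replicates trimmed from each tail
    return reps[k], reps[n_boot - 1 - k]
```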
Results: In patients with insomnia-related diagnosis codes or medications, concepts related to insomnia were highly enriched in narrative notes. We found highly significant enrichment of several comorbidities in insomnia patients, including all 10 of the conditions that contribute to the designation of “multiple chronic conditions”. The top-ranked comorbidities by logistic regression were also highly ranked in our enrichment analysis. Narrative mentions of insomnia-related concepts were enriched in notes from outpatient, but not inpatient, encounters.
Conclusion: Our results highlight the importance of analyzing narrative notes to understand the scope of conditions, such as insomnia, that are challenging to study using structured variables alone. By systematically identifying common comorbidities that co-exist with insomnia, this report helps clarify the medical impact of insomnia.
Among adults with nonalcoholic fatty liver disease (NAFLD), 25% of deaths are attributable to cardiovascular disease (CVD). CVD risk reduction in NAFLD requires not only modification of traditional CVD risk factors but also identification of risk factors unique to NAFLD.
In a NAFLD cohort, we sought to identify non-traditional risk factors associated with CVD. NAFLD was determined by a previously described algorithm, and a multivariable logistic regression model was used to determine predictors of CVD.
Of the 8,409 individuals with NAFLD, 3,243 had CVD and 5,166 did not. On multivariable analysis, CVD among NAFLD patients was associated with traditional CVD risk factors, including family history of CVD (OR 4.25, P=0.0007), hypertension (OR 2.54, P=0.0017), renal failure (OR 1.59, P=0.04), and age (OR 1.05, P<0.0001). Several non-traditional CVD risk factors, including albumin, sodium, and the Model for End-Stage Liver Disease (MELD) score, were also associated with CVD on multivariable analysis. An increased MELD score (OR 1.10, P<0.0001) was associated with an increased risk of CVD, whereas albumin (OR 0.52, P<0.0001) and sodium (OR 0.96, P=0.037) were inversely associated with CVD. In addition, CVD was more common among those with a NAFLD fibrosis score >0.676 than among those with a score ≤0.676 (39% vs. 20%, P<0.0001).
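Odds ratios from a logistic model compound multiplicatively with the predictor. Because log(OR) equals the model coefficient, a k-unit increase corresponds to OR^k. The abstract does not state the units behind each OR, so the units in this sketch (per year of age, per MELD point) are assumptions.

```python
import math


def or_from_coefficient(beta: float) -> float:
    """A logistic-regression coefficient is a log odds ratio per unit of the predictor."""
    return math.exp(beta)


def scaled_or(or_per_unit: float, units: float) -> float:
    """OR for a `units`-sized increase: exp(units * beta) = or_per_unit ** units."""
    return math.exp(units * math.log(or_per_unit))
```

Under those assumed units, an OR of 1.05 per year of age corresponds to roughly 1.63 per decade (`scaled_or(1.05, 10)`). Likewise, an OR of 1.10 per MELD point corresponds to roughly 1.61 per 5 points.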
CVD in NAFLD is associated with traditional CVD risk factors, as well as higher MELD scores and lower albumin and sodium levels. Individuals with evidence of advanced fibrosis were more likely to have CVD. These findings suggest that the drivers of NAFLD may also promote CVD development and progression.
Summary: Hospitalizations among individuals with cirrhosis are frequent. Accurate assessment of the risk of mortality following cirrhosis-related admissions can enable clinicians to identify high-risk patients and modify treatment plans to decrease mortality risk.
Methods: We included 314,292 patients who received care at two urban tertiary care hospitals between 1992 and 2010. Individuals with cirrhosis were identified using a combination of billing codes and mentions of “cirrhosis” in discharge summaries. We developed a prediction model for 90-day mortality among patients who survived a cirrhosis-related admission. We extracted 113 structured and unstructured Electronic Medical Record (EMR) variables, including demographics, laboratory values, billing codes, medications, and liver-related concepts from clinical narrative notes. To select the most informative variables, we used logistic regression with the adaptive least absolute shrinkage and selection operator (LASSO). We calculated areas under the receiver operating characteristic curves (AUROCs) to measure model discrimination in the derivation and validation sets.
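The two-stage adaptive LASSO and the AUROC can be sketched in NumPy as below. This is an illustration, not the authors' pipeline:

- An initial ridge-penalized logistic fit supplies per-feature weights 1/|β̂|^γ.
- A weighted L1 fit, computed by proximal gradient descent, then sets weak predictors to exactly zero.

The values of `lam`, `gamma`, the step size, and the iteration count are hypothetical. In practice, `lam` would be tuned, for example by cross-validation. Features are assumed to be standardized.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, l1=None, l2=0.0, lr=0.1, n_iter=2000):
    """Proximal gradient descent on mean log-loss + (l2/2)||beta||^2 + sum(l1 * |beta|).

    `l1` is a per-feature penalty vector (None means no L1 term); the intercept
    is never penalized.
    """
    n, p = X.shape
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        r = _sigmoid(b0 + X @ beta) - y      # d(log-loss)/d(linear predictor)
        b0 -= lr * r.mean()
        beta = beta - lr * (X.T @ r / n + l2 * beta)
        if l1 is not None:                   # proximal step for L1 = soft-thresholding
            beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * l1, 0.0)
    return b0, beta


def adaptive_lasso_logistic(X, y, lam=0.02, gamma=1.0, ridge=0.01):
    """Stage 1: ridge fit for initial coefficients. Stage 2: weighted L1 fit in
    which weak initial coefficients receive large penalties."""
    _, beta_init = fit_logistic(X, y, l2=ridge)
    weights = 1.0 / (np.abs(beta_init) + 1e-8) ** gamma
    return fit_logistic(X, y, l1=lam * weights)


def auroc(y_true, score):
    """AUROC via the Mann-Whitney U statistic (assumes no tied scores)."""
    ranks = np.empty(len(score))
    ranks[np.argsort(score)] = np.arange(1, len(score) + 1)
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

Features with small initial coefficients receive large penalties, so the soft-thresholding step sets them exactly to zero. This is what makes the procedure a variable selector rather than only a shrinkage method.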
Results: We identified 4,781 cirrhosis-related admissions in which the patient survived the admission. Of these, 778 (16.2%) were followed by death within 90 days after discharge. Twenty-seven variables were predictors of 90-day mortality (Figure 1). These included the Model for End-Stage Liver Disease (MELD) score, white blood cell count, total bilirubin, hepatorenal syndrome, steatohepatitis, dyslipidemia, ascites, and hepatocellular carcinoma. A cross-validation scheme yielded AUROCs of 0.82 for the derivation set and 0.79 for the validation set. In contrast, the MELD score alone yielded a lower AUROC of 0.69. When the MELD score was excluded from the original model, the AUROC (0.79) remained superior to that of MELD alone. Likewise, when MELD and all of its components were excluded, the model remained superior to MELD alone, with an AUROC of 0.77. The AUROCs are presented in Figure 2.
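The results compare models with and without MELD and its components. For reference, the sketch below computes the classic MELD score. It assumes the widely published UNOS formula: bilirubin, INR, and creatinine, with values below 1.0 set to 1.0 and creatinine capped at 4.0 mg/dL. The abstract does not say which MELD variant was used. The dialysis adjustment and final rounding are omitted.

```python
import math


def meld(bilirubin_mg_dl, inr, creatinine_mg_dl):
    """Classic MELD (unrounded), assuming the standard UNOS formula."""
    bili = max(bilirubin_mg_dl, 1.0)            # values below 1.0 are set to 1.0
    inr = max(inr, 1.0)
    cr = min(max(creatinine_mg_dl, 1.0), 4.0)   # creatinine capped at 4.0 mg/dL
    return (3.78 * math.log(bili) + 11.2 * math.log(inr)
            + 9.57 * math.log(cr) + 6.43)
```

Excluding "MELD and all components of MELD" in this setting means dropping the score itself together with bilirubin, INR, and creatinine.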
Discussion: The cirrhosis mortality prediction model can be used to identify patients at high risk of mortality after surviving an admission related to cirrhosis. Further, we show that our model is superior to the MELD score alone, which demonstrates that an unbiased approach can identify unique predictors of death in those with cirrhosis. The MELD score has been extensively adopted to predict patient outcomes and is associated with increased mortality and re-admission rates in individuals with cirrhosis. We demonstrate, however, that in a large set of cirrhosis-related admissions, the MELD score contributes minimally to model accuracy when accompanied by twenty-six additional EMR variables selected by the adaptive LASSO feature selection algorithm. These findings suggest a need to develop new types of cirrhosis-related indexes that rely more extensively on EMR data to predict outcomes.
Nonalcoholic fatty liver disease (NAFLD) is the most common cause of chronic liver disease worldwide. Risk factors for NAFLD progression and liver-related outcomes remain incompletely understood due to the lack of computational methods for identifying patients with NAFLD. The present study sought to design a classification algorithm for NAFLD within the electronic medical record (EMR) to enable the development of large-scale longitudinal cohorts.
We implemented feature selection using logistic regression with the adaptive LASSO. A training set of 620 patients was randomly selected from the Research Patient Data Registry at Partners Healthcare. To establish a true NAFLD diagnosis, we performed chart reviews and accepted documentation of either a biopsy or a clinical diagnosis of NAFLD. Variables included in our model were laboratory measurements, diagnosis codes, and concepts extracted from medical notes. Variables with P < 0.05 were included in the multivariable analysis.
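The univariable screening step (keep variables with P < 0.05) can be sketched as a one-predictor logistic regression for each candidate variable, with a Wald test on the slope. This is an assumption about how the P values were obtained, because the abstract does not name the test. The fit uses Newton-Raphson, and `features` is a hypothetical mapping from variable name to values.

```python
import math
import numpy as np


def univariable_wald_p(x, y, n_iter=25):
    """Two-sided Wald p-value for the slope of a logistic regression of y on one predictor."""
    X = np.column_stack([np.ones(len(x)), np.asarray(x, dtype=float)])
    beta = np.zeros(2)
    for _ in range(n_iter):                          # Newton-Raphson (IRLS) updates
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1 - p))[:, None])    # Fisher information matrix
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1 - p))[:, None])
    se = math.sqrt(np.linalg.inv(info)[1, 1])
    z = beta[1] / se
    return math.erfc(abs(z) / math.sqrt(2.0))       # equals 2 * (1 - Phi(|z|))


def screen(features, y, alpha=0.05):
    """Names of candidate variables whose univariable P is below `alpha`."""
    return [name for name, x in features.items() if univariable_wald_p(x, y) < alpha]
```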
The NAFLD classification algorithm included the number of natural language mentions of NAFLD in the EMR, the lifetime number of ICD-9 codes for NAFLD, and triglyceride level. This classification algorithm was superior to an algorithm using ICD-9 data alone (AUC 0.85 vs. 0.75, P < 0.0001) and enabled the creation of a new independent cohort of 8,458 individuals with a high probability of NAFLD.
The NAFLD classification algorithm is superior to ICD-9 billing data alone. This approach is simple to develop and deploy and can be applied across different institutions to create EMR-based cohorts of individuals with NAFLD.
Keywords: Nonalcoholic fatty liver disease; Nonalcoholic steatohepatitis; Electronic medical records; Triglycerides
Summary: We demonstrate several advantages of applying data mining techniques to time-dependent Electronic Medical Records (EMR), specifically: 1) combining structured and unstructured variables improves the accuracy of a type-2 diabetes (T2D) classification algorithm; 2) a quantitative survey of multiple comorbidities using hazard ratios is important in T2D, especially for cardiovascular complications; 3) analyzing time-dependent variables can clarify time-dependent contributions to a variety of comorbidities, specifically the “obesity paradox”; and 4) an unbiased examination of physician treatment patterns reveals changes over time consistent with clinical trials.
Background: Cohorts assembled from EMR present a potentially powerful resource to study T2D and cardiovascular complications at population scale. Recent reports have demonstrated the utility of EMR analysis for discovering genotype-phenotype correlations, sub-categories of disease, and adverse drug events.
Methods: We developed a classification algorithm to identify T2D patients based on characteristics including clinical notes, diagnosis and procedure codes, medications, and laboratory tests. We analyzed an EMR database at MGH and BWH covering patients who received care between 1990 and 2013. We applied logistic regression with the adaptive LASSO using different combinations of variables: structured variables only, unstructured variables only, and all variables combined. To determine the level of association of clinical and demographic variables with mortality, we developed baseline and lagged time-varying Cox regression models that adjusted for ethnicity and included time-varying covariates. To observe changes in the frequency ratios of medical concepts as a function of time, while also considering the effects of clinical trial publications, we focused on heart failure-related concepts extracted from clinical notes (e.g., Aldosterone, Biventricular Pacemaker). To assess how therapeutic relationships change over time, we calculated sparse covariance matrices.
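Time-varying Cox models are usually fit on data in counting-process form. Each row is a (start, stop] interval during which a covariate value is constant. Libraries such as lifelines (`CoxTimeVaryingFitter`) consume this layout.

The sketch below builds such rows for one patient. Its assumptions are:

- A hypothetical `lag` means that a value measured at time t takes effect only from t + lag.
- Follow-up starts at the first effective measurement.

```python
def to_counting_process(pid, measurements, end_time, died, lag=0.0):
    """Split one patient's follow-up into (start, stop] rows for a time-varying Cox model.

    measurements: (time, value) pairs; lag is a hypothetical delay before a
    measured value is allowed to affect the hazard.
    Returns rows of (pid, start, stop, value, event).
    """
    rows = []
    # Shift each measurement by the lag; drop values that take effect after follow-up.
    effective = [(t + lag, v) for t, v in sorted(measurements) if t + lag < end_time]
    for i, (start, value) in enumerate(effective):
        stop = effective[i + 1][0] if i + 1 < len(effective) else end_time
        if stop <= start:
            continue                      # skip zero-length intervals
        # The death event can only be recorded in the final interval.
        event = int(died and stop == end_time)
        rows.append((pid, start, stop, value, event))
    return rows
```

Stacking these rows across patients gives a long table with id, start, stop, covariate, and event columns.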
Results: Our classification algorithm identified 65,099 T2D patients with a specificity of 97% and a PPV of 96%. The “gold standard” definition included, among other criteria, ≥ 1 measurement of HGB A1C ≥ 6.5%. In total, 56,691 patients (87.1%) had two or more and 38,449 patients (59.1%) had four or more chronic conditions, demonstrating the complexity of the cohort we created compared with administrative claims databases, which lack many clinical details. Cox regression models indicated statistically significant HRs > 1 for CHF, CAD, and CVD, and HRs < 1 for PCI and CABG. HRs for BMI were particularly interesting, as increasing BMI levels were associated with significantly lower mortality compared with the reference BMI (< 25 kg/m²). When the results were further stratified into 1-, 3-, and 5-year analyses, this “obesity paradox” was most obvious at the short-term follow-up of 1 year. This may be because patients with low BMI were suffering from chronic medical conditions (e.g., malignancy or inflammatory conditions) that increased their 1-year mortality. However, at 3- and 5-year follow-up, mortality increased with increasing BMI, likely reflecting an increased burden of cardiovascular events.
Discussion: We implemented classification, prediction, and natural language processing techniques in multiple scenarios to create and analyze a large and highly complex cohort, to better understand patients as time-dependent entities digitally represented as collections of data elements.
Summary: We develop a risk score for re-admission following an index congestive heart failure (CHF) admission, using codified and narrative variables from 17 years of electronic medical records (EMR). We show a strong correlation between calculated and observed re-admission frequencies throughout the range of risk, in both diabetic and non-diabetic populations.
Introduction/Background: Identifying patients at high risk of re-admission after an index admission for CHF is of major academic, operational and financial interest. We hypothesized that an unbiased method of risk discovery drawn from EMRs may identify patients at high risk for re-admission and provide opportunities for intervention.
Methods: We predicted the likelihood that an index CHF admission would be followed by a subsequent admission for any cause within 30 days of discharge, using data available at two time points within the index admission: 1) the first 24 hours (“early”), and 2) the time of discharge (“discharge”). Our study data included 17 years of inpatient CHF admissions at two urban tertiary care hospitals between 1993 and 2010. We focused on two cohorts: 1) 65,099 type-2 diabetes (T2D) patients, with 5,825 index CHF admissions (23.4% 30-day re-admission rate), and 2) a Non-Diabetic Cohort of 43,220 patients (2,203 index CHF admissions; 22.4% 30-day re-admission rate). We extracted 293 EMR variables, including demographics, laboratory values and slopes, billing codes, cardiac parameters extracted from narrative electrocardiogram and echocardiographic reports, and medical concepts extracted from physician narrative notes using natural language processing.
Results: Using logistic regression with the adaptive LASSO, we found a strong correlation between predicted and observed risk of re-admission throughout the range of calculated risk for the Diabetic Cohort (r ≥ 0.99 for both the “early” and “discharge” models). Patients who were re-admitted within 30 days had a significantly higher predicted risk score than patients who were not re-admitted (“early”: 28.6% vs. 21.8%, p = 3.7 × 10⁻⁶⁶; “discharge”: 29.4% vs. 21.5%, p = 2.7 × 10⁻⁷⁷). A four-fold cross-validation scheme yielded C-statistics of 0.65 and 0.67 for the “early” and “discharge” models, respectively. The “early” and “discharge” models had comparable accuracy in assigning patients to the highest and lowest deciles of re-admission risk. Notably, the “discharge” model successfully re-classified a subset of patients at intermediate risk in the “early” model: calculated and observed re-admission rates for patients re-classified into the highest-risk decile by the “discharge” model were 45.0% and 43.6%, respectively. Applying an analogous approach to the Non-Diabetic Cohort yielded similar results.
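One plausible reading of the calibration result is a correlation between mean predicted and observed re-admission rates across deciles of predicted risk. The abstract does not specify the binning, so the decile grouping in this sketch is an assumption.

```python
import statistics


def decile_calibration(pred, observed):
    """(mean predicted risk, observed event rate) for each decile of predicted risk."""
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    n = len(order)
    rows = []
    for d in range(10):
        idx = order[d * n // 10:(d + 1) * n // 10]
        rows.append((statistics.fmean(pred[i] for i in idx),
                     statistics.fmean(observed[i] for i in idx)))
    return rows


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5
```

A well-calibrated model places the decile points close to the identity line. A high r alone shows only that observed rates rise in step with predicted ones.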
Discussion: A generalizable method using unbiased variable selection and model building from EMR data can successfully identify patients at high or low risk of re-admission. “Early” data can identify high and low-risk groups; additional data generated during the admission and available at time of discharge can further re-classify additional individuals into high or low risk groups. This two-phase approach to risk estimation may facilitate intervention for high-risk patients earlier in the index hospital admission.
The invention relates generally to financial instrument classification and more particularly to methods and systems for recognizing similarities in behavior among financial instruments. According to one embodiment, a method of classifying similar financial instruments is provided. Classification analysis is performed on a financial instrument specified by a user to determine other financial instruments that behave similarly to it during a specified time range. Based on the classification, the similarly behaving financial instruments and additional characteristics are presented to the user for evaluation and tracking.
FIELD OF INVENTION
The present invention relates to financial instrument classification and more particularly, the present invention relates to financial instrument classification that is able to classify different financial instruments based on similarities in behavior patterns.
BACKGROUND AND PRIOR ART
Classification methods for financial instruments such as mutual funds, exchange-traded funds, stocks, and bonds are commonly used to identify investments that meet one's personal criteria. Such methods aim to save time by narrowing one's search from the hundreds of thousands of investment choices available worldwide down to a manageable number of specific investments for further research and examination. These classification methods (e.g., financial instrument screeners) enable a user to create a list of specific financial instruments he or she wishes to compare and analyze further. This is achieved by letting the user specify comparison criteria that are applied to the list of financial instruments he or she is considering. Criteria include parameters such as performance history, investment style and category, and fees, to name a few.
One disadvantage of current financial instrument classification systems is their inability to classify different financial instruments based on similarities in behavior patterns. An example of a behavior pattern would be a time series of a financial instrument over a specific time period, wherein the time series is a sequence of data points that represent the daily change in the financial instrument's price. The level of similarity between two financial instruments is determined by calculating a Similarity Rank value, described in more detail in the Detailed Description section. Another disadvantage of current financial instrument classification systems is that they require the user to be financially knowledgeable enough to create a list of financial instruments of interest and to pick the appropriate criteria. Another disadvantage is the inability to classify financial instruments from different classes, for example, to find behavioral similarities between a certain stock and a certain mutual fund or between a certain exchange-traded fund and a certain bond. Another disadvantage is the inability to classify financial instruments from different stock exchanges and/or different countries, for example, to find behavioral similarities between a certain Israeli mutual fund and a certain American exchange-traded fund.
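The sketch below makes the notion of a behavior pattern concrete. It turns a price series into daily relative changes and ranks instruments by the Pearson correlation of those changes with a target. Both the correlation metric and the use of relative rather than absolute changes are stand-in assumptions; this is not the Similarity Rank defined in the Detailed Description. The series are assumed to cover the same trading days.

```python
import statistics


def daily_changes(prices):
    """Relative day-over-day change (an assumed definition of 'daily change')."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def rank_similar(target_prices, universe):
    """Instrument names sorted by behavioral similarity to the target, most similar first.

    universe: dict of name -> price series over the same dates (stand-in metric,
    not the patent's Similarity Rank).
    """
    target = daily_changes(target_prices)
    scores = {name: pearson(target, daily_changes(p)) for name, p in universe.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

The comparison uses only daily-change series, so the same function applies across instrument classes or exchanges.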
The described concepts relate to automated patient disposition. One example can receive a clinician's disposition for a patient. This implementation can perform parameter-based cluster analysis on the patient and a set of patients to identify a sub-set of the patients with which the patient has a high similarity. This example can also cause a graphical user interface to be generated that conveys parameters from the sub-set of the patients and the patient.
BACKGROUND
The present discussion relates to patient care. For instance, one profoundly fragile but consequential decision in hospitals concerns the disposition of a patient: for example, when to discharge a patient from the emergency department versus admit, when to discharge from the hospital, or when to transition from one part of the hospital to another. Patient disposition is a complex and subtle decision with profound implications for the patient. One such scenario relates to “Failure to Rescue”, the phenomenon in which a patient is not transferred to the intensive care unit (ICU) soon enough and instead suffers cardiac or respiratory arrest on a non-ICU floor. Failure to Rescue is estimated to occur in the U.S. at a rate of almost 300,000 cases per year. In an attempt to remedy this phenomenon, many emergency medicine rooms (EMRs) today manually implement a series of rules to check whether a discharge might be dangerous. These rules may check for critical lab values or abnormal vital signs. The challenge is that hundreds or thousands of rules may need to be written to cover the thousands of lab tests available today. Even when all those rules are written, the number of possible interactions between those labs, which may also be predictive of danger, grows factorially. Further, new laboratory tests and new knowledge on how to use those tests are entering healthcare at an exponential rate.
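The following is a minimal stand-in for identifying the similar sub-set of patients. Each parameter is scaled by its standard deviation across the cohort, and the k nearest patients by Euclidean distance are returned. The abstract describes parameter-based cluster analysis without specifying a method, so this nearest-neighbor approach, the parameter names, and `k` are all assumptions.

```python
import math
import statistics


def most_similar_patients(patient, cohort, k=3):
    """Ids of the k cohort members closest to `patient` in standardized parameter space.

    patient: dict of parameter -> value; cohort: dict of patient id -> such dicts.
    A nearest-neighbor stand-in, not the described cluster analysis.
    """
    names = sorted(patient)
    # Scale each parameter by its cohort standard deviation so units do not dominate.
    sd = {n: statistics.pstdev([p[n] for p in cohort.values()]) or 1.0 for n in names}

    def dist(p):
        return math.sqrt(sum(((p[n] - patient[n]) / sd[n]) ** 2 for n in names))

    return sorted(cohort, key=lambda pid: dist(cohort[pid]))[:k]
```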
This paper presents a physical model developed to find the directions of forces and moments required to open a plastic bag: which forces will contribute toward opening the knot and which forces will lock it further. The analysis is part of the implementation of a Q(lambda)-learning algorithm on a robot system. The learning task is to let a fixed-arm robot observe the position of a plastic bag located on a platform, grasp it, and learn how to shake out its contents in minimum time. The physical model confirms that the learned optimal bag-shaking policy is consistent with the physics of the task and shows that there were no subjective influences. Experimental results show that the learned policy converged to the optimal policy.
This paper presents a new reinforcement learning algorithm that enables collaborative learning between a robot and a human. The algorithm, which is based on the Q(λ) approach, expedites the learning process by taking advantage of human intelligence and expertise. The algorithm, denoted CQ(λ), provides the robot with self-awareness so that it can adaptively switch its collaboration level between two modes. In the autonomous mode, the robot is self-performing and decides which actions to take according to its learning function. In the semi-autonomous mode, a human advisor guides the robot, and the robot incorporates this knowledge into its learning function. This awareness is represented by a self-test of its learning performance. The approach of variable autonomy is demonstrated and evaluated using a fixed-arm robot that must find the optimal shaking policy to empty the contents of a plastic bag. A comparison between CQ(λ) and the traditional Q(λ) reinforcement learning algorithm showed faster convergence for the CQ(λ) collaborative algorithm.
To accelerate the use of robots in everyday tasks, they must be able to cope with unstructured, unpredictable, and continuously changing environments. This requires robots that perform independently and learn both how to respond to the world and how the world responds to the actions they undertake. One approach to learning is reinforcement learning (RL), in which the robot acts via a process guided by reinforcements from the environment that indicate how well it is performing the required task. Common RL algorithms in robotic systems include Q-learning and its variant Q(λ)-learning, which are model-free, off-policy learning algorithms that select actions according to several control policies. Although Q and Q(λ) learning have been used in many robotic applications, these approaches need improvement. Their drawbacks include: (i) extremely high computational cost, (ii) large state-action spaces, and (iii) long learning times (until convergence to an optimal policy). This thesis presents a new collaborative learning algorithm, denoted the CQ(λ) algorithm, that is based on the Q(λ)-learning algorithm. The CQ(λ)-learning algorithm was developed, tested, and applied in two frameworks: (i) learning by multiple agents, and (ii) learning by human-robot systems. In the first framework, collaboration involves taking the maximum of state-action values, i.e., the Q-value, across all learning agents at each update step. In the second framework, two levels of collaboration are defined for a human-robot learning system. At the first level, (i) autonomous, the robot decides which actions to take, acting autonomously according to its Q(λ) learning function. At the second level, (ii) semi-autonomous, a human operator (HO) guides the robot to take an action or a policy, and the robot uses the suggestion to replace its own exploration process. The key idea is to give the robot enough self-awareness to adaptively switch its collaboration level from autonomous (self-performing) to semi-autonomous (human intervention and guidance). This awareness is represented by a self-test of its learning performance. The approach of variable autonomy is demonstrated in the context of an intelligent environment using mobile and fixed-arm robots.
Extensive experimentation with different robotic systems in a variety of applications demonstrated the strengths and weaknesses of the algorithm. Applications specifically developed for testing the CQ(λ)-learning algorithm are demonstrated in the context of an intelligent environment, using a mobile robot for navigation and a fixed-arm robot for the inspection of suspicious objects. The results revealed that CQ(λ) is superior to the standard Q(λ) algorithm. The proposed learning method is expected to reduce both the number of trials needed and the time required for a robot to learn a task.
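The multi-agent collaboration rule described above can be sketched on a toy problem. The rule is to take the maximum Q-value across all agents at each update step. The environment is hypothetical: a 1-D corridor of 8 states with a reward of 1 at the rightmost state. The sketch assumes Watkins's variant of Q(λ) with replacing traces, and the values of `alpha`, `gamma`, `lam`, and `eps` are illustrative.

```python
import numpy as np

N_STATES, GOAL = 8, 7      # hypothetical corridor: states 0..7, reward 1 on reaching 7
ACTIONS = (-1, +1)         # action 0 = move left, action 1 = move right


def step(s, a):
    """Move within the corridor; reaching GOAL ends the episode with reward 1."""
    s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
    done = s2 == GOAL
    return s2, (1.0 if done else 0.0), done


def eps_greedy(q_row, eps, rng):
    """Epsilon-greedy choice with random tie-breaking among the best actions."""
    if rng.random() < eps:
        return int(rng.integers(len(ACTIONS)))
    best = np.flatnonzero(q_row == q_row.max())
    return int(rng.choice(best))


def cq_lambda(n_agents=2, episodes=50, alpha=0.1, gamma=0.95, lam=0.8, eps=0.1, seed=0):
    """Watkins's Q(lambda) per agent; after every update step all agents adopt
    the element-wise maximum Q-table (the collaboration rule)."""
    rng = np.random.default_rng(seed)
    shared = np.zeros((N_STATES, len(ACTIONS)))
    Q = [shared.copy() for _ in range(n_agents)]
    steps_per_episode = []
    for _ in range(episodes):
        E = [np.zeros_like(shared) for _ in range(n_agents)]   # eligibility traces
        s = [0] * n_agents
        a = [eps_greedy(Q[i][0], eps, rng) for i in range(n_agents)]
        done = [False] * n_agents
        n_steps = 0
        while not all(done):
            for i in range(n_agents):
                if done[i]:
                    continue
                s2, r, done[i] = step(s[i], a[i])
                a2 = eps_greedy(Q[i][s2], eps, rng)
                greedy = Q[i][s2, a2] == Q[i][s2].max()
                a_star = a2 if greedy else int(Q[i][s2].argmax())
                target = r if done[i] else r + gamma * Q[i][s2, a_star]
                delta = target - Q[i][s[i], a[i]]
                E[i][s[i], a[i]] = 1.0                         # replacing trace
                Q[i] += alpha * delta * E[i]
                # Watkins's cut: decay traces after greedy actions, reset otherwise.
                E[i] = E[i] * (gamma * lam) if greedy else np.zeros_like(E[i])
                s[i], a[i] = s2, a2
            shared = np.maximum.reduce(Q)                      # collaboration step
            Q = [shared.copy() for _ in range(n_agents)]
            n_steps += 1
        steps_per_episode.append(n_steps)
    return shared, steps_per_episode
```

Every agent adopts the shared maximum table after each step. As a result, a value discovered by any one agent becomes available to all of them immediately, which is the intended source of the speed-up over single-agent Q(λ).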
This paper presents a scheduling reinforcement learning algorithm designed for the execution of complex tasks. The algorithm addresses the high-level learning task of scheduling a single transfer agent (a robot arm) through a set of sub-tasks in a sequence that achieves optimal task execution times. Instead of fixed inter-process job transfers, the robot allows jobs to be moved flexibly at any point in time. Execution of a complex task was demonstrated using a Motoman UP-6 six-degree-of-freedom fixed-arm robot applied to a toast-making system. The algorithm addressed the scheduling of a sequence of toast transitions with the objective of minimal completion time. Experiments examined the trade-off between exploration of the state space and exploitation of the information already gathered, and its effects on the algorithm’s performance. Comparison of the proposed algorithm with the Monte-Carlo method and a random search method demonstrated its superiority over a wide range of learning conditions. The results were assessed against the optimal solution obtained by Branch and Bound.
This paper describes the design of multi-category support vector machines (SVMs) for the classification of bags. To train and test the SVMs, a collection of 120 images of different types of bags (backpacks, small shoulder bags, flexible plastic bags, and small briefcases) was used. Tests were conducted to establish the best polynomial and Gaussian RBF (radial basis function) kernels. Because SVMs are known to be sensitive to the number of features in pattern classification applications, the performance of the SVMs as a function of the number and type of features was also studied. The goal of feature selection here is to obtain a smaller set of features that accurately represents the original set. A K-fold cross-validation procedure with three subsets was applied to ensure reliability. A kernel optimization experiment used nine popular shape features: area, bounding box ratio, major axis length, minor axis length, eccentricity, equivalent diameter, extent, roundness, and convex perimeter. In this experiment, a classification rate of 95% was achieved using a polynomial kernel of degree six, and a classification rate of 90% was achieved using an RBF kernel with σ = 27. To improve these results, a feature selection procedure was performed. The optimal feature set, comprising bounding box ratio, major axis length, extent, and roundness, yielded a classification rate of 96.25% using a polynomial kernel of degree nine. The collinearity between the features was confirmed using principal component analysis, in which a reduction to four components accounted for 99.3% of the variation for each of the bag types.
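For reference, the two kernel families compared above and a three-fold split can be sketched as follows. The polynomial kernel assumes the constant `c = 1`, and the RBF kernel assumes the 2σ² scaling. Both are common conventions; the abstract does not state its exact parameterization, so "σ = 27" may be defined differently in the original work.

```python
import math


def polynomial_kernel(x, y, degree=6, c=1.0):
    """K(x, y) = (x . y + c) ** degree; c = 1 is an assumed convention."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** degree


def rbf_kernel(x, y, sigma=27.0):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2)); one common sigma parameterization."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def k_fold_indices(n, k=3):
    """Split indices 0..n-1 into k contiguous folds (shuffle the data beforehand)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds
```

With 120 images and three subsets, each fold holds 40 images, and each image is used for testing exactly once.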
This paper presents a method for autonomous recharging of a mobile robot, a necessity for achieving long-term robotic activity without human intervention. A recharging station is designed, consisting of a stationary docking station and a docking mechanism mounted to an ER-1 Evolution Robotics robot. The docking station and docking mechanism serve as a dual power source, providing a mechanical and electrical connection between the recharging system of the robot and a laptop placed on it. Docking strategy algorithms use vision-based navigation. The result is a very low-cost system that tolerates large entrance angles. Iterative improvements to the system, to resist environmental perturbations and implement obstacle avoidance, ultimately resulted in a docking success rate of 100 percent over 50 trials.
This paper presents a collaborative reinforcement learning algorithm, CQ(λ), designed to accelerate learning by integrating a human operator into the learning process. The CQ(λ)-learning algorithm enables knowledge sharing between the robot and a human; the human, responsible for remotely monitoring the robot, suggests solutions when intervention is required. Based on its learning performance, the robot switches between fully autonomous operation and the integration of human commands. The CQ(λ)-learning algorithm was tested on a Motoman UP-6 fixed-arm robot required to empty the contents of a suspicious bag. An experimental comparison of CQ(λ) with the standard Q(λ) showed the superiority of CQ(λ), which achieved a 21.25% improvement in average reward.
A system developed by Parco Wireless, a developer of ultra-wideband (UWB) Radio Frequency Identification (RFID) technology, was installed at the Washington Hospital Center (WHC). The system allows humans and equipment to be tracked in three dimensions. The deployment covers several zones centered on the emergency department, including Annex-4 and the Medial Media Lab located nearby. The Parco real-time location system uses tags and readers licensed from Multispectral Solutions, an ultra-wideband specialist. It allows hospitals and clinics to track the status and exact location of patients, staff, and essential equipment. UWB RFID active tags can be worn by patients, medical staff, and robots. The credit card-size tags (Fig. 1) can be detected at a range of 600 feet at a rate of up to 30 times per second. The tags are tracked by unique three-digit identification numbers emitted by each tag every second. Data delivered by UWB tags can support a whole range of decisions; the focus here is on human-robot following.
This paper presents the design and implementation of a new reinforcement learning (RL) based algorithm. The proposed algorithm, CQ(lambda) (collaborative Q(lambda)), allows several learning agents to acquire knowledge from each other. Acquiring the knowledge learnt by one agent through collaboration with another accelerates the entire learning system, so learning can be utilized more efficiently. With collaborative learning algorithms, a learning task can be solved significantly faster than if it were performed by a single agent only; that is, the number of learning episodes needed to solve a task is reduced. The proposed algorithm proved to accelerate learning in a robotic navigation problem. The CQ(lambda) algorithm was applied to autonomous mobile robot navigation, where several robot agents serve as learning processes. Robots learned to navigate an 11 x 11 world containing obstacles and boundaries, choosing the optimal path to reach a target. Simulated experiments based on 50 learning episodes compared two collaborating robots with the Q(lambda) algorithm. They showed an average improvement of 17.02% in the number of learning steps required to reach definite optimality and an average improvement of 32.98% for convergence to near optimality [1, 2].
This paper describes a telerobotic system operated through a virtual reality (VR) interface. A least-squares method is used to find the transformation mapping from the virtual to the real environment. The results revealed an average transformation error of 3 mm. The system was tested on the task of planning minimum-time shaking trajectories to discharge the contents of a suspicious package onto a workstation platform. Performance times for carrying out the task directly through the VR interface showed rapid learning, reaching the standard time (288 seconds) within 7 to 8 trials and exhibiting a learning rate of 0.79.
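One common way to realise such a least-squares mapping is to fit a 3-D affine transform to pairs of corresponding points measured in the virtual and the real environment. The sketch below assumes exactly that, solving the system with NumPy's `numpy.linalg.lstsq`. The paper does not state its parametrization, so the affine form and the function names are assumptions; a rigid or similarity transform could equally have been used. The average transformation error is computed here as the mean Euclidean distance between mapped virtual points and their real counterparts.

```python
import numpy as np


def fit_affine_3d(virtual_pts, real_pts):
    """Least-squares affine map with real ~= A @ virtual + t.

    virtual_pts, real_pts: (N, 3) arrays of corresponding points, N >= 4,
    not all coplanar. Returns A with shape (3, 3) and t with shape (3,).
    """
    v = np.asarray(virtual_pts, dtype=float)
    r = np.asarray(real_pts, dtype=float)
    # Homogeneous design matrix [x y z 1] so the translation is solved jointly.
    design = np.hstack([v, np.ones((v.shape[0], 1))])
    # Solve design @ M = r for M (4 x 3) in the least-squares sense.
    m, *_ = np.linalg.lstsq(design, r, rcond=None)
    return m[:3].T, m[3]


def mean_transformation_error(virtual_pts, real_pts, A, t):
    """Average Euclidean distance between mapped virtual points and real points."""
    mapped = np.asarray(virtual_pts, dtype=float) @ A.T + t
    residuals = mapped - np.asarray(real_pts, dtype=float)
    return float(np.mean(np.linalg.norm(residuals, axis=1)))
```

Given calibration points in both frames, `fit_affine_3d(virtual, real)` returns `A` and `t`, and `mean_transformation_error` reports the average residual in the input units (e.g., millimetres).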
This research focuses on the development of a telerobotic system that employs several state-action policies to carry out a task using on-line learning with human operator (HO) intervention through a virtual reality (VR) interface. The case-study task is to empty the contents of an unknown bag for subsequent scrutiny. A system state is defined as a condition that persists in the system for a significant period of time, and it consists of the following sub-states: 1) the bag, described by a feature set such as its type (e.g., plastic bag, briefcase, backpack, or suitcase) and its condition (e.g., open or closed, orientation, distortions in the bag contour, partial occlusion of the bag, changes in handle length); 2) the robot (e.g., gripper spatial coordinates, home position, idle, performing a task); 3) other objects (e.g., contents that fell out of the bag, obstructions); and 4) environmental conditions such as illumination (e.g., day or night). A system action takes the system to a new state. Examples of actions include selecting an initial grasping point, executing a lift-and-shake trajectory, and re-arranging the position of the bag to prepare it for better grasping and to enable the system to verify whether all of the bag's contents have been extracted. Given the system state and a set of actions, a policy is a set of state-action pairs for performing a robotic task. The system starts with knowledge of the individual operators of the robot arm, such as opening and closing the gripper, but it has no policy for deciding when these operators are appropriate, nor does it have knowledge of the special properties of the bags. A policy is defined as the best action for a given state, and the system learns this policy from experience and human guidance. A policy is considered beneficial if the bag is grasped successfully and all of its contents are extracted.
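The state and policy definitions above map naturally onto simple data types. In the sketch below, a state is an immutable (and therefore hashable) record with fields grouped by sub-state, and a policy is a dictionary from states to actions, so looking up a state returns its best known action, or nothing for an unknown state. All class names, field names, and the reduction of each sub-state to one or two coarse features are hypothetical simplifications; a real system would also encode orientation, contour distortions, gripper coordinates, and so on.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class BagType(Enum):
    PLASTIC_BAG = "plastic bag"
    BRIEFCASE = "briefcase"
    BACKPACK = "backpack"
    SUITCASE = "suitcase"


class RobotStatus(Enum):
    HOME = "home position"
    IDLE = "idle"
    BUSY = "performing a task"


class Action(Enum):
    GRASP = "select initial grasping point"
    LIFT_AND_SHAKE = "execute lift-and-shake trajectory"
    REPOSITION = "re-arrange bag for better grasping"
    VERIFY = "verify that all contents were extracted"


@dataclass(frozen=True)  # frozen -> hashable, so a state can be a dictionary key
class SystemState:
    bag_type: BagType              # sub-state 1: the bag
    bag_open: bool
    robot: RobotStatus             # sub-state 2: the robot
    contents_on_platform: bool     # sub-state 3: other objects
    daylight: bool                 # sub-state 4: environmental conditions


# A policy as a set of state-action pairs: one best action per known state.
Policy = Dict[SystemState, Action]


def best_action(policy: Policy, state: SystemState) -> Optional[Action]:
    """Return the learned action, or None for an unknown state (where the HO would help)."""
    return policy.get(state)
```

Because equal states compare and hash equally, a state observed again later retrieves the same learned action without any extra bookkeeping.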
To learn the optimal policy, system states will be classified using two soft computing methods: 1) on-line adaptive resonance theory (ART) and 2) off-line support vector machines (SVMs). The output of these methods will be a recommended set of possible grasping points, and their recognition accuracy will be compared on a set of test cases. Reinforcement learning (e.g., Q-learning) will be used to find the best action for a given state (e.g., determining the optimal grasping point followed by a lift-and-shake trajectory). When unknown system states are identified, the HO suggests solutions (policies) through the VR interface, and the robot decides whether to accept or reject them. The HO monitors the interactions of the telerobot on-line and controls the system through the VR interface. Examples of such policies include letting the HO reclassify the type of a bag (e.g., as a briefcase) when it has been mistakenly recognized as a different type (e.g., a suitcase), and having the HO provide a set of possible grasping points when the system finds it difficult to identify points that are beneficial for completing the task. When HO intervention proves beneficial, the system learns from it, and its dependence on the HO decreases. To test this approach, an advanced VR telerobotic bag-shaking system is proposed. It is assumed that several kinds of bags are placed on a platform, that all locks have been removed, and that all latches and zippers have been opened. The task of the system is to empty the contents of an unknown bag onto the platform for subsequent scrutiny. It is further assumed that the bag has already passed X-ray inspection to ensure that it is not empty and does not contain obvious explosives (e.g., mines, gun bullets). HO collaboration is conducted via the VR interface, which plays an important role in the system.
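For the on-line ART classifier, one well-known variant is Fuzzy ART, sketched below for state feature vectors scaled to [0, 1]. The text does not say which ART variant is used, so the choice of Fuzzy ART, the class name, and the parameter values (vigilance `rho`, choice parameter `alpha`, learning rate `beta`) are assumptions. Each input is complement-coded, categories are searched in decreasing order of the choice function, and a category learns only if it passes the vigilance test; otherwise a new category is created. Mapping each category to a set of recommended grasping points would be a separate, application-specific step not shown here.

```python
class FuzzyART:
    """Minimal on-line Fuzzy ART clusterer for feature vectors scaled to [0, 1]."""

    def __init__(self, rho=0.75, alpha=0.001, beta=1.0):
        self.rho = rho          # vigilance: higher values create finer categories
        self.alpha = alpha      # choice parameter (small, positive)
        self.beta = beta        # learning rate; 1.0 is "fast learning"
        self.weights = []       # one complement-coded weight vector per category

    @staticmethod
    def _complement_code(x):
        return list(x) + [1.0 - v for v in x]

    def _choice(self, i, j):
        # T_j = |I AND w_j| / (alpha + |w_j|), with AND as the element-wise minimum
        return sum(map(min, i, self.weights[j])) / (self.alpha + sum(self.weights[j]))

    def _resonate(self, i):
        """Search categories by decreasing choice value; return (index, match) or (None, None)."""
        for j in sorted(range(len(self.weights)), key=lambda k: -self._choice(i, k)):
            match = [min(a, b) for a, b in zip(i, self.weights[j])]
            if sum(match) / sum(i) >= self.rho:   # vigilance test
                return j, match
        return None, None

    def learn(self, x):
        """Present one sample on-line and return its category index."""
        i = self._complement_code(x)
        j, match = self._resonate(i)
        if j is None:                             # nothing resonates: create a new category
            self.weights.append(i)
            return len(self.weights) - 1
        self.weights[j] = [self.beta * m + (1 - self.beta) * w
                           for m, w in zip(match, self.weights[j])]
        return j

    def classify(self, x):
        """Return the resonating category without learning, or None if none matches."""
        return self._resonate(self._complement_code(x))[0]
```

`learn` is the on-line training call. `classify` returns the matching category without updating weights, or `None` when no category resonates, which is one possible trigger for asking the HO about an unknown state.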
The HO either manipulates the 3D robot off-line, suggests solutions (e.g., the robot learns an optimal grasping location and avoids others), or changes and adds lifting and shaking policies on-line. When the robot encounters a situation it cannot handle, it relies on HO intervention. The system will exploit HO intervention to support the evolution of autonomy in two ways: first, by providing input to machine learning to support system adaptation, and second, by characterizing the situations in which autonomous capabilities fail and operator intervention is necessary. Finally, measuring the amount of operator intervention required provides a metric for judging the system's level of autonomy: the less intervention, the higher the level of autonomy.
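The two uses of HO intervention and the autonomy metric can be sketched as a small wrapper around the policy. It assumes that states are hashable, that a hypothetical callback `ask_operator` stands in for the VR interface, and that the HO's answer is simply stored as the policy for that state (the robot's accept/reject step is omitted). The autonomy score, the share of decisions made without intervention, is one simple way to turn "the less intervention, the higher the level of autonomy" into a number; it is an assumption, not a metric defined in the text.

```python
from collections import Counter


class InterventionTracker:
    """Routes unhandled states to the human operator (HO) and learns from the reply."""

    def __init__(self, ask_operator):
        self.ask_operator = ask_operator   # hypothetical callback standing in for the VR interface
        self.policy = {}                   # state -> action learned so far
        self.decisions = 0
        self.interventions = 0
        self.needs_help = Counter()        # which states required the HO, and how often

    def decide(self, state):
        self.decisions += 1
        if state in self.policy:
            return self.policy[state]      # autonomous decision
        # The robot cannot handle this state, so it relies on the HO ...
        self.interventions += 1
        self.needs_help[state] += 1
        action = self.ask_operator(state)
        self.policy[state] = action        # ... and keeps the answer as learning input
        return action

    def autonomy(self):
        """Share of decisions made without the HO; 1.0 means fully autonomous."""
        if self.decisions == 0:
            return 1.0
        return 1.0 - self.interventions / self.decisions
```

As the policy fills in, repeated states no longer reach the HO, so `autonomy()` rises while `needs_help` records which situations originally required intervention.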