
Medical diagnosis wizard

 

To make it clear how Bayes theorem works, you will develop an online medical diagnosis wizard using PHP. This wizard could also have been called a calculator, except that it takes four input steps to supply the prerequisite information and then a fifth step to review the result.

The wizard works by asking the user to supply the various pieces of information needed to compute the full posterior distribution. The user can then examine the posterior distribution to determine which disease hypothesis has the highest probability based on:

  1. The diagnostic test information
  2. The sample data used to estimate the prior and likelihood distributions

Bayes Wizard: Step 1

Step 1 in using Bayes theorem to make a medical diagnosis involves specifying the number of disease alternatives that you will examine along with the number of symptoms or evidence keys. In the generic example you will look at, you will evaluate three disease alternatives based on evidence from two diagnostic tests. Each diagnostic test can only produce a positive or negative result. This means that the total number of symptom combinations, or evidence keys, you can observe is four (++, +-, -+, or --).
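Because each diagnostic test has two possible outcomes, n tests produce 2^n evidence keys (2^2 = 4 here). The following minimal PHP sketch enumerates the keys in the same order used above; evidenceKeys() is a hypothetical helper, not part of the wizard's own code.

```php
<?php
// Hypothetical helper: list every evidence key for $numTests binary tests.
// Each test is positive (+) or negative (-), so the result has 2^$numTests keys.
function evidenceKeys(int $numTests): array
{
    $keys = [''];
    for ($i = 0; $i < $numTests; $i++) {
        $next = [];
        foreach ($keys as $prefix) {
            // Extend every partial key with a positive and a negative result.
            $next[] = $prefix . '+';
            $next[] = $prefix . '-';
        }
        $keys = $next;
    }
    return $keys;
}

$keys = evidenceKeys(2);
echo implode(', ', $keys), "\n"; // ++, +-, -+, --
echo count($keys), "\n";         // 4
```

Building keys one test at a time keeps the order stable, so the first test's result is always the leftmost symbol.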

Figure 3. Form to enter disease hypotheses and symptom possibilities

Bayes Wizard: Step 2

Step 2 involves entering the disease and symptom labels. In this case, you are just going to enter d1, d2, and d3 for the disease labels and ++, +-, -+ and -- for the symptom labels. The two symbols used for symptom labels signify whether the results of the two diagnostic tests came out positive or negative.

Figure 4. Form to enter disease and symptom labels

Bayes Wizard: Step 3

Step 3 involves entering the prior probabilities for each disease. You will use the data table below to determine the priors to enter in Step 3 and the likelihoods to enter in Step 4 (this data table originally appeared in Introduction to Probability). Using this example allows you to confirm that the final result you obtain from the wizard agrees with the results published in that book.

Figure 5. Joint frequency of diseases and symptoms

The prior probability of each disease is the number of patients diagnosed with that disease divided by the total number of diagnosed cases in the sample. The resulting prior probabilities are entered in the following form:

Figure 6. Form to enter disease priors
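Computing priors from a table of counts takes one division per disease. The sketch below assumes the per-disease patient counts are available as an associative array; the helper name priorsFromCounts() and the counts themselves are hypothetical, not the values shown in Figure 5.

```php
<?php
// Hypothetical helper: estimate P(H) for each disease as
// (patients diagnosed with that disease) / (all diagnosed patients).
function priorsFromCounts(array $diseaseCounts): array
{
    $total = array_sum($diseaseCounts);
    if ($total <= 0) {
        throw new InvalidArgumentException('Need at least one diagnosed case.');
    }
    $priors = [];
    foreach ($diseaseCounts as $disease => $count) {
        $priors[$disease] = $count / $total;
    }
    return $priors;
}

// Invented counts for illustration only (not the Figure 5 data).
$priors = priorsFromCounts(['d1' => 100, 'd2' => 100, 'd3' => 200]);
foreach ($priors as $disease => $prior) {
    printf("P(H='%s') = %.2f\n", $disease, $prior);
}
// Prints P(H='d1') = 0.25, P(H='d2') = 0.25, P(H='d3') = 0.50, one per line.
```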

You do not have to rely upon a data table such as the previous one to derive the prior probability estimates. In some cases, you can derive prior probabilities by using common-sense reasoning: The prior probability of a fair two-sided coin coming up heads is 0.5. The prior probability of selecting a queen of hearts from a randomized deck of cards is 1/52.

You also commonly run into situations where you initially have no good estimates of what the prior probability of each hypothesis might be. In such cases, it is common to posit noninformative priors. If you have four hypothesis alternatives, then the noninformative prior distribution assigns 1/4, or 0.25, to each hypothesis. You might note here that Bayesians often criticize the use of a null hypothesis in significance testing because it amounts to assuming noninformative priors in cases where positing informative priors might be more theoretically or empirically justified.
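A noninformative prior is simple to construct in code. This sketch, using a hypothetical uniformPriors() helper, assigns 1/n to each of n hypothesis labels:

```php
<?php
// Hypothetical helper: a noninformative (uniform) prior assigns 1/n to each of n hypotheses.
function uniformPriors(array $hypotheses): array
{
    $n = count($hypotheses);
    if ($n === 0) {
        throw new InvalidArgumentException('Need at least one hypothesis.');
    }
    return array_fill_keys($hypotheses, 1 / $n);
}

$priors = uniformPriors(['h1', 'h2', 'h3', 'h4']);
print_r($priors); // each of h1..h4 maps to 0.25
```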

A final way to derive estimates of the prior probability of each hypothesis P(Hi) is through a subjective estimate of what those probabilities might be, given everything you have learned about the way the world works up to that point: P(H=h | everything you know). You will often find Bayesian inference sharing the same bed with a subjective view of probability, in which the probability of a proposition is equated with one's subjective degree of belief in that proposition.

What is important in this discussion is that Bayesian inference is a flexible technique that allows you to estimate prior probabilities using objective methods, common-sense logical methods, and subjective methods. When using subjective methods, you must still be willing to defend your prior probability estimates. You may use objective data to help set and justify your subjective estimates, which means that Bayesian inference is not necessarily in conflict with more objectively oriented approaches to statistical inference.

Bayes Wizard: Step 4

The data table provides you with information you can use to compute the probability of the symptoms (like test results) given the disease, also known as the likelihood distribution P(E | H).

To see how the likelihood values entered below were computed, you can unpack P(E|H) using the frequency format for computing conditional probabilities:

P(E | H) = {E & H} / {H}

This tells you to divide a joint frequency count {E & H} by a marginal frequency count {H} to obtain the likelihood value for each cell in the likelihood matrix. The top-left cell of the matrix, P(E='++' | H='d1'), can be computed immediately from the joint and marginal frequency counts in the data table:

P(E='++' | H='d1') = 2110 / 3215 ≈ .6563

All the likelihood values entered in Step 4 were computed in this manner.
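The same frequency-format calculation can be applied to every cell at once. In this minimal sketch the joint counts are stored in a hypothetical array indexed [disease][evidence key] (the transpose of the wizard's evidence-rows, disease-columns layout), the counts are invented for illustration, and each disease's marginal count {H} is assumed to equal the sum of its row, which holds when every patient has exactly one recorded evidence key.

```php
<?php
// Hypothetical helper implementing P(E | H) = {E & H} / {H}.
// $jointCounts is indexed [disease][evidence key]; each disease's marginal
// count {H} is taken as the sum of its row (one evidence key per patient).
function likelihoodsFromCounts(array $jointCounts): array
{
    $likelihoods = [];
    foreach ($jointCounts as $disease => $row) {
        $marginal = array_sum($row); // {H}: all patients with this disease
        if ($marginal <= 0) {
            throw new InvalidArgumentException("No cases recorded for $disease.");
        }
        foreach ($row as $evidence => $joint) {
            $likelihoods[$disease][$evidence] = $joint / $marginal; // {E & H} / {H}
        }
    }
    return $likelihoods;
}

// Invented counts for illustration only (not the Figure 5 data).
$counts = [
    'd1' => ['++' => 60, '+-' => 20, '-+' => 15, '--' => 5],
    'd2' => ['++' => 20, '+-' => 50, '-+' => 20, '--' => 10],
    'd3' => ['++' => 20, '+-' => 30, '-+' => 40, '--' => 110],
];
$likelihoods = likelihoodsFromCounts($counts);
printf("P(E='++' | H='d1') = %.4f\n", $likelihoods['d1']['++']); // 0.6000
```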

Figure 7. Form to enter likelihood of symptoms given the disease

It should be noted that many statisticians use likelihood as a system of inference instead of, or in addition to, Bayesian inference. This is because likelihoods also provide a metric one can use to evaluate the relative degree of support for several hypotheses given the data.
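One common way to express that relative support is the likelihood ratio P(e | h1) / P(e | h2) for the observed evidence e: values above 1 favour h1. The sketch below uses a hypothetical likelihoodRatio() helper and invented likelihood values indexed [hypothesis][evidence key].

```php
<?php
// Hypothetical helper: how much more strongly evidence $evidence supports $h1 than $h2.
function likelihoodRatio(array $likelihoods, string $evidence, string $h1, string $h2): float
{
    $numerator = $likelihoods[$h1][$evidence] ?? 0.0;
    $denominator = $likelihoods[$h2][$evidence] ?? 0.0;
    if ($denominator == 0) {
        throw new InvalidArgumentException("P($evidence | $h2) is zero; the ratio cannot be computed.");
    }
    return $numerator / $denominator;
}

// Invented likelihood values for illustration only.
$likelihoods = [
    'd1' => ['++' => 0.60, '+-' => 0.20, '-+' => 0.15, '--' => 0.05],
    'd2' => ['++' => 0.20, '+-' => 0.50, '-+' => 0.20, '--' => 0.10],
    'd3' => ['++' => 0.10, '+-' => 0.15, '-+' => 0.20, '--' => 0.55],
];
printf("L(d1) / L(d2) given '++' = %.1f\n", likelihoodRatio($likelihoods, '++', 'd1', 'd2')); // 3.0
```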

In the previous example, you can see that the probability of a particular evidence key varies for each hypothesis under consideration. The probability of the ++ evidence key is the greatest for the d1 hypothesis. You can assess which hypothesis is best supported by the data by:

  1. Examining the likelihood of the evidence key given each hypothesis
  2. Selecting the hypothesis that maximizes the likelihood of the evidence key

Doing so would be an example of inference according to the principle of maximum likelihood.
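Maximum likelihood selection reduces to picking the hypothesis with the largest P(e | h) for the observed evidence key. A minimal sketch with invented likelihood values follows; maxLikelihoodHypothesis() is a hypothetical name, the priors are ignored entirely, and ties go to the first hypothesis listed.

```php
<?php
// Hypothetical helper: return the hypothesis h that maximizes P($evidence | h).
// $likelihoods is indexed [hypothesis][evidence key].
function maxLikelihoodHypothesis(array $likelihoods, string $evidence): string
{
    $best = null;
    $bestValue = -INF;
    foreach ($likelihoods as $hypothesis => $row) {
        $value = $row[$evidence] ?? 0.0;
        if ($value > $bestValue) { // strict '>' keeps the first hypothesis on ties
            $best = $hypothesis;
            $bestValue = $value;
        }
    }
    if ($best === null) {
        throw new InvalidArgumentException('No hypotheses supplied.');
    }
    return (string) $best;
}

// Invented likelihood values for illustration only.
$likelihoods = [
    'd1' => ['++' => 0.60, '+-' => 0.20, '-+' => 0.15, '--' => 0.05],
    'd2' => ['++' => 0.20, '+-' => 0.50, '-+' => 0.20, '--' => 0.10],
    'd3' => ['++' => 0.10, '+-' => 0.15, '-+' => 0.20, '--' => 0.55],
];
echo maxLikelihoodHypothesis($likelihoods, '++'), "\n"; // d1
echo maxLikelihoodHypothesis($likelihoods, '--'), "\n"; // d3
```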

Another interesting point to note is that the values in the likelihood distribution above sum to a value greater than 1: each disease's likelihoods across the four evidence keys sum to 1 (up to rounding), so the whole table sums to 3, and the likelihoods of a single evidence key across the three diseases need not sum to 1. Viewed as a function of the hypothesis, the likelihood is therefore not really a probability distribution, because it lacks the defining property that its values sum to 1. This summation property is not essential for evaluating the relative support for different hypotheses. What is important for this purpose is that the "likelihood supplies a natural order of preference among the possibilities under consideration" (from R.A. Fisher's Statistical Methods and Scientific Inference, p. 68).
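The summation property can be checked directly. In this sketch the likelihood values are invented and indexed [hypothesis][evidence key]: each hypothesis's values sum to 1 over the evidence keys, the values for one evidence key across hypotheses do not, and the whole table sums to the number of hypotheses.

```php
<?php
// Invented likelihood values for illustration only.
$likelihoods = [
    'd1' => ['++' => 0.60, '+-' => 0.20, '-+' => 0.15, '--' => 0.05],
    'd2' => ['++' => 0.20, '+-' => 0.50, '-+' => 0.20, '--' => 0.10],
    'd3' => ['++' => 0.10, '+-' => 0.15, '-+' => 0.20, '--' => 0.55],
];

// For each hypothesis, P(E | H) summed over all evidence keys is 1.
foreach ($likelihoods as $h => $row) {
    printf("sum over E of P(E | %s) = %.2f\n", $h, array_sum($row));
}

// For a fixed evidence key, the values across hypotheses need not sum to 1.
$evidence = '++';
$acrossHypotheses = array_sum(array_column($likelihoods, $evidence));
printf("sum over H of P('%s' | H) = %.2f\n", $evidence, $acrossHypotheses); // 0.90

// The whole table sums to the number of hypotheses.
$total = array_sum(array_map('array_sum', $likelihoods));
printf("table total = %.2f\n", $total); // 3.00
```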

You may not fully understand the concept of likelihood from this brief discussion, but I do hope that you appreciate its importance to the overall Bayes theorem calculation and as the foundation for another system of inference. The likelihood system of inference is preferred by many statisticians because it does not require you to resort to the dubious practice of trying to estimate the prior probability of each hypothesis.

Maximum likelihood estimators also have many desirable mathematical properties that make them nice to work with (the properties include transitivity, additivity, consistency, and invariance under transformations, among others). For these reasons, it is often a good idea to examine your likelihood distribution closely, in addition to your posterior distribution, when making inferences from your data.

Bayes Wizard: Step 5

The final step of the process involves displaying the posterior distribution of the diseases given the symptoms P(H | E):

Figure 8. Probability of each disease given symptoms
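Each cell of the posterior table is an application of Bayes theorem: the Step 3 prior for a disease is multiplied by the Step 4 likelihood of the observed evidence key, and the product is normalized over all n disease hypotheses:

$$
P(H_i \mid E) = \frac{P(E \mid H_i)\,P(H_i)}{\sum_{j=1}^{n} P(E \mid H_j)\,P(H_j)}
$$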

The section of the script that was used to compute and display the posterior distribution looks like this:

Listing 4. Computing and displaying the posterior distribution
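<?php
// Bayes.php defines the Bayes class used below.
require_once "Bayes.php";

// Values collected by the earlier wizard steps and posted to this page.
$disease_labels = $_POST["disease_labels"];
$symptom_labels = $_POST["symptom_labels"];
$priors         = $_POST["priors"];      // P(H), entered in Step 3
$likelihoods    = $_POST["likelihoods"]; // P(E | H), entered in Step 4

$bayes = new Bayes($priors, $likelihoods);
$bayes->getPosterior();                   // compute the posterior P(H | E)
$bayes->setRowLabels($symptom_labels);    // aka evidence labels
$bayes->setColumnLabels($disease_labels); // aka hypothesis labels
$bayes->toHTML();                         // output the posterior table to the browser
?>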

You begin by passing the priors and likelihoods obtained from the previous wizard steps to the Bayes constructor. Using this information, you compute the posterior with the $bayes->getPosterior() method. To output the posterior distribution to the browser, you first set the row and column labels to display, then output the posterior distribution using the $bayes->toHTML() method.
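To see what the posterior step has to compute, here is a standalone sketch of the calculation. It is not the Bayes.php source: posteriorDistribution() is a hypothetical function, the likelihoods are assumed to be indexed [evidence key][hypothesis] (mirroring the evidence rows and hypothesis columns labelled in Listing 4), and all numbers are invented.

```php
<?php
// Hypothetical function: P(H | E) = P(E | H) P(H) / sum over H' of P(E | H') P(H').
function posteriorDistribution(array $priors, array $likelihoods): array
{
    $posterior = [];
    foreach ($likelihoods as $evidence => $row) {
        // Joint probabilities P(E & H) = P(E | H) P(H) for this evidence row.
        $joint = [];
        foreach ($priors as $hypothesis => $prior) {
            $joint[$hypothesis] = ($row[$hypothesis] ?? 0.0) * $prior;
        }
        $evidenceProb = array_sum($joint); // P(E), the normalizing constant
        foreach ($joint as $hypothesis => $p) {
            $posterior[$evidence][$hypothesis] = $evidenceProb > 0 ? $p / $evidenceProb : 0.0;
        }
    }
    return $posterior;
}

// Invented priors and likelihoods for illustration only.
$priors = ['d1' => 0.25, 'd2' => 0.25, 'd3' => 0.5];
$likelihoods = [
    '++' => ['d1' => 0.60, 'd2' => 0.20, 'd3' => 0.10],
    '+-' => ['d1' => 0.20, 'd2' => 0.50, 'd3' => 0.15],
    '-+' => ['d1' => 0.15, 'd2' => 0.20, 'd3' => 0.20],
    '--' => ['d1' => 0.05, 'd2' => 0.10, 'd3' => 0.55],
];
$posterior = posteriorDistribution($priors, $likelihoods);
foreach ($posterior as $evidence => $row) {
    $cells = [];
    foreach ($row as $hypothesis => $p) {
        $cells[] = sprintf('%s=%.3f', $hypothesis, $p);
    }
    echo $evidence, ': ', implode(' ', $cells), "\n";
}
// First line printed: ++: d1=0.600 d2=0.200 d3=0.200
```

Each row of the result sums to 1, which is what makes the posterior, unlike the likelihood table, a proper distribution over the disease hypotheses for a given evidence key.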




The content shown in this page was first published by IBM developerWorks and is reprinted with permission from Paul Meagher (www.datavore.com)

