Home
» Describe three ways to handle missing field values.
» Describe the three ways to handle missing field values
Sunday, September 11, 2016
Describe the three ways to handle missing field values
The following methods handle missing field
values in database
When a field may be null, simply entering no
value may be sufficient. For example, suppose a customer zip code field is null
and a report summarizes total sales by month and zip code. How should sales to
customers with unknown zip codes be handled? Two options are used for handling
or preventing missing field values. Using a default value and not permitting
missing (null) values. Missing data are inevitable. According to Babad and
Hoffer (1984), the following are some other possible methods for handling
missing data:
1. Use attribute mean:
Replace missing values of an attribute with the
mean (or median if its discrete) value for that attribute in the database. For example, for a missing sales value when
computing monthly product sales, use a formula involving the mean of the existing
monthly sales values for that product indexed by total sales for that month
across all products. Such estimates must be marked so that users know that these
are not actual values.
2. Use a global constant to fill in for missing
values:
Track missing field data
so that special reports and other system elements cause people to resolve
unknown values quickly. This can be done by setting up a trigger in the database
definition and decide a new global constant value, like “unknown“, “NULL”
or minus infinity, that will be used to fill all the missing
values.
This technique is used because sometimes it just doesn’t make sense to try and predict the missing value.
This technique is used because sometimes it just doesn’t make sense to try and predict the missing value.
Example: let’s look at the
student’s enrollment database. Assuming the state of residence
attribute data is missing for some students. Filling it up with some state
doesn’t really makes sense so the opposite way is to place “NULL” in the
missing field so we can latter track and handle this missing field value.
3. Perform
sensitivity testing:
Perform sensitivity
testing so that missing data are ignored unless knowing a value might
significantly change results. (e.g., if total monthly sales for a particular salesperson
are almost over a threshold that would make a difference in that person’s compensation).
This is the most complex of the methods mentioned and hence requires the most
sophisticated programming. Such routines for handling missing data may be
written in application programs. All relevant modern DBMSs now have more
sophisticated programming capabilities, such as case expressions, user-defined
functions, and triggers, so that such logic can be available in the database
for all users without application-specific programming.
4. Use attribute mean for all samples belonging to
the same class:
Instead of using the
mean (or median) of a certain attribute calculated by looking at all the rows
in a database, we can limit the calculations to the relevant class to make the
value more relevant to the row we’re looking at.
Lets say you have a
cars pricing database that, among other things, classifies cars to “Luxury” and
“Low budget” and you’re dealing with missing values in the cost field.
Replacing missing cost of a luxury car with the average cost of all luxury cars
is probably more accurate then the value you’d get if you factor in the low
budget cars.
5. Use a data mining algorithm to predict the most
probable value:
The value can be
determined using regression, inference based tools using Baysian formalism ,
decision trees, clustering algorithms (K-Mean\Median etc.).
For example, we could
use a clustering algorithms to create clusters of rows which will then be used
for calculating an attribute mean or median as specified in first technique. Another
example could be using a decision tree to try and predict the probable value in
the missing attribute, according to other attributes in the data.
Subscribe to:
Post Comments
(
Atom
)
No comments :
Post a Comment