CHAPTER 6

  1. What are a bit and a byte? Give examples of both.

 

Bits and bytes both measure amounts of data. A bit is the smallest unit of data and holds a single binary value, either 0 or 1. The two units are typically used in different contexts.

Bits, kilobits, and megabits are most often used to measure data transfer speeds, written per second as bps, Kbps, and Mbps. This may refer to how fast you are downloading a file, or how fast your Internet connection is. For example, if you are downloading a file over a cable modem, your download speed might be 240Kbps. This is much faster than a dial-up modem, which maxes out at 56Kbps.

Bytes, on the other hand, are used to measure data storage. For example, a CD holds 700MB (megabytes) of data and a hard drive may hold 250GB (gigabytes). The other important difference is that bytes contain eight bits of data. Therefore, a 240Kbps download is only transferring 30KB of data per second. However, kilobytes per second is not as commonly used as kilobits per second for measuring data transfer speeds. After all, using kilobits per second (Kbps) makes your connection sound eight times faster!

It is important to know that bytes are abbreviated with a capital B, whereas bits use a lowercase b. Therefore, Mbps is megabits per second, and MBps is megabytes per second. So 8Mbps is equal to 1MBps.
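The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and it assumes the same prefix (kilo, mega) is used on both sides of each conversion.

```python
BITS_PER_BYTE = 8


def bits_to_bytes(rate_in_bits: float) -> float:
    """Convert a rate in (kilo/mega)bits per second to (kilo/mega)bytes per second."""
    return rate_in_bits / BITS_PER_BYTE


def download_seconds(size_megabytes: float, speed_megabits_per_second: float) -> float:
    """Time to transfer a file, given its size in MB and the link speed in Mbps."""
    size_megabits = size_megabytes * BITS_PER_BYTE
    return size_megabits / speed_megabits_per_second


print(bits_to_bytes(240))        # 240Kbps expressed in KBps -> 30.0
print(bits_to_bytes(8))          # 8Mbps expressed in MBps -> 1.0
print(download_seconds(700, 8))  # a 700MB CD image at 8Mbps -> 700.0 seconds
```

Keeping the factor of eight in a single named constant makes it harder to mix up the lowercase b and the capital B when converting.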

 

  2. What is data cleansing? Why and when does an organization require data cleansing?

Data cleansing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. It involves identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data.[1] Data cleansing may be performed interactively with data wrangling tools, or as batch processing through scripting.

After cleansing, a data set should be consistent with other similar data sets in the system. The inconsistencies detected or removed may have been originally caused by user entry errors, by corruption in transmission or storage, or by different data dictionary definitions of similar entities in different stores. Data cleansing differs from data validation in that validation almost invariably means data is rejected from the system at entry and is performed at the time of entry, rather than on batches of data.

The actual process of data cleansing may involve removing typographical errors or validating and correcting values against a known list of entities. The validation may be strict (such as rejecting any address that does not have a valid postal code) or fuzzy (such as correcting records that partially match existing, known records). Some data cleansing solutions will clean data by cross-checking with a validated data set. A common data cleansing practice is data enhancement, where data is made more complete by adding related information, for example by appending to each address any phone numbers related to it. Data cleansing may also involve activities such as harmonization and standardization of data. Harmonization means, for example, expanding short codes (st, rd, etc.) to actual words (street, road, etc.). Standardization means changing a reference data set to a new standard, for example by adopting standard codes.
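The steps described above can be sketched in Python using only the standard library. Everything named here is an assumption made for illustration: the short-code table, the five-digit postal code format, the list of known cities, and the 0.8 similarity cutoff are hypothetical and would differ for real data.

```python
import difflib
import re

# Hypothetical short-code table and reference list, for illustration only.
SHORT_CODES = {"st": "street", "rd": "road"}
KNOWN_CITIES = ["Springfield", "Shelbyville"]


def harmonize(address: str) -> str:
    """Expand short codes (st, rd) to full words, keeping capitalization."""
    words = []
    for word in address.split():
        full = SHORT_CODES.get(word.lower().rstrip("."))
        if full is None:
            words.append(word)
        else:
            words.append(full.capitalize() if word[0].isupper() else full)
    return " ".join(words)


def has_valid_postal_code(address: str) -> bool:
    """Strict validation: assumes a postal code is exactly five digits."""
    return re.search(r"\b\d{5}\b", address) is not None


def correct_city(city: str) -> str:
    """Fuzzy validation: replace a near match with the known spelling."""
    matches = difflib.get_close_matches(city, KNOWN_CITIES, n=1, cutoff=0.8)
    return matches[0] if matches else city


print(harmonize("12 Main St."))                   # 12 Main Street
print(has_valid_postal_code("12 Main St 12345"))  # True
print(correct_city("Sprngfield"))                 # Springfield
```

A simple word-by-word lookup like this would also expand "St." in a name such as "St. John", which is one reason real cleansing tools check values against a validated data set rather than relying on a short table.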

Data cleansing tools help companies save time and increase their efficiency. Various organizations use data cleansing software to remove duplicate data and to fix badly formatted, incorrect, or incomplete data in marketing lists, databases and CRMs. These tools can achieve in a short period of time what could take an administrator days or weeks to fix manually. This means that companies can save not only time but also money by acquiring data cleansing tools.
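As a rough illustration of the duplicate removal such tools perform, the sketch below drops repeated entries from a small marketing list. The record layout and the choice of the email address as the matching key are assumptions made for this example, not how any particular product works.

```python
def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each email address."""
    seen = set()
    unique = []
    for record in records:
        # Normalize so that differences in case or spacing do not hide a duplicate.
        key = record["email"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical marketing list with one badly formatted duplicate.
contacts = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "Ann Lee", "email": " Ann@Example.com "},
    {"name": "Bob Ray", "email": "bob@example.com"},
]

for contact in deduplicate(contacts):
    print(contact["name"], contact["email"])
```

Here the second entry is dropped because its email matches the first once case and surrounding spaces are ignored.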

Data cleansing is of particular value to organizations that have vast amounts of data to deal with. These can include banks or government organizations, but small to medium enterprises can also find good use for such tools. In fact, many sources suggest that any firm that works with and holds data should invest in cleansing tools. The tools should also be used on a regular basis, as the amount of inaccurate data can grow quickly, compromising the database and decreasing business efficiency.

 

References

https://web.stanford.edu/class/cs101/bits-bytes.html

https://en.wikipedia.org/wiki/Data_wrangling

http://www.winpure.com/blog/importance-of-data-cleansing/

 
