Doing Your Own Project
1. Ideas
Now you are going to come up with your own ideas...
Some Examples:
1. Maryland/states Data
A few example files are provided, listed here with their fields. Think about which fields may be of interest to search and show (e.g. search by state, then show only certain items of data).

File: Choose_Maryland___Compare_Metros_-_Workforce.csv
MetroCityName, #Employment, #Unemployment, UnemploymentRate%, ProfessionalTechnicalWorkers%,
"Median Hourly Wage - Freight, Stock and Material Movers ($ Dollars)",
"Median Hourly Wage - Secretaries ($ Dollars)",
"Median Hourly Wage - Computer Systems Analysts ($ Dollars)"
You could have the user look up a city and see the data on it, or compare cities by unemployment, salaries, etc.

File: Choose_Maryland___Compare_States_-_Demographics.csv
State, Population-2015, Population-2010, Population-2000, PopulationChange2000-2010 (%),
MedianAge, PopulationDensity, MedianHouseholdIncome($ Dollars), PersonalIncome($ Dollars),
You could compare states by looking up different ones.
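
As a starting point, here is a minimal Python sketch of the "search by state, then show only certain fields" idea. The sample rows are made-up placeholder numbers, not real statistics, and the helper names `load_states` and `lookup` are hypothetical. The column names follow the field list for the states file, but the real file's header may differ slightly, so check it before relying on this.

```python
import csv
import io

# Made-up placeholder rows, NOT real statistics. Column names follow the
# field list for the states file; the real header may differ slightly.
SAMPLE = """State,Population-2015,MedianAge
Maryland,1000,30.0
Virginia,2000,40.0
"""

def load_states(csv_file):
    """Read the CSV into a dict keyed by lower-cased state name."""
    reader = csv.DictReader(csv_file)
    return {row["State"].strip().lower(): row for row in reader}

def lookup(states, name, fields):
    """Return only the requested fields for one state, or None if not found."""
    row = states.get(name.strip().lower())
    if row is None:
        return None
    return {field: row[field] for field in fields}

states = load_states(io.StringIO(SAMPLE))
print(lookup(states, "maryland", ["Population-2015"]))
```

To try it on the real file, open it with `open("Choose_Maryland___Compare_States_-_Demographics.csv", newline="")` and pass that to `load_states` instead of the sample text.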
2. Basketball and Baseball Stats - Doug Stats - NBA/MLB stats (download the data in raw form: right click and save). The stats are currently in the following format (separated by tabs):
player team pos gp min fgm fga 3m 3a ftm fta or tr as st to bk pf tc ej ff pts
where: player = player name (either: last,first -or- last)
team = team name (3 letter abbrev)
pos = player's position (PG, SG, SF, PF, C or ??)
gp = games played
min = total minutes played
fgm = field goals made
fga = field goals attempted
3m = threes made
3a = threes attempted
ftm = free throws made
fta = free throws attempted
or = offensive rebounds
tr = total rebounds
as = assists
st = steals
to = turnovers
bk = blocks
ba = blocked attempts (Note, there is little quality here, so far)
pf = personal fouls
dq = disqualifications
pts = total points
tc = technicals
sta = games started
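
Here is a rough Python sketch of reading the tab-separated stats and computing points per game. It assumes the file has no header line and that the columns appear in the order of the format line (player through pts); if your download starts with a header line, skip it first. The players and numbers are made up, and `make_row`, `read_stats` and `points_per_game` are hypothetical helper names.

```python
import csv
import io

# Column order copied from the format line (player ... pts).
COLUMNS = ("player team pos gp min fgm fga 3m 3a ftm fta "
           "or tr as st to bk pf tc ej ff pts").split()

def make_row(player, team, pos, gp, pts):
    """Build one made-up tab-separated line; other stats are filled with 0."""
    values = {name: "0" for name in COLUMNS}
    values.update(player=player, team=team, pos=pos, gp=str(gp), pts=str(pts))
    return "\t".join(values[name] for name in COLUMNS)

def read_stats(tsv_file):
    """Read tab-separated rows into a list of dicts keyed by column name."""
    reader = csv.DictReader(tsv_file, fieldnames=COLUMNS, delimiter="\t")
    return list(reader)

def points_per_game(row):
    """Total points divided by games played (0.0 if no games played)."""
    games = int(row["gp"])
    return int(row["pts"]) / games if games else 0.0

# Two made-up players, NOT real stats.
sample = "\n".join([
    make_row("doe,jane", "AAA", "PG", 10, 250),
    make_row("roe", "BBB", "C", 0, 0),
])
players = read_stats(io.StringIO(sample))
for p in players:
    print(p["player"], p["team"], round(points_per_game(p), 1))
```

The same pattern works for other per-game numbers (rebounds, assists): divide the total column by `gp`, and guard against players with zero games.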
3. Tweets from Last Spring - What is the format of the data?
The data is a CSV file with emoticons removed. The file has 6 fields:
0 - the polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
1 - the id of the tweet (2087)
2 - the date of the tweet (Sat May 16 23:58:44 UTC 2009)
3 - the query (lyx). If there is no query, then this value is NO_QUERY.
4 - the user that tweeted (robotickilldozr)
5 - the text of the tweet (Lyx is cool)
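
A small Python sketch of reading this format and counting tweets by polarity might look like the following. It assumes the file has no header line and that fields may be quoted. The first sample row reuses the example values from the field list, with its polarity of 4 being a guess; the second row is entirely made up. The names `read_tweets` and `count_polarity` are hypothetical.

```python
import csv
import io
from collections import Counter

# Field order taken from the 6-field description (positions 0 to 5).
FIELDS = ["polarity", "id", "date", "query", "user", "text"]
LABELS = {"0": "negative", "2": "neutral", "4": "positive"}

# Row 1 uses the example values from the field list (polarity is a guess);
# row 2 is made up for illustration.
SAMPLE = '''"4","2087","Sat May 16 23:58:44 UTC 2009","lyx","robotickilldozr","Lyx is cool"
"0","2088","Sat May 16 23:59:00 UTC 2009","NO_QUERY","example_user","made-up negative tweet, with a comma"
'''

def read_tweets(csv_file):
    """Read header-less CSV rows into dicts keyed by field name."""
    return list(csv.DictReader(csv_file, fieldnames=FIELDS))

def count_polarity(tweets):
    """Count tweets per polarity label; unexpected codes count as 'unknown'."""
    return Counter(LABELS.get(t["polarity"], "unknown") for t in tweets)

tweets = read_tweets(io.StringIO(SAMPLE))
counts = count_polarity(tweets)
print(counts["positive"], counts["negative"], counts["neutral"])
```

Using the `csv` module rather than splitting on commas matters here, because the tweet text itself can contain commas.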
4. Try one of these databases:
Kaggle - good collection of interesting data sets, just scroll down the page
cyril.pruszko@pgcps.org, Apr 7, 2016, 8:34 AM