
BigData - Projects

Doing Your Own Project

Past Projects - check some of them out


1. Ideas
    Now you are going to come up with your own ideas...

Some Examples: 

    1. Maryland/States Data
    A few example files are provided, along with their fields. Think about which fields may be of interest:
     - to sort on, or
     - to search and show (e.g. search by state, then show only certain items of data)

Examples:

    File: Choose_Maryland___Compare_Metros_-_Workforce.csv
    Fields: 
MetroCityName, #Employment, #Unemployment, UnemploymentRate%, ProfessionalTechnicalWorkers%, 
"Median Hourly Wage - Freight, Stock and Material Movers ($ Dollars)",
"Median Hourly Wage - Secretaries ($ Dollars)",
"Median Hourly Wage - Computer Systems Analysts ($ Dollars)"

    You could have the user look up a city and see its data, or compare cities by unemployment, salaries, etc.
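
    As a starting point, here is a minimal Python sketch of the "look up by city" and "sort by unemployment" ideas. It assumes the header names in the file match the field names listed above exactly and that the numeric columns hold plain numbers. The rows in SAMPLE are made-up placeholders, not real Maryland data, and the function names are just illustrative choices.

```python
import csv
import io

# Made-up placeholder rows, only to show the shape of the data.
# In a real project, open the downloaded .csv file instead.
SAMPLE = """MetroCityName,#Employment,#Unemployment,UnemploymentRate%
CityA,1000,50,5.0
CityB,2000,60,3.0
"""

def load_rows(text_file):
    """Read a CSV that has a header row into a list of dictionaries."""
    return list(csv.DictReader(text_file))

def find_city(rows, name):
    """Return the first row whose city name matches (ignoring case), or None."""
    wanted = name.strip().lower()
    for row in rows:
        if row["MetroCityName"].strip().lower() == wanted:
            return row
    return None

def sort_by_unemployment(rows):
    """Return the rows ordered from lowest to highest unemployment rate."""
    return sorted(rows, key=lambda r: float(r["UnemploymentRate%"]))

rows = load_rows(io.StringIO(SAMPLE))
city = find_city(rows, "cityb")
if city is not None:
    print(city["MetroCityName"], "employment:", city["#Employment"])
for row in sort_by_unemployment(rows):
    print(row["MetroCityName"], row["UnemploymentRate%"])
```

    To use the real file, replace io.StringIO(SAMPLE) with open("Choose_Maryland___Compare_Metros_-_Workforce.csv", newline="").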

    -----

     File: Choose_Maryland___Compare_States_-_Demographics.csv
    Fields: 
State, Population-2015, Population-2010, Population-2000, PopulationChange2000-2010 (%), 
MedianAge, PopulationDensity, MedianHouseholdIncome($ Dollars), PersonalIncome($ Dollars), 
Poverty Rate (%)

     You could let the user look up different states and compare them.
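
     One possible way to compare two states is sketched below in Python. It assumes the header names match the fields listed above and that the values are plain numbers with no thousands separators (the real file may need extra cleanup). The two sample states and their numbers are invented for illustration, and the helper names are hypothetical.

```python
import csv
import io

# Invented placeholder data; the real values come from the downloaded file.
SAMPLE = """State,Population-2015,Population-2010,MedianAge
StateA,1100,1000,38.0
StateB,1900,2000,36.5
"""

def load_states(text_file):
    """Build a dictionary that maps each state name to its row of data."""
    return {row["State"]: row for row in csv.DictReader(text_file)}

def compare(states, first, second, field):
    """Return the value of one field for two states, as numbers."""
    return {
        first: float(states[first][field]),
        second: float(states[second][field]),
    }

def growth_2010_2015(states, name):
    """Percent change in population from 2010 to 2015 for one state."""
    old = float(states[name]["Population-2010"])
    new = float(states[name]["Population-2015"])
    return (new - old) / old * 100.0

states = load_states(io.StringIO(SAMPLE))
print(compare(states, "StateA", "StateB", "MedianAge"))
for name in states:
    print(name, "growth 2010-2015:", round(growth_2010_2015(states, name), 1), "%")
```

     Storing the rows in a dictionary keyed by state name makes each lookup a single step, which is convenient when the user picks states by name.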


    2. Basketball and Baseball Stats 
        Doug Stats - NBA/MLB stats (download the data in Raw form: right click and save).

The stats are currently in the following format (fields separated by tabs):

        player team pos gp min fgm fga 3m 3a ftm fta or tr as st to bk pf tc ej ff pts

where:  player = player name  (either: last,first -or- last)
	team   = team name (3 letter abbrev)
	pos    = player's position (PG, SG, SF, PF, C or ??)
	gp     = games played
	min    = total minutes played
	fgm    = field goals made
	fga    = field goals attempted
	3m     = threes made
	3a     = threes attempted
	ftm    = free throws made
	fta    = free throws attempted
	or     = offensive rebounds
	tr     = total rebounds
	as     = assists
	st     = steals
	to     = turnovers
	bk     = blocks
	ba     = blocked attempts (note: the quality of this data is poor so far)
	pf     = personal fouls
	dq     = disqualifications
	pts    = total points
	tc     = technicals
	sta    = games started
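
A rough Python sketch of reading the tab-separated stats and computing a couple of derived numbers follows. It assumes the columns appear in the order shown in the format line above and that the file has no header line of its own (if it does, skip the first line). The two players are made up for illustration, and the function names are hypothetical.

```python
import csv
import io

# Column order taken from the format line above.
FIELDS = ("player team pos gp min fgm fga 3m 3a ftm fta "
          "or tr as st to bk pf tc ej ff pts").split()

# Two made-up players, written as lists and joined with tabs.
SAMPLE_ROWS = [
    ["doe,john", "AAA", "PG", "10", "300", "50", "100", "10", "30", "20", "25",
     "5", "40", "60", "10", "20", "2", "15", "0", "0", "0", "130"],
    ["smith", "BBB", "C", "20", "500", "80", "200", "0", "2", "40", "60",
     "50", "150", "20", "10", "30", "25", "50", "1", "0", "0", "200"],
]
SAMPLE = "\n".join("\t".join(row) for row in SAMPLE_ROWS) + "\n"

def parse_stats(text_file):
    """Read tab-separated lines into dictionaries keyed by the field names."""
    return list(csv.DictReader(text_file, fieldnames=FIELDS, delimiter="\t"))

def points_per_game(row):
    """Total points divided by games played (0.0 if no games were played)."""
    games = int(row["gp"])
    return int(row["pts"]) / games if games else 0.0

def field_goal_pct(row):
    """Field goals made divided by field goals attempted (0.0 if none)."""
    attempts = int(row["fga"])
    return int(row["fgm"]) / attempts if attempts else 0.0

players = parse_stats(io.StringIO(SAMPLE))
ranked = sorted(players, key=points_per_game, reverse=True)
for p in ranked:
    print(p["player"], p["team"], round(points_per_game(p), 1), round(field_goal_pct(p), 3))
```

Because the delimiter is a tab, a name written as last,first stays in one field even though it contains a comma.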

    3. Tweets from Last Spring 

What is the format of the data?

The data is a CSV file with emoticons removed. The file has 6 fields:
0 - the polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
1 - the id of the tweet (2087)
2 - the date of the tweet (Sat May 16 23:58:44 UTC 2009)
3 - the query (lyx). If there is no query, then this value is NO_QUERY.
4 - the user that tweeted (robotickilldozr)
5 - the text of the tweet (Lyx is cool)
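
The six fields above could be read and summarized along these lines. This is only a sketch: it assumes the file has no header row and that every field is wrapped in double quotes. The first sample row reuses the example values from the field list (its polarity of 4 is a guess), and the other two rows are made up.

```python
import csv
import io
from collections import Counter

# Meaning of the polarity codes in field 0.
LABELS = {"0": "negative", "2": "neutral", "4": "positive"}

# Sample rows: the first uses the example values above, the rest are invented.
SAMPLE = '''"4","2087","Sat May 16 23:58:44 UTC 2009","lyx","robotickilldozr","Lyx is cool"
"0","1","Sat May 16 23:59:00 UTC 2009","NO_QUERY","user_a","made-up negative tweet"
"2","2","Sun May 17 00:00:00 UTC 2009","NO_QUERY","user_b","made-up neutral tweet, with a comma"
'''

def load_tweets(text_file):
    """Read each line into a dictionary with one key per field."""
    names = ["polarity", "id", "date", "query", "user", "text"]
    return [dict(zip(names, row)) for row in csv.reader(text_file)]

def count_by_polarity(tweets):
    """Count how many tweets are negative, neutral, and positive."""
    return Counter(LABELS.get(t["polarity"], "unknown") for t in tweets)

def search_text(tweets, word):
    """Return the tweets whose text contains the word (ignoring case)."""
    word = word.lower()
    return [t for t in tweets if word in t["text"].lower()]

tweets = load_tweets(io.StringIO(SAMPLE))
print(count_by_polarity(tweets))
for t in search_text(tweets, "lyx"):
    print(t["user"], ":", t["text"])
```

Using the csv module rather than splitting each line on commas matters here, because tweet text often contains commas of its own.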


    4. Try one of these databases

        Kaggle - good collection of interesting data sets, just scroll down the page

    Public Sources
        Awesome Public Datasets
        KD Nuggets
        30 Data Resources
        Public Databases - large public databases
        Open Source Sports

            ...more to come
You can find additional data sets at the Harvard University Data Science website. I was particularly interested in their LinkedIn data set. KDNuggets is also a great resource, and for more, check out this link






cyril.pruszko@pgcps.org,
Apr 7, 2016, 8:34 AM