<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Data Twirling</title>
	<atom:link href="http://www.brocktibert.com/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.brocktibert.com/blog</link>
	<description>My rants on all things data</description>
	<lastBuildDate>Thu, 31 Jan 2013 13:36:09 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>US News National Rankings &#8211; Yield Rates and R</title>
		<link>http://www.brocktibert.com/blog/2013/01/30/us-news-national-rankings-yield-rates-and-r/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=us-news-national-rankings-yield-rates-and-r</link>
		<comments>http://www.brocktibert.com/blog/2013/01/30/us-news-national-rankings-yield-rates-and-r/#comments</comments>
		<pubDate>Thu, 31 Jan 2013 04:46:13 +0000</pubDate>
		<dc:creator>btibert3</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.brocktibert.com/blog/?p=471</guid>
		<description><![CDATA[Today I saw a post about the yield rates at institutions considered in the &#8220;National&#8221; Rankings for US News.  Above all else, I was interested in the data table contained within the post.  I wrote a quick R script to grab the table and plot where a particular school might find themselves given the data reported.  The script generates a [...]]]></description>
				<content:encoded><![CDATA[<p>Today I saw a <a href="http://www.usnews.com/education/best-colleges/articles/2013/01/28/national-universities-where-most-accepted-students-enroll">post about</a> the yield rates at institutions considered in the &#8220;National&#8221; Rankings for US News.  Above all else, I was interested in the data table contained within the post.  I wrote a quick R script to grab the table and plot where a particular school falls given the data reported.  The script generates a plot and highlights where the school sits within the distribution.</p>
<p>If you want to run the R code, all you need to do is make sure the XML package is installed and change the BM variable to a value of interest.  The image below highlights what one institution might see.</p>
<p><a href="http://www.brocktibert.com/blog/wp-content/uploads/2013/01/usnews-yield.png"><img class="aligncenter size-full wp-image-474" title="usnews-yield" src="http://www.brocktibert.com/blog/wp-content/uploads/2013/01/usnews-yield.png" alt="" width="857" height="399" /></a></p>
<p>The code for this analysis can be found below:</p>
<script src="https://gist.github.com/4680195.js"></script>
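<p>If the embedded gist does not load, here is a rough sketch of the highlighting step only. It is not the gist itself: the yield values are made-up placeholders rather than the US News figures, and I am assuming BM holds a single yield rate to highlight.</p>
<pre>## hypothetical yield rates (%), NOT the US News data
yield &lt;- c(12, 18, 22, 25, 29, 33, 38, 41, 47, 55, 63, 78)

## the value of interest to highlight (assumed meaning of BM)
BM &lt;- 35

## share of schools at or below the highlighted yield
pct &lt;- mean(yield &lt;= BM)

## distribution of yield rates with the value of interest marked
hist(yield, breaks=10, col="grey", main="Yield Rates", xlab="Yield (%)")
abline(v=BM, col="red", lwd=2)</pre>
<p>Swap in the real table for the placeholder vector and the plot will show where your school falls.</p>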
]]></content:encoded>
			<wfw:commentRss>http://www.brocktibert.com/blog/2013/01/30/us-news-national-rankings-yield-rates-and-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>EPS Market Map in R</title>
		<link>http://www.brocktibert.com/blog/2012/11/06/eps-market-map-in-r/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=eps-market-map-in-r</link>
		<comments>http://www.brocktibert.com/blog/2012/11/06/eps-market-map-in-r/#comments</comments>
		<pubDate>Tue, 06 Nov 2012 14:42:34 +0000</pubDate>
		<dc:creator>btibert3</dc:creator>
				<category><![CDATA[How-to]]></category>
		<category><![CDATA[Mapping]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Visualize data]]></category>
		<category><![CDATA[Enrollment Management]]></category>
		<category><![CDATA[gis]]></category>

		<guid isPermaLink="false">http://www.brocktibert.com/blog/?p=429</guid>
		<description><![CDATA[There are a few minor tweaks remaining on this map before it is complete, but I wanted to share the EPS Market Map I put together.  It can be downloaded using this link. This file is meant to be used with R and divides the lower 48 states into the CollegeBoard&#8217;s Enrollment Planning Service markets. [...]]]></description>
				<content:encoded><![CDATA[<p>There are a few minor tweaks remaining on this map before it is complete, but I wanted to share the EPS Market Map I put together.  It can be downloaded using this <a title="link" href="https://www.dropbox.com/s/n2696a70zeoruoj/eps-map.Rdata?m" target="_blank">link</a>.</p>
<p>This file is meant to be used with R and divides the lower 48 states into the College Board&#8217;s Enrollment Planning Service markets. To build the territories, I used the crosswalk file provided on the EPS search site (in the appendix) and &#8216;dissolved&#8217; the zip codes into markets. Help on how I performed this task can be found <a title="here" href="http://gis.stackexchange.com/questions/37370/merge-zip-codes-to-create-new-sales-markets" target="_blank">here</a> and <a title="here" href="http://gis.stackexchange.com/questions/37503/rename-a-spatialpolygon-class-object-in-r" target="_blank">here</a>.</p>
<p>As you will see below, there are still a few gaps in the map that I need to fill in.  Ideally, the College Board would provide the necessary GIS files, but currently that is not an option.</p>
<p>My end game is to use this file to geocode Lat/Long data to EPS territories in addition to basic choropleth mapping for enrollment planning.  If you want to contribute to this project, please don&#8217;t hesitate to reach out!</p>
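<p>To give a flavor of that geocoding step, here is a minimal sketch using the sp package. The two square &#8220;markets&#8221; and the three points are hypothetical stand-ins; with the real file you would pass eps.markets in place of toy.markets, and the column name market is my own placeholder.</p>
<pre>library(sp)

## build one unit square starting at x0 (clockwise ring, explicitly not a hole)
sq &lt;- function(x0, id) {
  coords &lt;- cbind(c(x0, x0, x0 + 1, x0 + 1, x0),
                  c(0, 1, 1, 0, 0))
  Polygons(list(Polygon(coords, hole=FALSE)), ID=id)
}

## two hypothetical markets standing in for eps.markets
toy.markets &lt;- SpatialPolygonsDataFrame(
  SpatialPolygons(list(sq(0, "A"), sq(1, "B"))),
  data=data.frame(market=c("Market A", "Market B"),
                  row.names=c("A", "B"),
                  stringsAsFactors=FALSE))

## hypothetical locations (x = longitude, y = latitude)
pts &lt;- SpatialPoints(cbind(c(0.5, 1.5, 3.0), c(0.5, 0.5, 0.5)))

## over() returns the attribute row of the polygon containing each point,
## or NA when a point falls outside every polygon
hits &lt;- over(pts, toy.markets)
hits$market</pre>
<p>The third point sits outside both squares, so it comes back as NA. That is the same symptom you would see for locations that land in the remaining gaps of the map.</p>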
<p>The Rdata file currently includes 3 objects.  This will change as I finalize the map files.</p>
<ol>
<li><strong>eps.missing</strong> which is a data frame of zip codes that still need to be associated with an EPS territory</li>
<li><strong>myzip</strong> which is a SpatialPolygonsDataFrame object.  It is the map of the lower 48 by zip code.  To plot, simply use the command plot(myzip), but note it will take a minute or so depending on your machine</li>
<li><strong>eps.markets</strong> is the working draft of the eps markets map and is the same type as myzip</li>
</ol>
<div><span style="line-height: 19.200000762939453px; font-size: medium;">Here are two quick plots of the map.  The image on the top simply plots each market as red, which helps in finding the gaps.  The image on the bottom uses a random color for each market.</span></div>
<div>
<pre>&gt; plot(eps.markets, col="red")
&gt; plot(eps.markets, col=sample(colors(), 301, replace=TRUE))</pre>
<p><a href="http://www.brocktibert.com/blog/wp-content/uploads/2012/11/red.png"><img class="aligncenter size-full wp-image-433" title="red" src="http://www.brocktibert.com/blog/wp-content/uploads/2012/11/red.png" alt="" width="1000" height="700" /></a><a href="http://www.brocktibert.com/blog/wp-content/uploads/2012/11/random.png"><img class="aligncenter size-full wp-image-432" title="random" src="http://www.brocktibert.com/blog/wp-content/uploads/2012/11/random.png" alt="" width="1000" height="700" /></a><br />
&nbsp;</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.brocktibert.com/blog/2012/11/06/eps-market-map-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using FAFSA Data to study Competitors &#8211; Part 2</title>
		<link>http://www.brocktibert.com/blog/2012/10/25/using-fafsa-data-to-study-competitors-part-2/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=using-fafsa-data-to-study-competitors-part-2</link>
		<comments>http://www.brocktibert.com/blog/2012/10/25/using-fafsa-data-to-study-competitors-part-2/#comments</comments>
		<pubDate>Fri, 26 Oct 2012 02:47:31 +0000</pubDate>
		<dc:creator>btibert3</dc:creator>
				<category><![CDATA[Higher Education]]></category>
		<category><![CDATA[Network Analysis]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Visualize data]]></category>
		<category><![CDATA[network analysis]]></category>

		<guid isPermaLink="false">http://www.brocktibert.com/blog/?p=410</guid>
		<description><![CDATA[I wanted to build upon my previous post and dive a little deeper into the sorts of questions we can answer using the FAFSA data supplied to us by applicants. As a quick overview, students completing the FAFSA for student aid can list up to ten institutions on the form. I consider this the student&#8217;s [...]]]></description>
				<content:encoded><![CDATA[<p>I wanted to build upon my previous <a title="Using FAFSA Data to Define Competitor Density" href="http://www.brocktibert.com/blog/2012/10/18/387/">post</a> and dive a little deeper into the sorts of questions we can answer using the FAFSA data supplied to us by applicants.</p>
<p>As a quick overview, students completing the FAFSA for student aid can list up to ten institutions on the form. I consider this the student&#8217;s consideration set. When aggregating these data, we can start to get a sense of the most frequently listed schools and how these institutions may be related.</p>
<p>With these data, you can manipulate the structure to answer a wide range of questions. One approach is to coerce the data into a <a title="network" href="http://en.wikipedia.org/wiki/Network_theory" target="_blank">network</a>. For this task, I am going to use the statistical programming language R and the library igraph. The resulting network includes all schools listed (excluding the host institution) with weighted edges representing the number of co-occurrences.</p>
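<p>As a rough illustration of that coercion (not my production code), the sketch below turns a few made-up consideration sets into a weighted, undirected igraph object. The school names and the fafsa list are hypothetical, and I use the newer igraph function name graph_from_data_frame.</p>
<pre>library(igraph)

## hypothetical consideration sets, one vector per FAFSA filer
## (each filer must list at least two schools for combn to work)
fafsa &lt;- list(c("School A", "School B", "School C"),
              c("School A", "School B"),
              c("School B", "School D"))

## every unordered pair of schools listed on the same form
pairs &lt;- do.call(rbind, lapply(fafsa, function(s) t(combn(sort(s), 2))))
edges &lt;- as.data.frame(pairs, stringsAsFactors=FALSE)
names(edges) &lt;- c("from", "to")

## count how often each pair co-occurs; the count becomes the edge weight
edges$weight &lt;- 1
edges &lt;- aggregate(weight ~ from + to, data=edges, FUN=sum)

## extra columns of the edge list are stored as edge attributes
g &lt;- graph_from_data_frame(edges, directed=FALSE)
E(g)$weight</pre>
<p>Sorting each list before pairing keeps &#8220;A, B&#8221; and &#8220;B, A&#8221; from being counted as different edges.</p>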
<p>Listed below are some quick stats on my undirected network from the last few years:</p>
<ul>
<li>Graph density: 0.05108093</li>
<li>Diameter: 5</li>
<li>Average Path Length: 2.418751</li>
<li>Transitivity (clustering coefficient): 0.3390529</li>
</ul>
<p>Graph density is the ratio of the number of edges present to the total number of possible edges. For context, an edge is a connection between two schools. If you think of Facebook, you and your friends are connected by an edge. Diameter is a measure of how many steps (edges) are required to connect the two farthest nodes in the network. The <a title="Average Path Length" href="http://en.wikipedia.org/wiki/Average_path_length" target="_blank">Average Path Length</a> is the average number of steps along the shortest paths between all pairs of schools. The <a title="clustering coefficient" href="http://en.wikipedia.org/wiki/Clustering_coefficient" target="_blank">clustering coefficient</a> is a measure of how strongly the nodes tend to cluster together (here, schools listed on the same FAFSA form).</p>
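<p>To make these definitions concrete, here is a small sketch on a hypothetical four-school graph (a triangle plus one school hanging off it). It uses the newer igraph function names edge_density and mean_distance, and computes density by hand for comparison.</p>
<pre>library(igraph)

## hypothetical graph: A, B and C all co-occur; D only co-occurs with C
toy &lt;- graph_from_literal(A-B, B-C, C-A, C-D)

n &lt;- vcount(toy)   # number of schools
m &lt;- ecount(toy)   # number of edges

## density by hand: edges present / possible edges in an undirected graph
by.hand &lt;- m / (n * (n - 1) / 2)

by.hand
edge_density(toy)
diameter(toy, directed=FALSE)
mean_distance(toy, directed=FALSE)
transitivity(toy)</pre>
<p>With 4 schools there are 6 possible edges and 4 are present, so both density calculations agree at 4/6.</p>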
<p>Shown below is a plot of the graph, with each school sized by <a title="pagerank" href="http://en.wikipedia.org/wiki/PageRank" target="_blank">pagerank</a> score (included function in igraph).</p>
<p><a href="http://www.brocktibert.com/blog/wp-content/uploads/2012/10/plot-pagerank.png"><img class="aligncenter size-medium wp-image-413" title="plot-pagerank" src="http://www.brocktibert.com/blog/wp-content/uploads/2012/10/plot-pagerank-300x245.png" alt="" width="300" height="245" /></a></p>
<p>It&#8217;s easy to see that there are a few key players in the FAFSA network; I consider these &#8220;core&#8221; competitors. More interesting to me, however, are the schools at the outer edge, as they are less common and speak to the choice set of an applicant.</p>
<p>In summary, this post was intended to be a quick overview of how one might employ network analysis to study the schools commonly listed on the FAFSA form for your institution. In the future, I will take the same data and use association rules to find common patterns of school listings.</p>
<p>EDIT: Here are the code snippets that I used to generate the data and plot above:</p>
<pre class="brush: python; gutter: true">library(igraph)
## g is the undirected igraph object built from the FAFSA data

## basic stats:
## density (graph.density)
graph.density(g)
## diameter
diameter(g, directed=F)
## average path length (shortest.paths)
average.path.length(g, directed=F)
## transitivity (clustering coefficient)
transitivity(g)
## radius
radius(g)
## degree distribution
plot(1-degree.distribution(g, cumulative=T), type=&quot;l&quot;,
xlab=&quot;degree&quot;, ylab=&quot;Cume Distribution&quot;, main=&quot;FAFSA Network&quot;)
## layout stored on the graph; assumed to match the layout passed to plot() below
g$layout &lt;- layout.auto(g)
## pagerank score for each school, used to size the vertices
pagerank &lt;- page.rank(g)$vector
plot(g,
vertex.size= pagerank*150,
vertex.label=NA,
vertex.color= &quot;red&quot;,
vertex.frame.color=&quot;black&quot;,
edge.arrow.size=0,
edge.color=colors()[239],
edge.width=.5,
edge.curved=TRUE,
layout=layout.auto(g))</pre>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.brocktibert.com/blog/2012/10/25/using-fafsa-data-to-study-competitors-part-2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Using FAFSA Data to Define Competitor Density</title>
		<link>http://www.brocktibert.com/blog/2012/10/18/387/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=387</link>
		<comments>http://www.brocktibert.com/blog/2012/10/18/387/#comments</comments>
		<pubDate>Fri, 19 Oct 2012 02:50:31 +0000</pubDate>
		<dc:creator>btibert3</dc:creator>
				<category><![CDATA[College Admissions]]></category>
		<category><![CDATA[Higher Education]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.brocktibert.com/blog/?p=387</guid>
		<description><![CDATA[I have been thinking a lot about how to define and discuss competition at the undergraduate level.   I will save the chat on which dataset is better (ASQ, Student Clearinghouse, social media, etc.) for another day. One common question I get as an analyst in Enrollment Management is how to &#8220;define&#8221; competition. While it&#8217;s [...]]]></description>
				<content:encoded><![CDATA[<p>I have been thinking a lot about how to define and discuss competition at the undergraduate level.   I will save the chat on which dataset is better (ASQ, Student Clearinghouse, social media, etc.) for another day.</p>
<p>One common question I get as an analyst in Enrollment Management is how to &#8220;define&#8221; competition. While it&#8217;s never an easy question, from a marketing perspective we often have to subset competition into a few levels: core, secondary, aspirant, regional, etc. Even before this, though, I believe it is critically important to understand &#8220;Competitor Density.&#8221;</p>
<p>Using a statistical lens, Competitor Density is rather straightforward. Simply, it is the cumulative share of students covered by the top &#8220;N&#8221; schools.  For illustration, refer to the chart below, which is filtered on domestic + admitted students over the last 3 applicant pools.</p>
<p><a href="http://www.brocktibert.com/blog/wp-content/uploads/2012/10/density-simple-fafsa1.png"><img class="aligncenter size-medium wp-image-389" title="density-simple-fafsa" src="http://www.brocktibert.com/blog/wp-content/uploads/2012/10/density-simple-fafsa1-300x270.png" alt="" width="300" height="270" /></a></p>
<p>The plot above reveals two very interesting facts:</p>
<ol>
<li>A small set of competitors represents a large share of the &#8220;core&#8221; competition.  While the plot above assumes that a student was admitted at every institution they listed on the form, this basic assumption allows us to broadly define the <em>consideration set</em> for an applicant.</li>
<li>After appending other (aggregated) information from our student information systems, we can start to answer some pretty complex questions about how students settle on the list of schools to which they eventually apply.</li>
</ol>
<div><span style="font-size: medium;">In a future post, I intend to highlight how analysts in highered can manipulate FAFSA data using <a href="http://en.wikipedia.org/wiki/Association_rules">association rules</a> and <a href="http://en.wikipedia.org/wiki/Network_theory">network theory</a>.</span></div>
<p>&nbsp;</p>
<p>In the interim, I will leave you with some basic stats on the plot above.  If you stumble across this post and you work in highered, feel free to comment and post comparable stats.  I would love to see how these data vary across different institutions.</p>
<p>Please remember that the &#8220;host&#8221; institution was removed.  Only competitor schools were included in the plot.</p>
<ul>
<li>652 distinct schools were included over 3 applicant terms (fall only)</li>
<li>Top 2 schools = 10.6% of all admitted students</li>
<li>Top 10 schools = 34.3%</li>
<li>Top 25 schools = 52.1%</li>
<li>Top 50 schools = 67%</li>
<li>Top 75 schools = 76%</li>
<li>Top 100 schools = 82%</li>
<li>Top 228 schools = 95%</li>
</ul>
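<p>If you want to post comparable stats, here is a minimal sketch of the calculation. The listings vector is made up, and I am assuming each school listed on a form counts once toward the total, which is one of several reasonable ways to define coverage.</p>
<pre>## hypothetical listings: one entry per school named on a FAFSA form
listings &lt;- c("School A", "School A", "School A", "School A",
              "School B", "School B", "School B",
              "School C", "School C", "School D")

## number of listings per school, most frequently listed first
counts &lt;- sort(table(listings), decreasing=TRUE)

## cumulative share of listings covered by the top N schools
cum.share &lt;- cumsum(as.vector(counts)) / sum(counts)
names(cum.share) &lt;- names(counts)
cum.share

## smallest N whose schools cover at least 75% of listings
top.n &lt;- unname(which(cum.share &gt;= 0.75)[1])
top.n</pre>
<p>Plotting cum.share against its index gives the same kind of curve as the chart above.</p>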
<p>Stepping back, 72 schools account for 75% of the competition.  That&#8217;s a pretty &#8220;easy&#8221; way to define a set of schools considering that there are over 3,000 highered institutions listed in <a href="http://nces.ed.gov/ipeds/">IPEDS</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.brocktibert.com/blog/2012/10/18/387/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>ACT to SAT M+V Concordance Chart in R</title>
		<link>http://www.brocktibert.com/blog/2012/01/23/act-to-sat-mv-concordance-chart-in-r/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=act-to-sat-mv-concordance-chart-in-r</link>
		<comments>http://www.brocktibert.com/blog/2012/01/23/act-to-sat-mv-concordance-chart-in-r/#comments</comments>
		<pubDate>Tue, 24 Jan 2012 01:31:46 +0000</pubDate>
		<dc:creator>btibert3</dc:creator>
				<category><![CDATA[College Admissions]]></category>
		<category><![CDATA[Higher Education]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.brocktibert.com/blog/?p=367</guid>
		<description><![CDATA[For those of you who work in Enrollment Management and routinely analyze higher ed data, I wanted to share an easy way to convert ACT to equivalent SAT M+V scores in R. I am dynamically building a dataset that uses the concordance chart located here. Simply, use this data frame and merge it onto your existing [...]]]></description>
				<content:encoded><![CDATA[<p>For those of you who work in Enrollment Management and routinely analyze higher ed data, I wanted to share an easy way to convert ACT to equivalent SAT M+V scores in R. I am dynamically building a dataset that uses the concordance chart located <a title="ACT to SAT M+V Concordance Chart" href="http://professionals.collegeboard.com/profdownload/act-sat-concordance-tables.pdf">here</a>. Simply merge this data frame onto your existing data (?merge) to calculate the &#8220;best standardized test score&#8221; for a given recruit, applicant, etc.</p>
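<p>As a sketch of the merge step, assuming a concordance data frame with columns act and sat.equiv (names of my choosing): the values below are placeholders for illustration only, not the College Board figures, and the students data frame is hypothetical.</p>
<pre>## placeholder concordance values, NOT the published table
concordance &lt;- data.frame(act=c(20, 25, 30),
                          sat.equiv=c(1000, 1200, 1400))

## hypothetical applicants; some took only one of the two tests
students &lt;- data.frame(id=1:3,
                       act=c(25, NA, 30),
                       sat=c(1100, 1300, NA))

## left join so applicants without an ACT score are kept
scored &lt;- merge(students, concordance, by="act", all.x=TRUE)

## best score is the higher of the SAT and the ACT-equivalent SAT
scored$best &lt;- pmax(scored$sat, scored$sat.equiv, na.rm=TRUE)
scored &lt;- scored[order(scored$id), ]
scored[, c("id", "best")]</pre>
<p>Using all.x=TRUE matters here: without it, anyone missing an ACT score would silently drop out of the data.</p>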
<p>If you aren&#8217;t using R, give it a shot. It&#8217;s worth the effort, and undoubtedly you will begin to find that SAS and SPSS are too much work.</p>
<p>Let&#8217;s save the debate on the validity of standardized tests for another day&#8230;</p>
<script src="https://gist.github.com/4110699.js"></script>
]]></content:encoded>
			<wfw:commentRss>http://www.brocktibert.com/blog/2012/01/23/act-to-sat-mv-concordance-chart-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
