CS代考程序代写 data science python SQL Data 100 & 200A Principles and Techniques of Data Science

Data 100 & 200A Principles and Techniques of Data Science
Spring 2019
INSTRUCTIONS
• You have 70 minutes to complete the exam.
Midterm 1 Solutions
• The exam is closed book, closed notes, closed computer, closed calculator, except one hand-written 8.5″ × 11″ crib sheet of your own creation and the official Data 100 study guide.
Last name
First name
Student ID number
CalCentral email (_@berkeley.edu)
Exam room
Name of the person to your left
Name of the person to your right
All the work on this exam is my own.

2
1. (12 points) Python vs. SQL
Fill in both the Python code and the SQL query to produce each result below, assuming that the following three tables are stored both as Pandas DataFrames and Sqlite tables. Only the first few rows are shown for each table. The cities table contains one row per city and its population in thousands. The names table contains one row per state. The states table contains one row per state with its population in millions. Assume that cities contains only a small subset of US cities. There may be multiple cities in each state, but every city is in a state that appears in both names and states, and every state contains at least one city.
cities names states
(a) (4 pt) Create a table t that is the same as cities but with an additional column ab containing the abbreviation (e.g. CA) of the state in which each city is located. Hint: The str.extract method of a Series called on a regular expression with one group returns a DataFrame with one column labeled 0 containing the first substring matching the group. Assume there is exactly one comma in each city value.
Python: t = cities.copy()
city
pop
3 55 285
abbrev
full
CA NV WA
state
people
CA NV WA
39.5 3.0 7.4
SQL:
x = cities[‘city’].str.extract(r’, (w+)’)[0]
t[‘ab’] = list(names.set_index(‘full’).loc[x, ‘abbrev’])
CREATE TABLE t AS SELECT city, pop, abbrev AS ab FROM cities JOIN names ON
city LIKE ‘%, ‘ || full;
(b) (4 pt) Create a two-column table u of the cities and their populations (labeled city and pop) that are in
states with a population above 5 million. Assume that t from part (a) was constructed correctly. Python: m = t.merge(states, left_on=’ab’, right_on=’state’)
m[m[‘people’] > 5][[‘city’, ‘pop’]]
SQL: CREATE TABLE u AS SELECT city, pop FROM t WHERE
ab IN (SELECT state FROM states WHERE people > 5)
(c) (4 pt) Create a table with one row per state that contains the state’s abbreviation and the fraction of cities (from the cities table) in that state that have a population above 50,000.
Python: t[‘pop’].groupby(t[‘ab’]).agg( lambda s: sum(s>50)/len(s))