Econometrics R Problems

Assignment Instructions

Link to Jupyter notebook: click here.

Exercise 1. In this exercise you will use US patent data from the last century to investigate whether inventors living close to other creative individuals produce more knowledge spillovers than inventors living in greater isolation.

For this problem set you will use the Stata file pset5.dta provided on Datahub and on bCourses.

Note that several of the problems require you to produce custom summary statistics and regression tables. For more information on how to produce these types of tables, see the Coding Bootcamp Part 5 notebook posted on Datahub.

The dependent variable of interest is the occurrence of patent interferences (which occur when two inventors file very similar patents). Let the variable inter_pair = 0 or 1, where 1 means there was a similar patent filed in a certain geographic area in the US. You will use a linear probability model to compare the probability of interference between patent pairs above and below different co-location thresholds, with inventors living within 1, 50 and 100 miles of each other. You will also estimate a logit specification, interpret marginal effects, and perform hypothesis testing.

 

variable name          variable label
_________________________________________________________________
obs_id                 unique identifier
inter_pair             =0 if no pair patent interference, =1 if interference
cites_shared           number of shared citations in patent
num_cl_subcl_shared    number of subclasses shared by the pair of inventors
match1m                =1 if co-located with places of residence within 1 mile
match50m               =1 if co-located with places of residence within 50 miles
match100m              =1 if co-located with places of residence within 100 miles

 

  1. What is the number of observations in the data? How many observations have a pair patent interference, and how many do not?

 

  2. Please rename the variable match1m as Treatment1. Rename the variable that indicates that both individuals live within 50 miles of each other to Treatment50. Summarize the averages and standard deviations of the percent occurrence of patent interferences and of the number of shared citations by Treatment status, for Treatment1 and Treatment50. Do so by creating a table, Table 1, with four columns, where the first two columns are for Treatment1=0 and Treatment1=1, and the next two columns are for Treatment50=0 and Treatment50=1. In rows 1 and 2 of the summary statistics, provide the average and, below it, the standard deviation of inter_pair; in rows 3 and 4, the average and standard deviation of the number of shared citations; and in rows 5 and 6, the mean and standard deviation of the number of classes and subclasses shared.

 

  3. Estimate four linear probability model regressions and present the estimates in a four-column table, Table 2. Use robust standard errors in all regressions. Let the dependent variable in all columns be the indicator of having a pair patent interference. For the regression in column 1, specify a constant (i.e. intercept) and Treatment1 as regressors. In column 2, add the number of shared citations and the number of subclasses shared to the constant and Treatment1. In column 3, present the estimates and standard errors from the regression of the inter_pair indicator on a constant and Treatment50, and in column 4 the estimates and standard errors from the regression on a constant, Treatment50, the number of shared citations, and the number of subclasses shared. In the table, denote with one star * the coefficients that are significant at the 10 percent level, two stars ** those significant at the 5 percent level, and three stars *** those significant at the 1 percent level. (As in lecture 19, I am asking you to run separate regressions and present the estimates in a table with 4 columns. See Coding Bootcamp Part 5 for help producing these.)

 

     a. Which coefficient measures the estimated change in patent interference in areas with inventors living within one mile of each other (in column 1)? Is it statistically significant at the 5 percent level?
     b. What does the estimated constant mean? Is it significantly different from zero?
     c. Looking at the whole table, which coefficient measures the estimated change in patent interferences when the pair of individuals lives within 50 miles of each other and no other controls are considered? What is its value?
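
As a rough illustration of what the column (1) regression computes, the sketch below uses plain Python rather than the R workflow the course expects. With a constant and a single 0/1 regressor, the OLS constant equals the mean outcome in the untreated group and the slope equals the difference in group means. The standard errors use the HC0 heteroskedasticity-robust formula without a finite-sample correction, so software that applies a correction may report slightly different values. The function names and the toy data are hypothetical and are not taken from pset5.dta.

```python
import math

def lpm_binary(y, d):
    """OLS of y on a constant and a 0/1 regressor d, with HC0 robust SEs."""
    y1 = [yi for yi, di in zip(y, d) if di == 1]
    y0 = [yi for yi, di in zip(y, d) if di == 0]
    n1, n0 = len(y1), len(y0)
    m1, m0 = sum(y1) / n1, sum(y0) / n0
    # Sums of squared residuals within each group
    ss1 = sum((yi - m1) ** 2 for yi in y1)
    ss0 = sum((yi - m0) ** 2 for yi in y0)
    return {
        "const": m0,                      # P(y=1 | d=0)
        "slope": m1 - m0,                 # difference in probabilities
        "se_const": math.sqrt(ss0) / n0,  # HC0 standard error of the constant
        "se_slope": math.sqrt(ss1 / n1 ** 2 + ss0 / n0 ** 2),
    }

def stars(estimate, se):
    """Significance stars using large-sample normal critical values."""
    z = abs(estimate / se)
    if z > 2.576:
        return "***"
    if z > 1.960:
        return "**"
    if z > 1.645:
        return "*"
    return ""

# Hypothetical toy data, for illustration only
y_toy = [1, 0, 1, 1, 0, 0, 0, 1]
d_toy = [1, 1, 1, 1, 0, 0, 0, 0]
result = lpm_binary(y_toy, d_toy)
print(result, stars(result["slope"], result["se_slope"]))
```

The same logic carries over to the regressions with controls, but those need a full matrix OLS routine, which the course's R tools provide.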

 

  4. What conditions are needed so that we can interpret the coefficients of the Treatment variables as the causal impacts of living close by on patent interferences? What would be a simple set of tests you could run to support this? Do not run these tests; explain only what data you would use and collect and what tests you would run.

 

  5. In columns (2) and (4) we added covariates to the regressions in columns (1) and (3). Does adding the covariates affect the estimated coefficient of Treatment1? How about Treatment50? How do you now interpret the point estimate of Treatment50 in one sentence (using Size, Sign and Significance)?

 

  6. Looking at the change in the Treatment50 estimate from column (3) to column (4), can you explain what that implies about the correlation between Treatment50 and the pair working in the same fields (measured jointly by the effects of shared citations and number of subclasses shared)?

 

  7. What are potential problems with estimating the linear probability model?

 

  8. Run the linear probability model in column (4) of Table 2 without robust standard errors. What happens to the estimates and to their significance? Which problem in Q7 does this highlight?

 

  9. In a sentence or two, describe how the Logit model addresses the problems with the linear probability model you mentioned in Q7. Estimate the same right-hand side specification as in column 4 of Table 2 above, but now using a Logit model. After you estimate the model, run the marginal effects command in R as discussed in lecture.

What do you conclude in terms of the effect of Treatment50 on the patent interference occurrence?
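
To make the idea of a marginal effect for a 0/1 regressor concrete, here is a minimal sketch in plain Python rather than the R command from lecture. For a dummy such as Treatment50, one common summary is the average change in the predicted probability when the dummy is switched from 0 to 1 while the other regressors stay at their observed values. The coefficient values and data rows below are hypothetical placeholders, not estimates from pset5.dta, and the helper names are made up for this illustration.

```python
import math

def logistic(z):
    """Logistic CDF, the link used by the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

def discrete_effect(beta, rows, name):
    """Average change in P(y=1) when dummy `name` moves from 0 to 1,
    holding the other regressors at their observed values."""
    total = 0.0
    for row in rows:
        # Linear index without the dummy's own contribution
        base = beta["const"] + sum(beta[k] * v for k, v in row.items() if k != name)
        total += logistic(base + beta[name]) - logistic(base)
    return total / len(rows)

# Hypothetical coefficients and observations, for illustration only
beta_demo = {"const": -3.0, "Treatment50": 0.5,
             "cites_shared": 0.2, "num_cl_subcl_shared": 0.4}
rows_demo = [
    {"Treatment50": 0, "cites_shared": 1, "num_cl_subcl_shared": 2},
    {"Treatment50": 1, "cites_shared": 0, "num_cl_subcl_shared": 1},
    {"Treatment50": 1, "cites_shared": 3, "num_cl_subcl_shared": 4},
]
effect_demo = discrete_effect(beta_demo, rows_demo, "Treatment50")
print(effect_demo)
```

Unlike the linear probability model, the effect differs across observations because it depends on where each observation sits on the logistic curve, which is why an average (or an effect at the means) is reported.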

 

  10. Estimate a Logit model that, together with the estimation output from Q9, allows you to test whether the additional covariates added in column 4 of Table 2 relative to column 3 matter for patent interference. What do you conclude? Do the five steps of hypothesis testing by hand; do not use R’s built-in test command.
  11. Create the necessary variables and then estimate a Logit model that allows you to test whether the impact of the Treatment differs depending on the number of shared citations. What do you conclude? Do the five steps of hypothesis testing by hand; do not use R’s built-in test command.
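
One way to carry out the joint test of two added covariates by hand is a likelihood ratio test that compares the restricted Logit (without the two covariates) to the unrestricted Logit (with them). The sketch below, in plain Python rather than R, assumes that approach. With two restrictions the statistic is compared to a chi-square distribution with 2 degrees of freedom, whose upper-tail probability has the closed form exp(-x/2). The log-likelihood values are hypothetical placeholders and the function name is made up; substitute the values reported by your own two Logit fits.

```python
import math

def lr_test_df2(loglik_restricted, loglik_unrestricted, alpha=0.05):
    """Likelihood ratio test of two restrictions (chi-square, 2 df)."""
    # Test statistic: twice the gain in log-likelihood
    lr = 2.0 * (loglik_unrestricted - loglik_restricted)
    # For 2 df the chi-square survival function is exp(-x/2),
    # so the critical value solves exp(-c/2) = alpha
    critical = -2.0 * math.log(alpha)
    p_value = math.exp(-lr / 2.0)
    return lr, critical, p_value, lr > critical

# Hypothetical log-likelihoods, for illustration only
lr_stat, crit, p_val, reject = lr_test_df2(-520.0, -510.0)
print(f"LR = {lr_stat:.2f}, critical value = {crit:.3f}, "
      f"p = {p_val:.5f}, reject H0: {reject}")
```

The closed form applies only to 2 degrees of freedom. A test of a single restriction, such as one interaction term, needs the chi-square distribution with 1 degree of freedom or, equivalently, a z-test on that coefficient.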

Exercise 2. Does regulating gasoline emissions, by requiring more reformulation in order to improve air quality, cause increases in gasoline prices in the USA? Cities j were regulated with stricter gasoline content standards if their measured pollution in year T, back in the 1990s, was above X parts per million (ppm).

Cities with pollution measurements in year T that exceeded the pollution threshold X were subject to stricter environmental laws requiring gasoline to be reformulated to produce fewer emissions. Cities below the threshold were not bound by this regulation. You have data over time t for a random sample of cities j in the USA.

  1. How would you estimate the causal effect of regulation on gasoline prices? Write down the exact regression and define each variable. Also say which coefficient is interpreted as the causal effect of the regulation.
  2. What assumption is key for you to interpret the coefficient as a causal effect of regulation aimed at cleaning air quality on the prices consumers pay for gasoline?

 

 


Exercise 3

$$\widehat{\text{Outcome}}_{jt} = 50 - 30\,\text{After}_{jt} + 25\,\text{Treated}_{jt} - 20\,(\text{After}_{jt} \times \text{Treated}_{jt})$$

Using the above estimated equation, please fill out the table below:

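Average Outcome                           Control, Treated=0    Treated=1    Difference T-C in each row
_______________________________________________________________________________________________________
Before, After=0                           ______                ______       ______
After=1                                   ______                ______       ______
Difference After-Before in each column    ______                ______       D in D = ______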
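
The table cells follow mechanically from the estimated equation, as the short Python sketch below shows. It reads the coefficients as 50, -30, 25 and -20 and treats the stray "b2" in the equation as extraction residue, which is an assumption. The function and variable names are chosen here for illustration.

```python
def predicted(after, treated):
    """Predicted outcome from the estimated equation (coefficients as read above)."""
    return 50 - 30 * after + 25 * treated - 20 * after * treated

# Four cell means: keys are (after, treated)
cells = {(a, t): predicted(a, t) for a in (0, 1) for t in (0, 1)}

# Row differences: Treated minus Control, before and after
diff_before = cells[(0, 1)] - cells[(0, 0)]
diff_after = cells[(1, 1)] - cells[(1, 0)]

# Column differences: After minus Before, for control and treated
change_control = cells[(1, 0)] - cells[(0, 0)]
change_treated = cells[(1, 1)] - cells[(0, 1)]

# Difference in differences; equals the interaction coefficient
did = change_treated - change_control

print(cells, diff_before, diff_after, change_control, change_treated, did)
```

Taking the difference of the row differences gives the same difference in differences, which is a useful check when filling in the table by hand.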

 

 
