Large-scale data exploration with R

“You are provided a selection of norms for English and German across a range of variables. The
norms rely on human judgements and/or semi-automatic extensions regarding degrees of concreteness, valence, arousal, imageability and further perception modalities. In addition, you are provided corpus-based frequency lists as well as distributional co-occurrence scores.
The goal of your project is to first analyse a subset of the norm data and then to explore whether
judgements are related across modalities and to corpus-based frequency and semantic diversity.

Task: Write a report about your findings. The report should be 5 to 8 pages long (excluding the bibliography).”

(Unfortunately, the corpus frequency and distributional information files are too large to be uploaded here. I hope there’s another way to provide the files to whoever takes on this assignment. It is a beginner’s R course, so only basic plots and statistics should be in the report; you can still freely choose the variables, though. Basically, have some fun looking into the data!)

Digital Analytics – Individual Research Report

What you need to do:
Part I Data collection
Write a Python script that harvests the tweets of the three Twitter accounts the study focuses on. Get the contents of their tweets and when they were tweeted. Write the information into a data file (an xlsx file, not a CSV file; see the documentation below). You will need to upload both the Python script and the data file that you generated as part of the assignment.

Part II Data analysis
Analyse the file that is provided, named twitterdata.xlsx. The analysis will be a descriptive analysis of the tweets, in which you will compare how the three accounts in focus have tweeted (and how that possibly changed over time). This will require drawing a random sample of 50 COVID tweets per account (see the additional documentation on how to do that).
Be creative in how you handle the analysis. You will have to upload the Python script in which you perform the analysis (data cleaning if necessary and analysis/visualisation).
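The sampling step in Part II could be sketched roughly as follows. This is only a minimal illustration using the Python standard library, not the procedure from the additional documentation. The dictionary keys "account" and "text", the keyword list used to decide what counts as a COVID tweet, and the function name are all assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_covid_tweets(tweets, per_account=50, keywords=("covid", "corona"), seed=42):
    """Return up to `per_account` randomly chosen COVID-related tweets per account.

    `tweets` is a list of dicts with the (hypothetical) keys "account" and "text".
    A tweet counts as COVID-related if its text contains any of `keywords`
    (case-insensitive); this definition is an assumption, not part of the brief.
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced

    # Group the COVID-related tweets by account.
    by_account = defaultdict(list)
    for tweet in tweets:
        text = tweet["text"].lower()
        if any(keyword in text for keyword in keywords):
            by_account[tweet["account"]].append(tweet)

    # Draw without replacement; take everything if fewer than `per_account` match.
    sample = {}
    for account, matches in by_account.items():
        k = min(per_account, len(matches))
        sample[account] = rng.sample(matches, k)
    return sample
```

Fixing the seed means the same 50 tweets per account are drawn on every run, which makes the analysis easier to check against the report.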

Part III Report
Write a research report (1,500 to 2,000 words, all included), with the following sections:
1. Introduction section in which you contextualize the research and outline the research questions.
2. Methodology section in which you concisely explain the procedure: (1) how you got the data (although you will work with the data that I provide, the procedure should be just the same as the one that you used; only the timeframe of data collection is wider, starting December 1st and lasting until April 18th), and (2) what the sample data look like (i.e., how many tweets, harvested in what period; this will be a description of the file that is available on Blackboard, not the data file you harvested yourself).
3. Results section in which you discuss the analysis: i.e., what did you do with what data, and what does that tell us.
4. Discussion section in which you explain how the results answer the initial research questions (i.e., what do the results mean). This is concluded by a reflection on the strengths and weaknesses of the research methodology (draw inspiration from the introduction lecture, as well as from the module on APIs).
Make sure that the report mentions your name and student number. There are no strict guidelines on how to format the document, except for the word count. However, make it look clean and professional in every possible way. A professionally type-set research article by a publisher such as Sage, Wiley-Blackwell, Elsevier might inspire you.

In total, there are five files you need to upload, combined in a single compressed .zip file:

1. A python file that harvests tweets and writes them into a data file (.py file)
2. The data file with the harvested tweets (.xlsx file)
3. A python file with the data processing/analysis/visualisation (.py file)
4. The final version of the data file that you processed
5. A text document with the 1,500 to 2,000-word research report (.pdf)
Your project makes up 50% of your final grade.

What are you graded on?
1. Were you able to outline the relevance of the research question? (introduction section report)
2. Is the code that you wrote to harvest tweets valid and effective? (harvest file)
3. Were you able to clearly describe the procedure on how tweets were harvested? (methodology section report)
4. Were you able to transparently explain what you did with the data, what you analysed/visualised? (results section report)
5. Were you able to clean and format the given research data? (analysis file)
6. Is the analysis/visualisation that you performed sound/valid? (analysis file)
7. Does your discussion of the results make sense in answering the research questions? Are you able to pinpoint the strengths and weaknesses of the method (including whether analysing tweets is the right way to go…)? (discussion section report)
8. Is your writing tidy and clear? (entire report)
9. Is your document professionally formatted? (entire report)

ML Python Project Report + Case Study

-Read the pdf (ML Report and Case Study Guidance and Grading Scheme).
-This is a 5,000-word report that consists of two parts (sample report attached).

Part one: ML Project on Python (3000 words)
– Dataset is Telecom_customer_churn.csv
– telecomChurn.ipynb is the actual code used so far (you can work on it further; please feel free to add to or edit the code, etc.)
– File “ML Project Presentation and feed back” contains the presentation that was given to the professors; the last slide contains the professors’ feedback.
– Sample code is also provided in the file Sample

Part two: Case Study on Spotify (2000 words):
under the pdf: MarrBernardWard_2019_26_Spotify_ArtificialIntelligence

Python Project

Just answer the given questions in the pdf.
When using Jupyter, be sure to save it as an .html file when you submit it to me.
The skeleton file attached is a Jupyter file. It must be used as a guideline for how the project should be formatted. Fill in the answers in the skeleton files in Jupyter.

Python Project (Freshman)

Just answer the given questions in the pdf.
When using Jupyter, be sure to save it as an html file when you submit it to me.
The skeleton file attached is a Jupyter file. It must be used as a guideline for how the project should be formatted. Fill in the answers in the skeleton files in Jupyter.
I cannot afford more than what I’ve offered. Please don’t bid any higher.

Case Analysis

Develop a predictive model and an analysis write-up to predict how many times a review was deemed helpful by other users (helpful votes).

Provide two insights and one actionable recommendation, and state the highest model R-squared value.

Tell the story of your analysis through:
exploratory data analysis
feature treatment and engineering
utilizing appropriate modeling techniques

Model will be assessed on:
R-Square value on unseen data (randomly seeded)
Processing speed (see coding requirements below)
Appropriateness for the problem at hand
Being submitted as a .py script
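The first assessment criterion, R-squared on unseen data with a random seed, can be illustrated with a small standard-library sketch. The function names, the 25% test fraction, and the seed value below are assumptions chosen for the example; the brief does not specify how the held-out data are drawn.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) if all values in y_true are identical.
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of `rows` with a seeded generator and split it in two."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]
```

A model would be fitted on the first part of the split only, and `r_squared` would then be computed from its predictions on the second part. A model that always predicts the mean of the test targets scores 0, and a perfect model scores 1.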


Designing Business Intelligence Reports:
In this assignment you will learn how to create new reports that will be used by different people in the organization. Business intelligence reports are very important communication tools in managerial decision-making and are targeted at a variety of audiences, including accountants, finance professionals, marketers, salespeople, and product managers, among others. The relevance, utility and timeliness of the presented information are critical for effective and efficient decision-making. This exercise will provide you with hands-on experience in understanding and building information-rich business reports.
Business Case:
You are the analyst at the business intelligence department of a retail, marketing and auditing consulting company, and your new client is Global Toys Corporation, one of the world’s largest toy manufacturers, with operations across the globe. A few weeks ago, the company appointed a new Marketing Director, and in a recent presentation he announced a new strategy for some of the best-selling toy products. You are asked to lead the development of a case study (a visual story line) that will help the executive team understand the presented information better and faster. In the new director’s keynote, he wants to go over some facts about current business performance and then use that data to make the case for a new strategy. The director is not sure what type of data/numbers he will ultimately be using in his presentation, and he has therefore asked you to make the business report as flexible as possible in order to allow for further exploration, e.g., filtering, slicing and dicing.
Important Points to note:
As a minimum, explore the data to produce a report for senior management along the following lines: business locations or sites, products, sales, and customers.
Create a BI report with at least four sections but not more than six sections.
This BI report should include key facts about the company’s performance on a global and regional level. These facts should include financial and marketing-related data and/or efficient use of resources.
Decide on the appropriate visualization tool/type to use based on the data you choose and the information you intend to portray. How will this chart be perceived by a non-technical user? What questions may he/she ask, and what answer(s) could he/she get with it?
You may consider a report along the following lines:
Analyse the Sites
Analyse the Products
Analyse the Sales Evolution
Analyse Customer Satisfaction
Once done, submit your report as a single PDF file to me via QMplus by uploading your work as a single file via the assignment link.
Instructions for report creation:
Navigate through the folders to access the file labelled, INSIGHT_TOY_COMPANY_2017.
High Level Marking Scheme

Dataset Project

This is a project that deals with a dataset that needs to be analyzed. It will essentially be done from scratch. I have provided all the documentation that I have been given, along with a sample paper that can be used for your own reference and a document with a suggested dataset. If you would like to use a different dataset, let me know.

Also let me know in your initial bid your thoughts on the provided dataset before we proceed with assigning you to it. 

Thank you.

R Studio

1. Consider the training examples shown in Table 3.5, page 185 of the second edition of the textbook. Compute the Gini index for the overall collection of training examples. Compute the Gini index for the Customer ID attribute. Compute the Gini index for the Gender attribute. Compute the Gini index for the Car Type attribute. Compute the Gini index for the Shirt Size attribute. Which attribute is better: Gender, Car Type, or Shirt Size? Explain why Customer ID should not be used as the attribute test condition even though it has the lowest Gini index.
2. Repeat exercise (1) using entropy instead of the Gini index.
3. Use the outline of code we discussed in class to create a decision tree for the IrisDataSet which predicts the Type column using the other attributes. Create three versions of this tree: one using entropy, one using the Gini coefficient, and one using the classification error as splitting criteria. Use the first half of the data set as the training data and the second half as the test data. Provide the error rate for each tree.
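The three impurity measures named in these exercises (Gini index, entropy, and classification error) can be sketched as below, together with the weighted impurity of a multiway split on one attribute. This is a minimal illustration, not the outline of code discussed in class. The toy records are hypothetical and are not Table 3.5 from the textbook; the function and variable names are likewise assumptions.

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Fraction of records belonging to each class."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    """Gini index: 1 - sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Entropy in bits: -sum of p * log2(p) over the classes present."""
    return -sum(p * log2(p) for p in class_proportions(labels))


def classification_error(labels):
    """Classification error: 1 - proportion of the majority class."""
    return 1.0 - max(class_proportions(labels))


def weighted_impurity(values, labels, impurity=gini):
    """Impurity after a multiway split: size-weighted average over the child nodes."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    n = len(labels)
    return sum(len(group) / n * impurity(group) for group in groups.values())


# Hypothetical toy data, NOT the textbook table.
labels = ["C0", "C0", "C1", "C1"]
gender = ["M", "F", "M", "F"]
record_id = [1, 2, 3, 4]  # unique per record, like a customer ID
```

On this toy data, `weighted_impurity(record_id, labels)` is 0 because every child node holds a single record, while `weighted_impurity(gender, labels)` stays at 0.5. A unique identifier always produces pure one-record partitions, so its low impurity says nothing about how the split would behave on new records.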