CSCI 4144/6405 – Data Mining and Data Warehousing
Assignment 3: Iceberg Cube Computation
1. Assignment Overview
In this assignment, you need to write a program that implements BUC, an efficient algorithm for iceberg cube computation. The major objective of this assignment is to familiarize yourself with efficient cube computation.
2. Important Note
There is a zero-tolerance policy on academic offenses such as plagiarism or inappropriate collaboration. By submitting your solution for this assignment, you acknowledge that the code submitted is your own work. You also agree that your code may be submitted to a plagiarism detection software (such as MOSS) that may have servers located outside Canada unless you have notified me otherwise, in writing, before the submission deadline. Any suspected act of plagiarism will be reported to the Faculty’s Academic Integrity Officer in accordance with Dalhousie University’s regulations regarding Academic Integrity. Please note that:
1) The assignments are individual assignments. You can discuss the problems with your friends/classmates, but you need to write your program by yourself. There should not be much similarity in terms of coding.
2) When you refer to some online resources to complete your program, you need to understand the mechanism, then write your own code. In addition, you should cite the sources via comments in your program.
3. Detailed Requirements
1) BUC Overview: BUC is an algorithm for the computation of iceberg cubes. The details of this algorithm can be found in the lecture notes and Section 5.2.2 of the textbook.
2) Sample Data Set: A sample data set file titled “Product_Sales_Data_Set.csv” is used as the data set for this assignment. Specifically:
a) CSV stands for Comma-Separated Values. A CSV file is a text file that uses a comma to separate values. Often, the first record in a CSV file is a header line including a list of field names. Therefore, it is very easy to dig into a CSV file and look for useful information. You can use any text editor to open a CSV file and view its content. More details about CSV can be found here: https://en.wikipedia.org/wiki/Comma-separated_values
b) The sample data set is available via Brightspace. There are 128 records (not including the header line) in the data set. The data set includes 5 fields: Item, Location, Year, Supplier, Sales_Units.
c) In this data warehouse, there are 4 dimensions (i.e. Item, Location, Year, and Supplier) and 1 measure (i.e. Sales_Units). The valid values for each of the dimensions are listed below:
a. Item: Computer, Camera, Phone, Printer
b. Location: Toronto, Vancouver, , Chicago
c. Year: 2017, 2018
d. Supplier: Samsung, Sony, HP, Dell
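As a minimal sketch of how these fields might be read in Python using only the permitted csv module: the three records below are invented for illustration and are not taken from the actual data set, and `load_rows` and `distinct_in_order` are hypothetical helper names. The second helper lists the values of a dimension in the order in which they first appear in the input.

```python
import csv

DIMENSIONS = ["Item", "Location", "Year", "Supplier"]
MEASURE = "Sales_Units"

def load_rows(lines):
    """Parse CSV text lines (header first) into (dims_tuple, measure) pairs."""
    reader = csv.DictReader(lines)
    rows = []
    for record in reader:
        dims = tuple(record[d].strip() for d in DIMENSIONS)
        rows.append((dims, int(record[MEASURE])))
    return rows

def distinct_in_order(rows, dim_index):
    """Distinct values of one dimension, in order of first appearance."""
    seen = []
    for dims, _ in rows:
        if dims[dim_index] not in seen:
            seen.append(dims[dim_index])
    return seen

# Illustrative records only (invented for this sketch).
sample = [
    "Item,Location,Year,Supplier,Sales_Units",
    "Computer,Toronto,2017,Samsung,5",
    "Computer,Chicago,2018,Dell,3",
    "Camera,Toronto,2017,Sony,7",
]
rows = load_rows(sample)
items = distinct_in_order(rows, 0)
```

With the real file, the same function could be fed an open file object, for example `with open("Product_Sales_Data_Set.csv", newline="") as f: rows = load_rows(f)`.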
3) Required Program: You need to write a program that generates the iceberg cube using BUC. Here are the detailed requirements:
a) You should place “Product_Sales_Data_Set.csv” in the directory where your program file is located.
b) The name of your program should be “BUC”. After BUC is executed via the command-based interface, BUC should prompt the user to enter the minimum support (denoted as “min_sup” in this assignment) for the iceberg cube to be generated. Note that the iceberg condition in this assignment is “Sales_Units >= min_sup” (the condition is not “the number of tuples >= min_sup”).
c) With the provided min_sup, your program will read “Product_Sales_Data_Set.csv” and generate the corresponding iceberg cube using BUC. For testing purposes, only the results corresponding to the cuboids listed below should be saved in a file named “Iceberg-Cube-Results.txt”. In detail, the results corresponding to the cuboid (Item), which should be presented using a table structure, are placed at the beginning of “Iceberg-Cube-Results.txt”; these results are followed by the results corresponding to the cuboid (Item, Location) and thereafter those corresponding to (Item, Location, Year). In total, there should be 3 tables in “Iceberg-Cube-Results.txt”, one for each of the cuboids listed below. Note that P137 of the textbook includes two example tables: Table 4.2 and Table 4.3. The format of your tables should be similar to that of the example tables. Appendix 2 at the end of this document includes a sample table for (Item, Location, Year), which is used to illustrate the required format.
a. (Item)
b. (Item, Location)
c. (Item, Location, Year)
d) Note that:
a. (Item) is a 1D cuboid. The table for this cuboid should include two columns. The first column includes a list of Item values. The second column includes the corresponding Sales_Units values.
b. (Item, Location) is a 2D cuboid. Each row in the table for this cuboid should include the Sales_Units values of a specific item (e.g. Computer) for different locations (i.e. Toronto, Vancouver, , Chicago).
c. (Item, Location, Year) is a 3D cuboid. The table for this cuboid should include two 2D components: one for the year 2017 and the other for the year 2018. Each row in a 2D component should include the Sales_Units values of a specific item for different locations. These two 2D components can be displayed in a horizontal or vertical manner. Note that Table 4.3 on P137 of the textbook corresponds to the horizontal manner.
d. The order of the data in the tables should be consistent with the order of the data in “Product_Sales_Data_Set.csv”. For example, in “Product_Sales_Data_Set.csv”, the tuples involving Computer are in front of the tuples involving Camera. Consequently, in the table for the cuboid (Item), the tuple involving Computer should be in front of the tuple involving Camera.
e. In the table for a cuboid, there might be some empty entries. In this case, please keep the table structure and leave the corresponding table entries empty.
f. “Iceberg-Cube-Results.txt” should be placed in the directory where your program file is located.
g. You can probably generate the correct results without recursion. However, BUC is a recursive algorithm, so an appropriate implementation of BUC should include recursion.
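The recursive structure of BUC can be sketched as follows. This is a simplified illustration under stated assumptions, not a reference solution: it assumes the aggregate is the sum of Sales_Units, that all measure values are non-negative (so a cell below min_sup cannot have a qualifying descendant and can be pruned), and that Python 3.7+ is used so that dictionaries keep insertion order. The function name `buc`, the `"*"` marker for an aggregated dimension, and the sample rows are all hypothetical.

```python
def buc(rows, start_dim, cell, min_sup, num_dims, out):
    """Bottom-up computation of iceberg cells over (dims_tuple, measure) rows."""
    total = sum(measure for _, measure in rows)
    # Iceberg condition: aggregated Sales_Units, not the number of tuples.
    if total < min_sup:
        return  # prune this cell and every more specific cell below it
    out[tuple(cell)] = total
    for d in range(start_dim, num_dims):
        # Partition the current rows on dimension d (first-appearance order).
        partitions = {}
        for dims, measure in rows:
            partitions.setdefault(dims[d], []).append((dims, measure))
        for value, part in partitions.items():
            cell[d] = value
            buc(part, d + 1, cell, min_sup, num_dims, out)
        cell[d] = "*"  # restore the aggregated marker before the next dimension

# Illustrative two-dimensional data (Item, Location), invented for this sketch.
rows = [
    (("Computer", "Toronto"), 5),
    (("Computer", "Chicago"), 3),
    (("Camera", "Toronto"), 2),
]
cube = {}
buc(rows, 0, ["*", "*"], 4, 2, cube)
```

With min_sup = 4, the cell (Computer, Chicago) has a total of 3 and is dropped, even though the parent cell (Computer, *) with a total of 8 is kept. In a full program, min_sup would come from the user prompt, for example `int(input("Enter min_sup: "))`.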
4) Testing Platform and Required Language: The details of the testing platform and the required programming language are presented as follows.
a) Testing Data Set: The sample data set is used to test your program. In addition, the Sales_Units values in the sample data set could be changed to test the robustness of your program. Note that, in this assignment, each Sales_Units value is always a 1-digit integer.
b) Testing Server: “timberlea.cs.dal.ca” is the computer used by the TA to evaluate your program. Therefore, you need to make sure that your program works on timberlea.
a. You can use your CS ID to log on to “timberlea.cs.dal.ca” in order to write your program. Alternatively, you can write your program on another machine, transfer it to timberlea, and then test it on timberlea.
b. If you do not know your CS ID, you can visit the following webpage to get your CS ID. If your CS ID does not work or you have a question about your CS ID, please send an email to
c) Required Programming Language: You need to use Java or Python as the programming language because timberlea supports these languages. Note that both Python 2 and Python 3 are available on timberlea.cs.dal.ca. You can use “python2 --version” and “python3 --version” to check the specific versions on timberlea. In addition, you can only use the following packages or libraries in your program:
a. Java: java.io.*, java.util.*, java.lang.Math
b. Python: csv, math, itertools
d) Compiling and running your program on timberlea.cs.dal.ca should not lead to errors or warnings. To compile and run your program on timberlea, you need to be able to access the command-line interface of timberlea. In addition, you need to be able to upload a file to or download a file from timberlea.
a. To access the command-line interface of timberlea, you can use the software tool “putty” on MS Windows computers. “putty” can be downloaded here: https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html . On Mac and Linux computers, you can use the command “ssh” to access timberlea via the program called “Terminal”.
b. To transfer files between your computer and timberlea, several different methods could be used. Here are two methods for MS Windows and macOS/Linux computers.
i. MS Windows Computer: WinSCP is a popular tool used to transfer files between two computers. You can download WinSCP from the following webpage: https://winscp.net/eng/download.php . The documentation for WinSCP can be found here: https://winscp.net/eng/docs/start. Specifically, you can focus on the “Uploading Files” and “Downloading Files” sections of this document to understand how to transfer files.
ii. Mac and Linux Computer: On Mac and Linux computers, you can use the command “scp” to transfer files. Here is a tutorial on the command “scp”: https://www.linuxtechi.com/scp-command-examples-in-linux/.
5) Readme File: You need to complete a readme file named “Readme.txt”, which includes the instructions that the TA could use to compile and execute your program on timberlea.
6) Submission: Please pay attention to the following submission requirements:
a) You should place “Readme.txt” in the directory where your program file is located.
b) You should place “Product_Sales_Data_Set.csv” in the directory where your program file is located.
c) Your program file, “Readme.txt”, and “Product_Sales_Data_Set.csv” should be compressed into a zip file named “ASN3-YourFirstName-YourLastName.zip”. For example, my zip file should be called “ASN3-Qiang-Ye.zip”. Finally, you need to submit your zip file for this assignment via Brightspace.
a. Note that there is an appendix at the end of this document, which includes the commands that you can use to compress your files on timberlea.
4. Grading Criteria
The marker will use your submitted zip file to evaluate your assignment. The full grade is 16 points. The details of the grading criteria are presented as follows.
1) Does “Readme.txt” include enough information so that the TA can easily compile and execute the program on timberlea? (1 Point)
2) User Input: After BUC is executed via the command-based interface, BUC should prompt the user to enter the minimum support. (1 Point)
3) Iceberg Cube Results: With the provided min_sup, your program should generate the corresponding iceberg cube using BUC. For testing purposes, the results corresponding to the following cuboids should be saved in a file named “Iceberg-Cube-Results.txt”. The TA will use 3 test cases to evaluate your program; that is, the TA will execute your program 3 times. These executions are independent. During each execution, the TA provides one unique min_sup, and your program should generate one “Iceberg-Cube-Results.txt”. (9 Points: Each test case is worth 3 points)
a) (Item)
b) (Item, Location)
c) (Item, Location, Year)
4) BUC is a recursive algorithm, so an appropriate implementation of BUC should include recursion. (2 Points)
5) Overall Quality of the Program (i.e. whether the structure of the program is clear and reasonable, whether the program is properly commented, whether the indentation is appropriate, etc). (3 Points)
5. Academic Integrity
At Dalhousie University, we respect the values of academic integrity: honesty, trust, fairness, responsibility and respect. As a student, adherence to the values of academic integrity and related policies is a requirement of being part of the academic community at Dalhousie University.
1) What does academic integrity mean?
Academic integrity means being honest in the fulfillment of your academic responsibilities, thus establishing mutual trust. Fairness is essential to the interactions of the academic community and is achieved through respect for the opinions and ideas of others. Violations of intellectual honesty are offensive to the entire academic community, not just to the individual faculty member and students in whose class an offence occurs (see the Intellectual Honesty section of the University Calendar).
2) How can you achieve academic integrity?
– Make sure you understand Dalhousie’s policies on academic integrity.
– Give appropriate credit to the sources used in your assignment such as written or oral work, computer codes/programs, artistic or architectural works, scientific projects, performances, web page designs, graphical representations, diagrams, videos, and images. Use RefWorks to keep track of your research and edit and format bibliographies in the citation style required by the instructor. (See http://www.library.dal.ca/How/RefWorks)
– Do not download the work of another from the Internet and submit it as your own.
– Do not submit work that has been completed through collaboration or previously submitted for another assignment without permission from your instructor.
– Do not write an examination or test for someone else.
– Do not falsify data or lab results.
These examples should be considered only as a guide and not an exhaustive list.
3) What will happen if an allegation of an academic offence is made against you?
I am required to report a suspected offence. The full process is outlined in the Discipline flow chart, which can be found at: http://academicintegrity.dal.ca/Files/AcademicDisciplineProcess.pdf and includes the following:
a. Each Faculty has an Academic Integrity Officer (AIO) who receives allegations from instructors.
b. The AIO decides whether to proceed with the allegation and you will be notified of the process.
c. If the case proceeds, you will receive an INC (incomplete) grade until the matter is resolved.
d. If you are found guilty of an academic offence, a penalty will be assigned ranging from a warning to a suspension or expulsion from the University and can include a notation on your transcript, failure of the assignment or failure of the course. All penalties are academic in nature.
4) Where can you turn for help?
– If you are ever unsure about ANYTHING, contact me.
– The Academic Integrity website (http://academicintegrity.dal.ca) has links to policies, definitions, online tutorials, tips on citing and paraphrasing.
– The Writing Center provides assistance with proofreading, writing styles, citations.
– Dalhousie Libraries have workshops, online tutorials, citation guides, Assignment Calculator, RefWorks, etc.
– The Dalhousie Student Advocacy Service assists students with academic appeals and student discipline procedures.
– The Senate Office provides links to a list of Academic Integrity Officers, discipline flow chart, and Senate Discipline Committee.
Appendix 1: How to Use Zip and Unzip on Timberlea
Appendix 2: Sample Output for (Item, Location, Year)
[Sample table: rows for Computer, Camera, Phone, and Printer; each entry is shown as “xxx”.]
1) “xxx” in the sample output represents an aggregated value.
2) As mentioned previously, in the table for a cuboid in an iceberg cube, there might be some empty entries. In this case, please keep the table structure and leave the corresponding table entries empty.
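One possible way to keep the table structure while leaving pruned entries empty is sketched below. The fixed-width layout, the function name `render_2d`, and the sample values are assumptions made for illustration; the required format remains the one shown in the textbook tables and the sample output.

```python
def render_2d(cells, row_values, col_values, width=10):
    """Format a 2D cuboid as text; entries missing from `cells` stay blank."""
    header = "".ljust(width) + "".join(c.ljust(width) for c in col_values)
    lines = [header.rstrip()]
    for r in row_values:
        line = r.ljust(width)
        for c in col_values:
            value = cells.get((r, c))
            # A pruned cell still occupies its column, so the alignment holds.
            line += ("" if value is None else str(value)).ljust(width)
        lines.append(line.rstrip())
    return "\n".join(lines)

# Illustrative iceberg cells (invented): two of the four entries were pruned.
cells = {("Computer", "Toronto"): 8, ("Camera", "Chicago"): 4}
table = render_2d(cells, ["Computer", "Camera"], ["Toronto", "Chicago"])
table_lines = table.split("\n")
```

Because every entry is padded to the same width before trailing spaces are trimmed, the value for (Camera, Chicago) still lines up under the Chicago column even though (Camera, Toronto) is empty.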