CS 461: Machine Learning
Instructor: Kiri Wagstaff

CS 461 Homework 1

Due: Midnight, January 15, 2009

Part 1: Join the mailing list (10 points)

Go to this URL:

http://lists.wkiri.com/listinfo.cgi/cs461-wkiri.com

and sign up for the course mailing list. Easy!

Part 2: Machine Learning in the real world (50 points)

Where do we find Machine Learning in use outside of this class? Given what we covered in Lecture 1, you should have a good idea of how to spot Machine Learning in action. Your goal for Part 2 is to go out on the web and find a page that describes a system, game, application, etc. that uses Machine Learning in some key fashion.

Next, you should compose two paragraphs in legible, polished English (you will be graded on the quality of your writing; use spell-check and proofread carefully):

  1. A summary of the machine learning component of your discovery. What kind of machine learning is being used?
  2. Your opinion, thoughts, and assessment of the system. Does it sound like it actually works, or are you skeptical? (There's a lot of hype out there!) Is it something you yourself would use, or are there drawbacks you see?

Create a text file called <yourlastname>-hw1-ml.txt (fill in your own last name) that includes the URL of your source and your two paragraphs.

Note: do not copy text from your online source. This is a violation of academic integrity.

Part 3: Supervised Learning (40 points)

Place your answers to these questions in a file called <yourlastname>-hw1-questions.txt:

  1. What is the difference between classification and regression?

  2. Imagine that you want to train a classifier to automatically rate restaurants, from 1 to 5 stars (5 being the best). List three numeric features you could use to represent the restaurants for the classifier.

  3. Describe a classification scenario in which false positives are much worse (more costly) than false negatives.

  4. Is k-Nearest Neighbors a parametric method or a nonparametric method? What does "parametric" mean in this context?

  5. Consider the following two-dimensional data set. The training data contains two classes of objects which are represented by "+" and "-". A test instance whose class is unknown is represented by a "?". If we apply the k-nearest neighbors algorithm with k=3, the test instance will be classified as positive. Identify all (if any) odd values of k, from 1 to 9, for which its classification would be different.
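
As a reminder of how the k-nearest neighbors vote works, here is a minimal Python sketch. The data points, the function name knn_classify, and the choice of Euclidean distance are assumptions made for illustration only; this is not the data set from question 5, and the result for your data may differ.

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    """Return the majority label among the k training points nearest to query."""
    # Sort the training points by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data (NOT the data set from the assignment).
train = [
    ((1.0, 0.0), "+"),
    ((0.0, 2.0), "-"),
    ((3.0, 0.0), "-"),
    ((0.0, 4.0), "+"),
    ((5.0, 0.0), "+"),
]
query = (0.0, 0.0)  # the "?" instance in this made-up example

# With two classes and odd k, the vote can never end in a tie.
for k in (1, 3, 5):
    print(k, knn_classify(train, query, k))
```

In this made-up example the predicted class changes as k grows, which is the effect question 5 asks you to look for in the real data set.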

What to turn in

Upload these files to CSNS:

  1. <yourlastname>-hw1-ml.txt (or <yourlastname>-hw1-ml.pdf if you prefer to submit in PDF format)
  2. <yourlastname>-hw1-questions.txt (or .pdf)

In addition, email your response to Part 2 (without the assignment header, just the URL and your two paragraphs) to the CS 461 mailing list:

  cs461@wkiri.com

Feel free to explore the links posted by other students and discuss which ones you think are most interesting.