Welcome! Please contribute your ideas for what challenges we might aspire to solve, changes in our community that can improve machine learning impact, and examples of machine learning projects that have had tangible impact.
Lacking context for this site? Read the original paper: Machine Learning that Matters (ICML 2012). You can also review the slides.
Serious Scientific Discovery
  • I often find that my colleagues from other sciences are less interested in problems that ML folks would identify as classification than in problems that look more like unsupervised learning.  That is, often the answer they want is something descriptive -- the network of brain activity that underlies a particular mental illness, for example, or the mechanics by which a particular kind of drug resistance evolves.  In other words, they're looking for something more like an explanation than a diagnosis. Typically, these things are (a) much more complicated objects than "class labels", and (b) almost impossible to ground-truth.

    (a) has been partially addressed (with mixed success) by the "structured learning", "network learning", "relational learning", and related communities.  (Though much more research is needed.)  The really big gap I see is (b).

    There's a lot of ML research in unsupervised learning, but almost all of it uses some mathematical objective function as a proxy either for ground truth or for "what the domain expert actually cares about".  However, there's always the danger in that direction of optimizing the thing that's convenient rather than the thing we care about.
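To make the proxy-objective danger concrete, here is a toy sketch (entirely hypothetical data, not from the discussion above): a two-way clustering that is optimal under a convenient mathematical objective -- within-cluster variance -- can still disagree with the grouping the domain expert actually cares about, here a latent label attached to each measurement.

```python
def within_var(xs):
    """Sum of squared deviations from the mean (the 'convenient' objective)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def best_two_way_split(values):
    """Exhaustively find the contiguous 2-partition of sorted 1-D data
    that minimizes total within-cluster variance (the proxy objective)."""
    vals = sorted(values)
    best = min(range(1, len(vals)),
               key=lambda i: within_var(vals[:i]) + within_var(vals[i:]))
    return vals[:best], vals[best:]

# Hypothetical measurements with a latent expert grouping ('A'/'B')
# that does NOT align with the numeric clusters.
data = [(0.0, 'A'), (1.0, 'B'), (2.0, 'A'),
        (10.0, 'B'), (11.0, 'A'), (12.0, 'B')]

left, right = best_two_way_split([v for v, _ in data])
print("proxy-optimal clusters:", left, right)

# Agreement with the expert's labels, under the best cluster-to-label matching.
pred = {v: (0 if v in left else 1) for v, _ in data}
acc = max(
    sum((pred[v] == c) == (lab == 'A') for v, lab in data)
    for c in (0, 1)
) / len(data)
print("best-match agreement with expert labels:", round(acc, 3))
```

The proxy objective confidently recovers the two numeric blobs, but only partially recovers the expert's latent grouping -- optimizing what is convenient is not the same as optimizing what matters.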

    As a course of research, then, I suggest the following topics:

    1) Ways to automatically identify and codify "what the domain expert actually cares about".  (And validate that the codification is "correct".)
    2) Strong ways to validate prediction results and ML methods in contexts where ground truth is impossible to come by.
    3) Methods for identifying complex scientific hypotheses from unsupervised/unsupervisable data, in a way that satisfies the domain experts' true (possibly implicit) criterion function.
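One existing heuristic relevant to point (2) is stability analysis: when no ground truth exists, rerun the unsupervised method on perturbed subsamples and measure how consistently it groups the same points. The sketch below (toy 1-D data and a toy gap-based clusterer, both invented for illustration) shows the idea; high stability is at best a necessary condition, not a substitute for expert validation.

```python
import random

def cluster(values):
    """Toy 2-way clustering: split sorted 1-D data at the largest gap."""
    vals = sorted(values)
    gap_at = max(range(1, len(vals)), key=lambda i: vals[i] - vals[i - 1])
    thresh = vals[gap_at]
    return {v: int(v >= thresh) for v in vals}

def pairwise_agreement(a, b, shared):
    """Fraction of shared point pairs that both clusterings treat the same
    way (co-clustered in both, or separated in both)."""
    pairs = [(x, y) for i, x in enumerate(shared) for y in shared[i + 1:]]
    same = sum((a[x] == a[y]) == (b[x] == b[y]) for x, y in pairs)
    return same / len(pairs)

random.seed(0)
data = [0.0, 0.5, 1.0, 1.5, 9.0, 9.5, 10.0, 10.5]
full = cluster(data)

scores = []
for _ in range(50):
    sub = random.sample(data, 6)          # perturb by subsampling
    scores.append(pairwise_agreement(full, cluster(sub), sub))

stability = sum(scores) / len(scores)
print("mean pairwise agreement under resampling:", round(stability, 3))
```

On this well-separated toy data the result is perfectly stable; on real data one would compare the stability score against a null distribution (e.g., the same procedure on shuffled data) before trusting the structure.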

    And as a specific challenge problem, I propose:

    - Identify and strongly validate the neural and/or genetic substrates of a mental illness (e.g., schizophrenia, bipolar disorder, psychopathy), in a way that leads to measurably improved treatments.
  • 4 Comments
  • Fantastic suggestion.  This is hard, but so are most significant evolutionary steps.  The question posed here is along the lines of: "Can ML be used for 'real' science, not just predefined subproblems?"  I like it.

    I don't know if we can ever expect to "automatically identify what the domain expert cares about" -- if automatically means "purely by machine".  We might have to solve NLP first to get that. :)  But clearly we need progress on this front in some fashion.
  • I don't know if we can ever expect to "automatically identify what the domain expert cares about" -- if automatically means "purely by machine".

    Oh, yeah, I totally agree.  And I don't think I've really stated this question quite right (and certainly not formally). But, yeah, you've captured the gist of it.  I think my core gripe, underlying this suggestion, is that ML has formalized a certain set of problems (supervised learning, unsupervised learning, RL, etc.) and then focused strongly on those, primarily because they have clear-cut objective functions and measurable success metrics.

    What I'm groping for here is a way to step well beyond the formalized objective functions that we have and do scientific discovery in more "real world" kinds of settings, in which goals are hard to codify and results are hard to validate.
  • The problem of serious scientific discovery seems to be related to the often-ignored area of "problem posing". Unlike problem solving, problem posing appears to require a greater level of creativity than simply optimizing with respect to an objective function.

    Yoonsuck Choe and I have an article in the first issue of Brain-Mind Magazine about problem posing. You can download the article here (free but requires registration): http://www.brain-mind-magazine.org/download-article.php?file=V1-N1-ProblemPosing.pdf

    We offer some suggestions on where to start attacking this problem.
  • (This is not a completely novel suggestion, of course.  Pat Langley and, more recently, Will Bridewell have been working on real scientific discovery for a long time.  But their work has not gotten a lot of follow-up.  And even they, I think, have not focused much on the issue of validating the unsupervisable, nor on identifying the expert's true objective.)

