Never-Ending Language Learning
Tom Mitchell, William Cohen, and Many Collaborators Carnegie Mellon University

We will never really understand learning until we build machines that
• learn many different things,
• from years of diverse experience,
• in a staged, curricular fashion,
• and become better learners over time.

Tenet 2:
Natural language understanding requires a belief system
A natural language understanding system should react to text by saying either:
• I understand, and already knew that
• I understand, and didn’t know, but accept it
• I understand, and disagree because …

NELL: Never-Ending Language Learner
Inputs:
• initial ontology (categories and relations)
• dozen examples of each ontology predicate
• the web
• occasional interaction with human trainers
The task:
• run 24x7, forever
• eachday:
1. extract more facts from the web to populate the ontology
2. learn to read (perform #1) better than yesterday

NELL today
Running 24×7, since January 12, 2010
Result:
• knowledge base with 90 million candidate beliefs
• learning to read
• learning to reason
• extending ontology

NELL knowledge fragment
* including only correct beliefs
[Figure: graph of a NELL knowledge base fragment. Entity nodes: football, climbing, skates, helmet, hockey, Canada, Toronto, Detroit, Sunnybrook, Miller, Wilson, Maple Leafs, Red Wings, Sundin, Toskala, Milson, CFRB, Globe and Mail, GM, Toyota, Hino, Prius, Corolla, Pearson, Connaught, Air Canada Centre, Skydome, Stanley Cup, NHL. Edge labels: uses equipment, country, hospital, hired, city, company, politician, radio, hometown, competes with, created, play, airport, won, league, stadium, team stadium, acquired, paper, member, economic sector, plays in, writer, automobile.]

NELL Is Improving Over Time (Jan 2010 to Nov 2014)
[Figure: three plots over time. (1) Number of NELL beliefs vs. time, for all beliefs and high conf. beliefs; scale labels: 10’s of millions, millions. (2) Reading accuracy vs. time (average over 31 predicates): precision@10 and mean avg. precision top 1000. (3) Human feedback vs. time (average 2.4 feedbacks per predicate per month).]

NELL Today
• eg. “diabetes”, “Avandia”, “tea”, “IBM”, “love” “baseball” “San Juan” “BacteriaCausesCondition” “kitchenItem” “ClothingGoesWithClothing” …

Portuguese NELL
[Estevam Hruschka, 2014]

How does NELL work?

Semi-Supervised Bootstrap Learning
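Learn which noun phrases are cities:
[Figure: bootstrapping alternates between noun phrases and text patterns]
Noun phrases: Paris, Pittsburgh, Seattle, Montpelier
Patterns: “mayor of arg1”, “live in arg1”
Noun phrases: San Francisco, Berlin, London
Patterns: “arg1 is home of”, “traits such as arg1”
Noun phrases: denial, anxiety, selfishness
it’s underconstrained!!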
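
The drift from cities to "denial" can be reproduced on a toy corpus. The sketch below is an illustration only, not NELL's code: the co-occurrence pairs, the `bootstrap` function, and the rule "promote everything that co-occurs" are all assumptions chosen to make the failure mode visible.

```python
# Toy corpus of (text pattern, noun phrase) co-occurrences. Every pair is
# invented for illustration; "live in denial" is the ambiguous bridge.
COOCCURRENCES = [
    ("mayor of arg1", "Paris"),
    ("mayor of arg1", "Berlin"),
    ("live in arg1", "Seattle"),
    ("live in arg1", "London"),
    ("live in arg1", "denial"),
    ("arg1 is home of", "London"),
    ("traits such as arg1", "denial"),
    ("traits such as arg1", "anxiety"),
    ("traits such as arg1", "selfishness"),
]


def bootstrap(seeds, cooccurrences, iterations):
    """Uncoupled bootstrapping for one category (hypothetical helper)."""
    instances, patterns = set(seeds), set()
    for _ in range(iterations):
        # Trust every pattern seen with a currently believed instance.
        patterns |= {p for p, np in cooccurrences if np in instances}
        # Believe every noun phrase seen with a trusted pattern.
        instances |= {np for p, np in cooccurrences if p in patterns}
    return instances, patterns


cities_1, _ = bootstrap({"Paris", "Seattle"}, COOCCURRENCES, iterations=1)
cities_2, _ = bootstrap({"Paris", "Seattle"}, COOCCURRENCES, iterations=2)
print(sorted(cities_1))  # already contains "denial"
print(sorted(cities_2))  # now also contains "anxiety" and "selfishness"
```

Nothing in a single-category learner contradicts the bad promotion, which is the sense in which the problem is underconstrained.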

Key Idea 1: Coupled semi-supervised training of many functions
person
noun phrase
hard
(underconstrained) semi-supervised learning problem
much easier (more constrained) semi-supervised learning problem

Type 1 Coupling: Co-Training, Multi-View Learning
person
Supervised training of 1 function: Minimize:
NP:

Type 1 Coupling: Co-Training, Multi-View Learning
person
Coupled training of 2 functions: Minimize:
NP:

Type 1 Coupling: Co-Training, Multi-View Learning
NP:
person
[Blum & Mitchell; 98] [Dasgupta et al; 01 ] [Ganchev et al., 08] [Sridharan & Kakade, 08] [Wang & Zhou, ICML10]

NELL: Learned reading strategies
Mountain:
“volcanic crater of _” “volcanic eruptions like _” “volcanic peak of _” “volcanic region of _” “volcano , called _” “volcano called _” “volcano is called _” “volcano known as _” “volcano Mt _” “volcano named _” “volcanoes , including _” “volcanoes , like _” “volcanoes , such as _” “volcanoes include _” “volcanoes including _” “volcanoes such as _” “We ‘ve climbed _” “weather atop _” “weather station atop _” “week hiking in _” “weekend trip through _” “West face of _” “West ridge of _” “west to beyond _” “white ledge in _” “white summit of _” “whole earth , is _” “wilderness area surrounding _” “wilderness areas around _” “wind rent _” “winter ascent of _” “winter ascents in _” “winter ascents of _” “winter expedition to _” “wooded foothills of _” “world famous view of _” “world famous views of _” “you ‘re popping by _” “you ‘ve just climbed _” “you just climbed _” “you’ve climbed _” “_ ‘ crater” “_ ‘ eruption” “_ ‘ foothills” “_ ‘ glaciers” “_ ‘ new dome” “_ ‘s Base Camp” “_ ‘s drug guide” “_ ‘s east rift zone” “_ ‘s main summit” “_ ‘s North Face” “_ ‘s North Peak” “_ ‘s North Ridge” “_ ‘s northern slopes” “_ ‘s southeast ridge” “_ ‘s summit caldera” “_ ‘s West Face” “_ ‘s West Ridge” “_ ‘s west ridge” “_ (D,DDD ft” ” “_ climbing permits” “_ climbing safari” “_ consult el diablo” “_ cooking planks” “_ dominates the sky line” “_ dominates the western skyline” “_ dominating the scenery”

Multi-view, Multi-Task Coupling
[Blum & Mitchell; 98] [Dasgupta et al; 01] [Ganchev et al., 08] [Sridharan & Kakade, 08] [Wang & Zhou, ICML10] [Taskar et al., 2009] [Carlson et al., 2009]
[Figure: category functions athlete, person, sport, coach, team, each predicted from several views of the noun phrase (NP): NP text context distribution, NP morphology, NP HTML contexts]
athlete(NP) → person(NP)
athlete(NP) → NOT sport(NP)
NOT athlete(NP) ← sport(NP)

Type 3 Coupling: Relation Argument Types
playsSport(NP1,NP2) → athlete(NP1), sport(NP2)
[Figure: relation functions playsSport(a,s), playsForTeam(a,t), teamPlaysSport(t,s), coachesTeam(c,t) over the noun phrase pair (NP1, NP2), coupled to the category functions athlete, person, sport, coach, team for each of NP1 and NP2]
over 2500 coupled functions in NELL
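
The coupling constraints on these slides (subset, mutual exclusion, relation argument types) can be read as consistency checks over candidate beliefs. The following is a minimal sketch under stated assumptions: the data structures and `find_violations` are hypothetical, and the argument types for relations other than playsSport are inferred from the variable names a, s, t, c rather than stated in the slides.

```python
# athlete(NP) -> person(NP); the coach entry is an assumption.
SUBSET = {"athlete": "person", "coach": "person"}
# athlete(NP) -> NOT sport(NP)
MUTEX = [("athlete", "sport")]
# playsSport(NP1, NP2) -> athlete(NP1), sport(NP2); others are assumed.
ARG_TYPES = {
    "playsSport": ("athlete", "sport"),
    "playsForTeam": ("athlete", "team"),
    "teamPlaysSport": ("team", "sport"),
    "coachesTeam": ("coach", "team"),
}


def find_violations(categories, relations):
    """categories: NP -> set of category labels; relations: (rel, NP1, NP2)."""
    problems = []
    for np, labels in categories.items():
        for child, parent in SUBSET.items():
            if child in labels and parent not in labels:
                problems.append(f"{np}: {child} without {parent}")
        for a, b in MUTEX:
            if a in labels and b in labels:
                problems.append(f"{np}: both {a} and {b}")
    for rel, np1, np2 in relations:
        for np, expected in zip((np1, np2), ARG_TYPES[rel]):
            if expected not in categories.get(np, set()):
                problems.append(f"{rel}: {np} is not a {expected}")
    return problems


# Illustrative beliefs only; not taken from NELL's knowledge base.
categories = {
    "Sundin": {"athlete", "person"},
    "hockey": {"sport"},
    "Wilson": {"athlete"},
}
relations = [("playsSport", "Sundin", "hockey"),
             ("playsSport", "hockey", "Sundin")]
for problem in find_violations(categories, relations):
    print(problem)
```

A candidate belief that triggers a violation is evidence that at least one of the coupled functions is wrong, which is what makes the joint learning problem more constrained.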

Pure EM Approach to Coupled Training
E: estimate labels for each function of each unlabeled example
M: retrain all functions, using these probabilistic labels
Scaling problem:
• E step: 25M NP’s, 10^14 NP pairs to label
• M step: 50M text contexts to consider for each function → 10^10 parameters to retrain
• even more URL-HTML contexts…

NELL’s Approximation to EM
E’ step:
• Re-estimate the knowledge base:
– but consider only a growing subset of the latent variable assignments
– category variables: up to 250 new NP’s per category per iteration
– relation variables: add only if confident and args of correct type
– this set of explicit latent assignments *IS* the knowledge base
M’ step:
• Each view-based learner retrains itself from the updated KB
• “context” methods create growing subsets of contexts
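
One round of the E’/M’ loop can be sketched as follows. This is a simplified illustration, not NELL's implementation: `coupled_iteration`, the pattern-count scoring, the tie-breaking rule, and the toy corpus are assumptions. Only the per-category cap (250 by default) and the idea that the promoted assignments are the knowledge base come from the slide.

```python
from collections import Counter

# Invented (text pattern, noun phrase) pairs, for illustration only.
COOCCURRENCES = [
    ("mayor of arg1", "Paris"), ("mayor of arg1", "Berlin"),
    ("live in arg1", "Seattle"), ("live in arg1", "London"),
    ("live in arg1", "denial"), ("arg1 is home of", "London"),
    ("traits such as arg1", "denial"), ("traits such as arg1", "anxiety"),
    ("traits such as arg1", "selfishness"),
]
MUTEX = {"city": ("emotion",), "emotion": ("city",)}


def coupled_iteration(kb, cooccurrences, mutex, max_new=250):
    """One E'/M' round; kb maps category -> set of believed noun phrases."""
    # M' step: each context learner retrains from the current KB. Here it
    # simply trusts every pattern seen with a believed member.
    patterns = {cat: {p for p, np in cooccurrences if np in members}
                for cat, members in kb.items()}
    # Score unlabeled noun phrases by how many trusted contexts match them.
    scores = {cat: Counter(np for p, np in cooccurrences
                           if p in patterns[cat] and np not in kb[cat])
              for cat in kb}
    # E' step: promote at most max_new candidates per category, skipping any
    # that a mutually exclusive category claims at least as strongly.
    new_kb = {}
    for cat, members in kb.items():
        rivals = mutex.get(cat, ())
        ranked = sorted(scores[cat].items(), key=lambda kv: (-kv[1], kv[0]))
        accepted = [np for np, s in ranked
                    if all(np not in kb[r] and scores[r][np] < s
                           for r in rivals)]
        new_kb[cat] = members | set(accepted[:max_new])
    return new_kb


kb = {"city": {"Paris", "Seattle"}, "emotion": {"anxiety"}}
for _ in range(2):
    kb = coupled_iteration(kb, COOCCURRENCES, MUTEX)
print(sorted(kb["city"]))     # "denial" is never promoted to city
print(sorted(kb["emotion"]))
```

In this toy run the ambiguous noun phrase "denial" is claimed equally by both categories and is therefore left unlabeled, while unambiguous candidates are added to the growing knowledge base.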

Initial NELL Architecture
Knowledge Base (latent variables)
Beliefs
Knowledge Integrator
Candidate Beliefs
Text Context patterns (CPL)
HTML-URL context patterns (SEAL)
Continually Learning Reading Components
Morphology classifier
(CML)
Human advice

If coupled learning is the key,
how can we get new coupling constraints?

Key Idea 2:
Discover New Coupling Constraints
• learn horn clause rules/constraints:
– learned by data mining the knowledge base
– connect previously uncoupled relation predicates
– infer new unread beliefs
– modified version of FOIL [Quinlan]
0.93 athletePlaysSport(?x,?y) ← athletePlaysForTeam(?x,?z), teamPlaysSport(?z,?y)

Learned Probabilistic Horn Clause Rules
0.93 playsSport(?x,?y) ← playsForTeam(?x,?z), teamPlaysSport(?z,?y)
[Figure: relation functions playsSport(a,s), playsForTeam(a,t), teamPlaysSport(t,s), coachesTeam(c,t) over the noun phrase pair (NP1, NP2), coupled to the category functions athlete, person, sport, coach, team]
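
Applying a learned chain rule to the knowledge base is a join over two relations. The sketch below is a hypothetical illustration: `apply_chain_rule` and the facts are made up, and attaching the rule weight 0.93 directly to each inferred belief is an assumption about how the weight is used.

```python
# Illustrative beliefs as (relation, arg1, arg2) triples; not NELL output.
FACTS = {
    ("playsForTeam", "Sundin", "Maple Leafs"),
    ("playsForTeam", "Toskala", "Maple Leafs"),
    ("teamPlaysSport", "Maple Leafs", "hockey"),
    ("playsSport", "Sundin", "hockey"),  # already read from text
}


def apply_chain_rule(facts, head, body1, body2, weight):
    """Infer head(x, y) from body1(x, z) and body2(z, y).

    Returns only beliefs not already in facts, each tagged with the rule
    weight (an assumption made for this sketch).
    """
    inferred = {}
    for rel1, x, z in facts:
        if rel1 != body1:
            continue
        for rel2, z2, y in facts:
            if rel2 == body2 and z2 == z and (head, x, y) not in facts:
                inferred[(head, x, y)] = weight
    return inferred


new_beliefs = apply_chain_rule(
    FACTS, "playsSport", "playsForTeam", "teamPlaysSport", 0.93)
print(new_beliefs)
```

The inferred triple is an "unread" belief: no sentence stating it was needed, only the two body facts and the rule.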

Infer New Beliefs
[Lao, Mitchell, Cohen, EMNLP 2011]
If: competes with (x1, x2) and economic sector (x2, x3)
Then: economic sector (x1, x3)

Inference by Random Walks
PRA: [Lao, Mitchell, Cohen, EMNLP 2011]
1. restrict precondition to a chain.
2. inference by random walks
If: competes with (x1, x2) and economic sector (x2, x3)
Then: economic sector (x1, x3)

Inference by KB Random Walks
[Lao, Mitchell, Cohen, EMNLP 2011]
Random walk path type over the KB:
x --competes with--> ? --economic sector--> y
Pr( R(x,y) ): logistic function for R(x,y), where the ith feature = probability of arriving at node y starting at node x, and taking a random walk along a path of type i

CityLocatedInCountry(Pittsburgh) = ?
[Lao, Mitchell, Cohen, EMNLP 2011]
Random walks from Pittsburgh along each typed path:
• CityInState, CityInState-1, CityLocatedInCountry: Pittsburgh → Pennsylvania → Pittsburgh, Philadelphia, Harrisburg, …(14) → U.S.
• AtLocation-1, AtLocation, CityLocatedInCountry: Pittsburgh → PPG, Delta → Pittsburgh, Atlanta, Dallas, Tokyo → U.S., Japan
Feature value = Pr(U.S. | Pittsburgh, TypedPath)

Feature = Typed Path | Feature Value | Logistic Regression Weight
CityInState, CityInState-1, CityLocatedInCountry | 0.8 | 0.32
AtLocation-1, AtLocation, CityLocatedInCountry | 0.6 | 0.20
… | … | …

CityLocatedInCountry(Pittsburgh) = U.S. p=0.58

1. Tractable (bounded length)
2. Anytime
3. Accuracy increases as KB grows
4. combines probabilities from different horn clauses
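
The path features in this example can be computed by propagating probability mass along a typed path. The sketch below is a toy illustration, not the PRA implementation: the edge list, `path_distribution`, `logistic_score`, the uniform choice among outgoing edges, the dropping of dead-end mass, and the zero bias term are all assumptions, so the feature values differ from those on the slide.

```python
import math
from collections import defaultdict

# Toy KB as (relation, source, target) edges; Pittsburgh's country is unknown.
EDGES = [
    ("CityInState", "Pittsburgh", "Pennsylvania"),
    ("CityInState", "Philadelphia", "Pennsylvania"),
    ("CityInState", "Harrisburg", "Pennsylvania"),
    ("AtLocation", "PPG", "Pittsburgh"),
    ("AtLocation", "Delta", "Pittsburgh"),
    ("AtLocation", "Delta", "Atlanta"),
    ("AtLocation", "Delta", "Dallas"),
    ("AtLocation", "Delta", "Tokyo"),
    ("CityLocatedInCountry", "Philadelphia", "U.S."),
    ("CityLocatedInCountry", "Harrisburg", "U.S."),
    ("CityLocatedInCountry", "Atlanta", "U.S."),
    ("CityLocatedInCountry", "Dallas", "U.S."),
    ("CityLocatedInCountry", "Tokyo", "Japan"),
]


def path_distribution(edges, start, path):
    """Probability of reaching each node by a random walk along `path`.

    A relation name ending in "-1" is followed backwards. At each step the
    walker picks uniformly among matching edges; mass at dead ends is lost.
    """
    dist = {start: 1.0}
    for rel in path:
        inverse = rel.endswith("-1")
        name = rel[:-2] if inverse else rel
        nxt = defaultdict(float)
        for node, prob in dist.items():
            if inverse:
                targets = [s for r, s, t in edges if r == name and t == node]
            else:
                targets = [t for r, s, t in edges if r == name and s == node]
            for target in targets:
                nxt[target] += prob / len(targets)
        dist = dict(nxt)
    return dist


def logistic_score(feature_values, weights, bias=0.0):
    """Logistic function of the weighted path features."""
    z = bias + sum(w * h for w, h in zip(weights, feature_values))
    return 1.0 / (1.0 + math.exp(-z))


p1 = path_distribution(
    EDGES, "Pittsburgh", ["CityInState", "CityInState-1", "CityLocatedInCountry"])
p2 = path_distribution(
    EDGES, "Pittsburgh", ["AtLocation-1", "AtLocation", "CityLocatedInCountry"])
print(p1, p2)

# Plugging in the two feature values and weights listed on the slide:
slide_score = logistic_score([0.8, 0.6], [0.32, 0.20])
print(round(slide_score, 3))
```

With only the two listed features and no bias the score comes out near 0.59, close to but not equal to the slide's p=0.58, which also includes the features elided by the "…" row.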

Random walk inference: learned rules
CityLocatedInCountry(city, country):
8.04 cityliesonriver, cityliesonriver-1, citylocatedincountry
5.42 hasofficeincity-1, hasofficeincity, citylocatedincountry
4.98 cityalsoknownas, cityalsoknownas, citylocatedincountry
2.85 citycapitalofcountry, citylocatedincountry-1, citylocatedincountry
2.29 agentactsinlocation-1, agentactsinlocation, citylocatedincountry
1.22 statehascapital-1, statelocatedincountry
0.66 citycapitalofcountry
7 of the 2985 learned rules for CityLocatedInCountry

Opportunity:
Can infer more if we start with a more densely connected knowledge graph
→ as NELL learns, it will become more dense
→ augment knowledge graph with a second graph of corpus statistics: SVO triples
[Gardner et al, 2014]

NELL: concepts and “noun phrases” [Gardner et al, 2014]
[Figure: concept nodes c:penguins, c:pittsburgh, c:monongahela, linked by the KB relations hometown and river flows through. Each concept is linked by “can refer to” edges to its noun phrases: “Penguins”, “Pens”; “Pittsburgh”, “Pgh”; “Monongahela”, “Mon river”.]

NELL: concepts and “noun phrases” [Gardner et al, 2014]
[Figure: typed concept nodes team:penguins, city:pittsburgh, river:monongahela, linked by the KB relations hometown and river flows through. Each concept is linked by “can refer to” edges to its noun phrases: “Penguins”, “Pens”; “Pittsburgh”, “Pgh”; “Monongahela”, “Mon river”. Noun phrases are additionally linked by verb-phrase edges: “sits astride”, “overlooks”, “enters”, “runs through”, “remain in”, “began in”, “supports”, “reminded”.]
SVO triples from 500 M dependency parsed web pages (thank you Chris Re!)

NELL: concepts and “noun phrases” [Gardner et al., 2014]
[Figure: concept nodes c:penguins, c:pittsburgh, c:monongahela with KB relations hometown and river flows through; “can refer to” edges to the noun phrases “Penguins”, “Pens”, “Pittsburgh”, “Pgh”, “Monongahela”, “Mon river”; verb-phrase edges “sits astride”, “overlooks”, “enters”, “runs through”, “remain in”, “began in”, “supports”, “reminded”.]
SVO triples from 500 M dependency parsed web pages (thank you Chris Re!)
Circumvents NELL’s fixed vocabulary of relations!
Sadly, adding these does not help: too sparse
But clustering verb phrases based on latent embedding (NNMF) produces significant improvement
– {“lies on”, “runs through”, “flows through”, …}
Precision/recall over 15 NELL relations:
KB only: 0.80 / 0.33
KB + SVOlatent: 0.87 / 0.42

Key Idea 3:
Automatically extend ontology

Ontology Extension (1) [Mohamed et al., EMNLP 2011]
Goal:
• Add new relations to ontology
Approach:
• For each pair of categories C1, C2,
• cluster pairs of known instances, in terms of text contexts that connect them

Example Discovered Relations
[Mohamed et al. EMNLP 2011]
Category Pair | Frequent Instance Pairs | Text Contexts | Suggested Name
MusicInstrument, Musician | sitar, George Harrison; tenor sax, Stan Getz; trombone, Tommy Dorsey; vibes, Lionel Hampton | ARG1 master ARG2; ARG1 virtuoso ARG2; ARG1 legend ARG2; ARG2 plays ARG1 | Master
Disease, Disease | pinched nerve, herniated disk; tennis elbow, tendonitis; blepharospasm, dystonia | ARG1 is due to ARG2; ARG1 is caused by ARG2 | IsDueTo
CellType, Chemical | epithelial cells, surfactant; neurons, serotonin; mast cells, histamine | ARG1 that release ARG2; ARG2 releasing ARG1 | ThatRelease
Mammals, Plant | koala bears, eucalyptus; sheep, grasses; goats, saplings | ARG1 eat ARG2; ARG2 eating ARG1 | Eat
River, City | Seine, Paris; Nile, Cairo; Tiber river, Rome | ARG1 in heart of ARG2; ARG1 which flows through ARG2 | InHeartOf

NELL: sample of self-added relations
• athleteWonAward
• animalEatsFood
• languageTaughtInCity
• clothingMadeFromPlant
• beverageServedWithFood
• fishServedWithFood
• athleteBeatAthlete
• athleteInjuredBodyPart
• arthropodFeedsOnInsect
• animalEatsVegetable
• plantRepresentsEmotion
• foodDecreasesRiskOfDisease
• clothingGoesWithClothing
• bacteriaCausesPhysCondition
• buildingMadeOfMaterial
• emotionAssociatedWithDisease
• foodCanCauseDisease
• agriculturalProductAttractsInsect
• arteryArisesFromArtery
• countryHasSportsFans
• bakedGoodServedWithBeverage
• beverageContainsProtein
• animalCanDevelopDisease
• beverageMadeFromBeverage

Ontology Extension (2) [Burr Settles]
Goal:
• Add new subcategories
Approach:
• For each category C,
• train NELL to read the relation SubsetOfC: C → C
*no new software here, just add this relation to ontology

NELL: subcategories discovered by reading
Animal:
• Pets
– Hamsters, Ferrets, Birds, Dog, Cats, Rabbits, Snakes, Parrots, Kittens, …
• Predators
– Bears, Foxes, Wolves, Coyotes, Snakes, Racoons, Eagles, Lions, Leopards, Hawks, Humans, …
Learned reading patterns for AnimalSubset(arg1,arg2):
“arg1 and other medium sized arg2” “arg1 and other jungle arg2” “arg1 and other magnificent arg2” “arg1 and other pesky arg2” “arg1 and other mammals and arg2” “arg1 and other Ice Age arg2” “arg1 or other biting arg2” “arg1 and other marsh arg2” “arg1 and other migrant arg2” “arg1 and other monogastric arg2” “arg1 and other mythical arg2” “arg1 and other nesting arg2” “arg1 and other night arg2” “arg1 and other nocturnal arg2” …
Chemical:
• Fossil fuels
– Carbon, Natural gas, Coal, Diesel, Monoxide, Gases, …
• Gases
– Helium, Carbon dioxide, Methane, Oxygen, Propane, Ozone, Radon…
Learned reading patterns:
“arg1 and other hydrocarbon arg2” “arg1 and other aqueous arg2” “arg1 and other hazardous air arg2” “arg1 and oxygen are arg2” “arg1 and such synthetic arg2” “arg1 as a lifting arg2” “arg1 as a tracer arg2” “arg1 as the carrier arg2” “arg1 as the inert arg2” “arg1 as the primary cleaning arg2” “arg1 and other noxious arg2” “arg1 and other trace arg2” “arg1 as the reagent arg2” “arg1 as the tracer arg2”

NELL Architecture
Knowledge Base (latent variables)
Beliefs
Evidence Integrator
Text Context patterns (CPL)
Candidate Beliefs
Orthographic classifier
(CML)
URL specific HTML patterns (SEAL)
Human advice
Actively search for web text (OpenEval)
Infer new beliefs from old (PRA)
Image classifier
(NEIL)
Ontology extender
(OntExt)

Key Idea 4: Cumulative, Staged Learning
Learning X improves ability to learn Y
1. Classify noun phrases (NP’s) by category
2. Classify NP pairs by relation
3. Discover rules to predict new relation instances
4. Learn which NP’s (co)refer to which latent concepts
5. Discover new relations to extend ontology
6. Learn to infer relation instances via targeted random walks
7. Vision: connect NELL and NEIL
8. Learn to microread single sentences
9. Learn to assign temporal scope to beliefs
10. Goal-driven reading: predict, then read to corroborate/correct
11. Make NELL a conversational agent on Twitter
12. Add a robot body to NELL
NELL is here

Consistency Correctness Self reflection

The core problem:
• Agents can measure internal consistency, but not correctness
Challenge:
• Under what conditions does consistency → correctness?
• Can an autonomous agent determine its accuracy from observed consistency?

[Platanios, Blum, Mitchell, UAI 2014]
Problem setting:
• have N different estimates of target function
• agreement between f_i, f_j: $a_{ij}$ = prob. f_i and f_j agree
Key insight: errors and agreement rates are related
$a_{ij} = \Pr[\text{neither makes error}] + \Pr[\text{both make error}] = 1 - e_i - e_j + 2 e_{ij}$
where $e_i$ = prob. f_i error, $e_j$ = prob. f_j error, and $e_{ij}$ = prob. f_i and f_j both make error

Estimating Error from Unlabeled Data
1. IF f1, f2, f3 make indep. errors, and accuracies > 0.5 THEN
$a_{ij} = 1 - e_i - e_j + 2 e_i e_j$ for each pair (i, j), where $a_{ij}$ is the agreement rate of f_i and f_j and $e_i$ is the error rate of f_i
Measure errors from unlabeled data:
– use unlabeled data to estimate a12, a13, a23
– solve three equations for three unknowns e1, e2, e3
2. but if errors not independent
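
For the independent-error case in item 1, the three equations have a closed-form solution. The sketch below is an illustration under that assumption only; `estimate_errors` and `agreement` are hypothetical names, and the substitution x_i = 1 - 2 e_i (so that x_i x_j = 2 a_ij - 1) is one convenient way to solve the system, not necessarily the method used in the cited paper.

```python
import math


def agreement(e_i, e_j):
    """Agreement rate of two classifiers with independent errors."""
    return 1 - e_i - e_j + 2 * e_i * e_j


def estimate_errors(a12, a13, a23):
    """Solve a_ij = 1 - e_i - e_j + 2*e_i*e_j for e1, e2, e3.

    With x_i = 1 - 2*e_i the equations become x_i * x_j = 2*a_ij - 1.
    Accuracies above 0.5 mean every x_i is positive, so the positive
    square root is the right one.
    """
    c12, c13, c23 = (2 * a - 1 for a in (a12, a13, a23))
    x1 = math.sqrt(c12 * c13 / c23)
    x2 = math.sqrt(c12 * c23 / c13)
    x3 = math.sqrt(c13 * c23 / c12)
    return tuple((1 - x) / 2 for x in (x1, x2, x3))


# Agreement rates would be measured on unlabeled data; here they are
# generated from known error rates so the answer can be checked.
true_errors = (0.1, 0.2, 0.3)
a12 = agreement(true_errors[0], true_errors[1])
a13 = agreement(true_errors[0], true_errors[2])
a23 = agreement(true_errors[1], true_errors[2])
estimated = estimate_errors(a12, a13, a23)
print(estimated)
```

No labels enter the computation: only pairwise agreement rates are needed, which is why the estimate can be made from unlabeled data when the independence assumption holds.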

True error (red), estimated error (blue) [Platanios, Blum, Mitchell, UAI 2014]
[Figure: two panels, NELL classifiers and brain image fMRI classifiers]

Summary
1. Use coupled training for semi-supervised learning
2. Datamine the KB to learn probabilistic inference rules
3. Automatically extend ontology
4. Use staged learning curriculum
New directions:
• Self-reflection, self-estimates of accuracy (A. Platanios)
• Incorporate vision with NEIL (Abhinav Gupta)
• Microreading (Jayant Krishnamurthy, Ndapa Nakashole)
• Aggressive ontology expansion (Derry Wijaya)
• Portuguese NELL (Estevam Hruschka)
• never-ending learning phones? robots? traffic lights?

thank you
and thanks to:
Darpa, Google, NSF, Yahoo!, Microsoft, Fulbright, Intel
follow NELL on Twitter: @CMUNELL
browse/download NELL’s KB at http://rtw.ml.cmu.edu
