Tagging operators

In simuPOP, tagging refers to the action of setting various information fields of offspring, usually using various parental information during the production of offspring. simuPOP provides a number of tagging operators (called taggers) for various purposes. Because tagging operators are during-mating operators, parameter subPops can be used to tag only offspring that belong to specified virtual subpopulation. (e.g. all male offspring)

Inheritance tagger (operator InheritTagger)

An inheritance tagger passes values of parental information field(s) to the corresponding offspring information field(s). Depending on the parameters, an InheritTagger can

  • For asexual mating schemes, pass one or more information fields from parent to offspring.
  • Pass one or more information fields from father to offspring (mode=PATERNAL).
  • Pass one or more information fields from mother to offspring (mode=MATERNAL).
  • Pass the maximal, minimal, sum, multiplcation or average of values of one or more information fields of both parents (mode=MAXIMUM, MINIMUM, ADDITION, MULTIPLICATION or MEAN).

This can be used to track the spread of certain information during evolution. For example, ExampleInheritTagger tags the first individuals of ten subpopulations of size 1000. individuals in the offspring generation inherits the maximum value of field x from his/her parents so x is inherited regardless of the sex of parents. A Stat operator is used to calculate the number of offspring having this tag in each subpopulation. The results show that some tagged ancestors have many offspring, and some have none. If you run this simulation long enough, you can see that all ancestors become the ancestor of either none or all indiviudals in a population. Note that this simulation only considers genealogical inheritance and ancestors do not have to pass any genotype to the last generation.

Example: Use an inherit tagger to track offspring of individuals

>>> import simuPOP as sim
>>> pop = sim.Population(size=[1000]*10, loci=1, infoFields='x')
>>> # tag the first individual of each subpopulation.
>>> for sp in range(pop.numSubPop()):
...     pop.individual(0, sp).x = 1
...
>>> pop.evolve(
...     initOps=sim.InitSex(),
...     matingScheme=sim.RandomMating(ops=[
...         sim.MendelianGenoTransmitter(),
...         sim.InheritTagger(mode=sim.MAXIMUM, infoFields='x'),
...     ]),
...     postOps=[
...         sim.Stat(sumOfInfo='x', vars=['sumOfInfo_sp']),
...         sim.PyEval(r'", ".join(["%3d" % subPop[i]["sumOfInfo"]["x"] for i in range(10)])+"\n"'),
...     ],
...     gen = 5
... )
  2,   1,   0,   1,   1,   2,   3,   3,   1,   1
  5,   1,   0,   1,   1,   3,   3,   5,   3,   0
  9,   2,   0,   2,   2,   7,   9,   5,  13,   0
 21,   4,   0,   2,   5,  18,  11,   9,  27,   0
 39,   5,   0,   6,   8,  36,  23,  20,  67,   0
5

now exiting runScriptInteractively...

Download InheritTagger.py

Summarize parental informatin fields (operator SummaryTagger)

A SummaryTagger summarize values of one or more parental information fields and place the result in an offspring information field. If mating is sexual, two sets of values will be involved. Summarization methods include MEAN, MINIMUM, MAXIMUM, SUMMATION and MULTIPLICATION. The operator is usually used to summarize certain characteristic of parents of each offspring. For example, a SummaryTagger is used in Example SummaryTagger to calculate the mean fitness of parents during each mating event. The results are saved in the avgFitness field of offspring. Because allele 1 at locus 0 is under purifying selection, the allele frequency of this allele decreases. In the mean time, fitness of parents increases because less and less parents have this allele.

Example: Using a summary tagger to calculate mean fitness of parents.

>>> import simuPOP as sim
>>> pop = sim.Population(1000, loci=1, infoFields=['fitness', 'avgFitness'])
>>> pop.evolve(
...     initOps=[
...         sim.InitSex(),
...         sim.InitGenotype(freq=[0.5, 0.5]),
...     ],
...     preOps=sim.MaSelector(loci=0, wildtype=0, fitness=[1, 0.99, 0.95]),
...     matingScheme=sim.RandomMating(ops=[
...         sim.MendelianGenoTransmitter(),
...         sim.SummaryTagger(mode=sim.MEAN, infoFields=['fitness', 'avgFitness']),
...     ]),
...     postOps=[
...         sim.Stat(alleleFreq=0, meanOfInfo='avgFitness', step=10),
...         sim.PyEval(r"'gen %d: allele freq: %.3f, average fitness of parents: %.3f\n' % "
...             "(gen, alleleFreq[0][1], meanOfInfo['avgFitness'])", step=10)
...     ],
...     gen = 50,
... )
gen 0: allele freq: 0.473, average fitness of parents: 0.984
gen 10: allele freq: 0.421, average fitness of parents: 0.986
gen 20: allele freq: 0.388, average fitness of parents: 0.988
gen 30: allele freq: 0.288, average fitness of parents: 0.991
gen 40: allele freq: 0.256, average fitness of parents: 0.993
50

now exiting runScriptInteractively...

Download SummaryTagger.py

Tracking parents (operator ParentsTagger)

A parents tagger is used to record the indexes of parents (in the parental population) in the information fields (default to father_idx, mother_idx) of their offspring. These indexes provide a way to track down an individuals parents, offspring and consequently all relatives in a multi- generation population. Because this operator has been extensively used in this guide, please refer to other sections for an Example (e.g. Example basicInfoFields).

As long as parental generations do not change after the offspring generation is created, recorded parental indexes can be used to locate parents of an individual. However, in certain applications when parental generations change (e.g. to draw a pedigree from a large population), or when individuals can not be looked up easily using indexes (e.g. after individuals are saved to a file), giving every Individual an unique ID and refer to them using ID will be a better choice.

Tracking index of offspring within families (operator OffspringTagger)

An offspring tagger is used to record the index of offspring within each family in an information field (default to offspring_idx) of offspring. Because the index is reset for each mating event, the index will be reset even if two adjacent families share the same parents. In addition, this operator records the relative index of an offspring so the index will not change if an offspring is re-generated when the previous offspring is discarded for any reason.

Because during-mating selection operator discards offspring according their genotypes, a mating scheme can produce families with varying sizes even if numOffspring is set to a constant number. On the other hand, if we would like to ensure equal family size N in the presence of natural selection, we will have to produce more offspring so that there can be at least N offspring in each family after selection. Once N offspring have been generated, excessive offspring can be discarded according to offspring_idx. The following example demonstrates such a simulation scenario:

Example: Keeping constant family size in the presence of natural selection against offspring

>>> import simuPOP as sim
>>> pop = sim.Population(1000, loci=1, infoFields='offspring_idx')
>>> pop.evolve(
...     initOps=[
...         sim.InitSex(),
...         sim.InitGenotype(freq=[0.5, 0.5]),
...     ],
...     matingScheme=sim.RandomMating(ops=[
...         sim.MendelianGenoTransmitter(),
...         # lethal recessive alleles
...         sim.MaSelector(loci=0, wildtype=0, fitness=[1, 0.90, 0.5]),
...         sim.OffspringTagger(),
...         sim.DiscardIf('offspring_idx > 4'),
...     ], numOffspring=10),
...     postOps=[
...         sim.Stat(alleleFreq=0, step=10),
...         sim.PyEval(r"'gen %d: allele freq: %.3f\n' % "
...             "(gen, alleleFreq[0][1])", step=10)
...     ],
...     gen = 50,
... )
gen 0: allele freq: 0.445
gen 10: allele freq: 0.187
gen 20: allele freq: 0.089
gen 30: allele freq: 0.087
gen 40: allele freq: 0.059
50

now exiting runScriptInteractively...

Download OffspringTagger.py

Because families with lethal alleles produce the same number of offspring as families without such alleles, natural selection happens within each families and is weaker than the case when natural selection is used to all offspring. This phenomena is generally referred to as reproductive compensation.

Assign unique IDs to individuals (operator IdTagger)

Although it is possible to use generation number and individual indexes to locate individuals in an evolving population, an unique I D makes it much easier to identify individuals when migration is involved, and to analyze an evolutionary process outside of simuPOP. An operator IdTagger (and its function form tagID) is provided by simuPOP to assign an unique ID to all individuals during evolution.

The IDs of individuals are usually stored in an information field named ind_id. To ensure uniqueness across populations, a single source of ID is used for this operator. individual IDs are assigned consecutively starting from 0. If you would like to reset the sequence or start from a different number, you can call the reset(startID) function of any IdTagger.

An IdTagger is usually used during-mating to assign ID to each offspring. However, if it is applied directly to a population, it will assign unique IDs to all individuals in this population. This property is usually used in the preOps parameter of function Simulator.evolve to assign initial ID to a population. For example, two IdTagger operators are used in Example IdTagger to assign IDs to all individuals. Although different operators are used, different IDs are assigned to individuals.

Example: Assign unique IDs to individuals

>>> import simuPOP as sim
>>> pop = sim.Population(10, infoFields='ind_id', ancGen=1)
>>> pop.evolve(
...     initOps=sim.IdTagger(),
...     matingScheme=sim.RandomSelection(ops=[
...         sim.CloneGenoTransmitter(),
...         sim.IdTagger(),
...     ]),
...     gen = 1
... )
1
>>> print([int(ind.ind_id) for ind in pop.individuals()])
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
>>> pop.useAncestralGen(1)
>>> print([int(ind.ind_id) for ind in pop.individuals()])
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> sim.tagID(pop) # re-assign ID
>>> print([int(ind.ind_id) for ind in pop.individuals()])
[21, 22, 23, 24, 25, 26, 27, 28, 29, 30]

now exiting runScriptInteractively...

Download IdTagger.py

Tracking Pedigrees (operator PedigreeTagger)

A PedigreeTagger is similar to a ParentsTagger in that it records parental information in offspring’s information fields. However, instead of indexes of parents, this operator records an unique ID of each parent to make it easier to study and reconstruct a complete pedigree of a whole evolutionary process. The default information fields are father_id and mother_id.

By default, the PedigreeTagger does not produce any output. However, if a valid output string (or function) is specified, it will output the ID of offspring and their parents, sex and affection status of offspring, and optionally values at specified information fields (parameter outputFields) and genotype at specified loci (parameter outputLoci). Because this operator only outputs offspring, the saved file does not have detailed information of individuals in the top-most ancestral generation. If you would like to record complete pedigree information, you can apply PedigreeTagger in the initOps operator of function Simulator.evolve or Population.evolve to output information of the initial population. Although this operator is primarily used to output pedigree information, values at specified information fields and genotypes at specified loci could also be outputed.

Example PedigreeTagger demonstrates how to output the complete pedigree of an evolutionary process. Note that IdTagger has to be applied before PedigreeTagger so that IDs of offspring could be assigned before they are outputted.

Example: Output a complete pedigree of an evolutionary process

>>> import simuPOP as sim
>>> pop = sim.Population(100, infoFields=['ind_id', 'father_id', 'mother_id'])
>>> pop.evolve(
...     initOps=[
...         sim.InitSex(),
...         sim.IdTagger(),
...         sim.PedigreeTagger(output='>>pedigree.txt'),
...     ],
...     matingScheme=sim.RandomMating(ops=[
...         sim.IdTagger(),
...         sim.PedigreeTagger(output='>>pedigree.txt'),
...         sim.MendelianGenoTransmitter()]
...     ),
...     gen = 100
... )
100
>>> ped = open('pedigree.txt')
>>> lines = ped.readlines()
>>> ped.close()
>>> # first few lines, saved by the first PedigreeTagger
>>> print(''.join(lines[:3]))
1 0 0 F U
2 0 0 F U
3 0 0 M U

>>> # last several lines, saved by the second PedigreeTagger
>>> print(''.join(lines[-3:]))
10098 9974 9915 F U
10099 9967 9997 M U
10100 9945 9936 M U

>>> # load this file
>>> ped = sim.loadPedigree('pedigree.txt')
>>> # should have 100 ancestral generations (plus one present generation)
>>> ped.ancestralGens()
100

now exiting runScriptInteractively...

Download PedigreeTagger.py

A hybrid tagger (operator PyTagger)

A PyTagger uses a user-defined function to pass parental information fields to offspring. When a mating event happens, this operator collect values of specified information fields of parents, pass them to a user-provided function, and use the return values to set corresponding offspring information fields. A typical usage of this operator is to set random environmental factors that are affected by parental values. Example PyTagger demonstrates such an example where the location of each offspring (x, y) is randomly assigned around the middle position of his or her parents.

Example: Use of a hybrid tagger to pass parental information to offspring

>>> import simuPOP as sim
>>> import random
>>> def randomMove(x, y):
...     '''Pass parental information fields to offspring'''
...     # shift right with high concentration of alleles...
...     off_x = random.normalvariate((x[0]+x[1])/2., 0.1)
...     off_y = random.normalvariate((y[0]+y[1])/2., 0.1)
...     return off_x, off_y
...
>>> pop = sim.Population(1000, loci=[1], infoFields=['x', 'y'])
>>> pop.setVirtualSplitter(sim.GenotypeSplitter(loci=0, alleles=[[0, 0], [0,1], [1, 1]]))
>>> pop.evolve(
...     initOps=[
...         sim.InitSex(),
...         sim.InitGenotype(freq=[0.5, 0.5]),
...         sim.InitInfo(random.random, infoFields=['x', 'y'])
...     ],
...     matingScheme=sim.RandomMating(ops=[
...         sim.MendelianGenoTransmitter(),
...         sim.PyTagger(func=randomMove),
...     ]),
...     postOps=[
...         sim.Stat(minOfInfo='x', maxOfInfo='x'),
...         sim.PyEval(r"'Range of x: %.2f, %.2f\n' % (minOfInfo['x'], maxOfInfo['x'])")
...     ],
...     gen = 5
... )
Range of x: -0.17, 1.12
Range of x: -0.05, 1.14
Range of x: 0.01, 1.01
Range of x: 0.01, 1.04
Range of x: 0.06, 0.95
5
>>>

now exiting runScriptInteractively...

Download PyTagger.py

Tagging that involves other parental information

If the way how parental information fields pass to their offspring is affected by parental genotype, sex, or affection status, you could use a Python operator (PyOperator) during mating to explicitly obtain parental information and set offspring information fields.

Alternatively, you can add another information field, translate needed information to this field and pass the genotype information in the form of information field. Operator InfoExec could be helpful in this case. Example otherTagging demonstrates such an example where the number of affected parents are recorded in an information field. Before mating happens, a penetrance operator is used to assign affection status to parents. The affection status is then copied to an information field affected so that operator SummaryTagger could be used to count the number of affected parents. Two MaPenetrance operators are used both before and after mating to assign affection status to both parental and offspring generations. This helps dividing the offspring generation into affected and unaffected virtual subpopulations. Not surprisingly, the average number of affected parents is larger for affected individuals than unaffected individuals.

Example: Tagging that involves other parental information

>>> import simuPOP as sim
>>> pop = sim.Population(1000, loci=[1], infoFields=['aff', 'numOfAff'])
>>> # define virtual subpopulations by affection sim.status
>>> pop.setVirtualSplitter(sim.AffectionSplitter())
>>> pop.evolve(
...     initOps=[
...         sim.InitSex(),
...         sim.InitGenotype(freq=[0.5, 0.5]),
...     ],
...     preOps=[
...         # get affection sim.status for parents
...         sim.MaPenetrance(loci=0, wildtype=0, penetrance=[0.1, 0.2, 0.4]),
...         # set 'aff' of parents
...         sim.InfoExec('aff = ind.affected()', exposeInd='ind'),
...     ],
...         # get number of affected parents for each offspring and store in numOfAff
...     matingScheme=sim.RandomMating(ops=[
...         sim.MendelianGenoTransmitter(),
...         sim.SummaryTagger(mode=sim.SUMMATION, infoFields=['aff', 'numOfAff'])]),
...     postOps=[
...         # get affection sim.status for offspring
...         sim.MaPenetrance(loci=0, wildtype=0, penetrance=[0.1, 0.2, 0.4]),
...         # calculate mean 'numOfAff' of offspring, for unaffected and affected subpopulations.
...         sim.Stat(meanOfInfo='numOfAff', subPops=[(0,0), (0,1)], vars=['meanOfInfo_sp']),
...         # print mean number of affected parents for unaffected and affected offspring.
...         sim.PyEval(r"'Mean number of affected parents: %.2f (unaff), %.2f (aff)\n' % "
...             "(subPop[(0,0)]['meanOfInfo']['numOfAff'], subPop[(0,1)]['meanOfInfo']['numOfAff'])")
...     ],
...     gen = 5
... )
Mean number of affected parents: 0.41 (unaff), 0.44 (aff)
Mean number of affected parents: 0.41 (unaff), 0.54 (aff)
Mean number of affected parents: 0.47 (unaff), 0.55 (aff)
Mean number of affected parents: 0.47 (unaff), 0.55 (aff)
Mean number of affected parents: 0.42 (unaff), 0.45 (aff)
5
>>>

now exiting runScriptInteractively...

Download otherTagging.py