Showing posts with label Mathematics. Show all posts

Saturday, June 27, 2009

DNA Sudoku: Logic Of 'Sudoku' Math Puzzle Used To Vastly Enhance Genome-sequencing Capability

SOURCE

ScienceDaily (June 25, 2009) — A math-based game that has taken the world by storm with its ability to delight and puzzle may now be poised to revolutionize the fast-changing world of genome sequencing and the field of medical genetics, suggests a new report by a team of scientists at Cold Spring Harbor Laboratory (CSHL). The report will be published as the cover story in the July 1st issue of the journal Genome Research.
Combining a 2,000-year-old Chinese math theorem with concepts from cryptology, the CSHL scientists have devised "DNA Sudoku." The strategy allows tens of thousands of DNA samples to be combined, and their sequences – the order in which the letters of the DNA alphabet (A, T, G, and C) line up in the genome – to be determined all at once.
This achievement is in stark contrast to past approaches that allowed only a single DNA sample to be sequenced at a time. It also significantly improves upon current approaches that, at best, can combine hundreds of samples for sequencing.
"In theory, it is possible to use the Sudoku method to sequence more than a hundred thousand DNA samples," says CSHL Professor Gregory Hannon, Ph.D., a genomics expert and leader of the team that invented the "Sudoku" approach. At that level of efficiency, it promises to reduce costs dramatically. A sequencing project that costs upwards of $10 million using conventional methods may be accomplished for $50,000 to $80,000 using DNA Sudoku, he estimates.
Originally devised to overcome a sequencing limitation that dogged one of the Hannon lab's research projects, the new method has tremendous potential for clinical applications. It can be used, says Hannon, to analyze specific regions of the genomes of a large population and identify individuals who carry mutations that cause genetic diseases – a process known as genotyping.
The CSHL team has already begun to explore this possibility via a collaboration with Dor Yeshorim, a New York-based organization that has collected DNA from thousands of members of orthodox Jewish communities. The organization's aim is to prevent genetic diseases such as Tay-Sachs or cystic fibrosis that occur frequently within specific ethnic populations. The team's new method will now allow the many thousands of DNA samples gathered by Dor Yeshorim to be processed and sequenced in a single time-saving and cost-effective experiment, which should identify individuals who carry disease-causing mutations.
The advantages of DNA Sudoku
The mixing together and simultaneous sequencing of a massive number of DNA samples is known as multiplexing. In previous multiplexing approaches, scientists first tagged each sample with a barcode – a short string of DNA letters known as oligonucleotides – before mixing it with other samples that also had unique tags. After the sample mix had been sequenced, scientists could use the barcode tags on the resulting sequences as identification markers and thus tell which sequence belonged to which sample.
"But this approach is very limiting," explains Yaniv Erlich, a graduate student in the Hannon laboratory and first author on the "DNA Sudoku" paper. "It's time-consuming and costly to have to design a unique barcode for each sample prior to sequencing, especially if the number of samples runs in the thousands."
In order to circumvent this limitation, Erlich and others in the Hannon lab came up with the idea of mixing the samples in specific patterns, thereby creating pools of samples. And instead of tagging the individual samples within each pool, the scientists tagged each pool as a whole with one barcode. "Since we know which pool contains which samples, we can link a sequence to an individual sample with high confidence," says Erlich.
The key to the team's innovation is the pooling strategy, which is based on the 2,000-year-old Chinese remainder theorem. "It minimizes the number of pools and the amount of sequencing," says Hannon of their method, which they dubbed "DNA Sudoku" because of its similarity to the logic and combinatorial number-placement rules used in the popular game.
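The pooling idea can be illustrated with a small sketch. It assumes a simplified design in which sample number s is placed in pool "s mod w" for each of a few pairwise coprime window sizes w. The window sizes (7, 8, 9), the sample count, and the function names are hypothetical, and sequencing errors are ignored. It is meant to show why the Chinese remainder theorem keeps the number of pools small, not to reproduce the CSHL team's actual protocol.

```python
def pools_for(sample, windows):
    """Pools that receive `sample`: one (window, residue) pair per window."""
    return [(w, sample % w) for w in windows]


def decode(positive_pools, n_samples, windows):
    """Samples for which every one of their pools contained the mutation."""
    return [
        s for s in range(n_samples)
        if all((w, s % w) in positive_pools for w in windows)
    ]


# Hypothetical setup: 50 samples and three pairwise coprime window sizes.
WINDOWS = [7, 8, 9]
N_SAMPLES = 50
n_pools = sum(WINDOWS)  # 24 pool barcodes instead of 50 sample barcodes

# Pretend sequencing found the mutation in exactly the pools of two carriers.
carriers = [5, 23]
positive = {pool for c in carriers for pool in pools_for(c, WINDOWS)}

print(n_pools, decode(positive, N_SAMPLES, WINDOWS))  # prints: 24 [5, 23]
```

Because any two of these window sizes multiply to at least 56, two different samples among the 50 can share at most one pool, which is what lets the intersection of positive pools single out the carriers.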
The method, which the CSHL team has patented, is currently best suited for genotype analyses that require only short segments of an individual's genome to be sequenced to find out if the individual is carrying a certain variant of a gene or a rare mutation. But as sequencing technologies improve and researchers gain the ability to generate sequences for longer segments of the genome, Hannon envisions wider clinical applications for their method such as HLA typing, already an important diagnostic tool for autoimmune diseases, cancer, and for predicting the risk of organ transplantation.
Journal reference:
Erlich et al. DNA Sudoku--harnessing high-throughput sequencing for multiplexed specimen analysis. Genome Research, 2009; DOI: 10.1101/gr.092957.109
Adapted from materials provided by Cold Spring Harbor Laboratory, via EurekAlert!, a service of AAAS.

Sunday, May 10, 2009

New Pattern Found in Prime Numbers

SOURCE

(PhysOrg.com) -- Prime numbers have intrigued curious thinkers for centuries. On one hand, prime numbers seem to be randomly distributed among the natural numbers with no other law than that of chance. But on the other hand, the global distribution of primes reveals a remarkably smooth regularity. This combination of randomness and regularity has motivated researchers to search for patterns in the distribution of primes that may eventually shed light on their ultimate nature.
In a recent study, Bartolo Luque and Lucas Lacasa of the Universidad Politécnica de Madrid in Spain have discovered a new pattern in primes that has surprisingly gone unnoticed until now. They found that the distribution of the leading digit in the prime number sequence can be described by a generalization of Benford’s law. In addition, this same pattern also appears in another number sequence, that of the leading digits of nontrivial Riemann zeta zeros, which is known to be related to the distribution of primes. Besides providing insight into the nature of primes, the finding could also have applications in areas such as fraud detection and stock market analysis.
“Mathematicians have studied prime numbers for centuries,” Lacasa told PhysOrg.com. “New insights and concepts coming from nonlinear science, such as multiplicative processes, help us to look at prime numbers from a different perspective. According to this focus, it becomes significant that even today it is still possible to discover unnoticed hints of statistical regularity in such sequences, without being an expert in number theory. However, the most significant issue in this work is not to unveil this pattern in primes and Riemann zeros, but to understand the reason and implications of such unexpected structure, not just for number theoretical issues but, interestingly, for other disciplines as well. For instance, these results deepen our understanding of correlations in systems composed of many elements.”
Benford’s law (BL), named after physicist Frank Benford in 1938, describes the distribution of the leading digits of the numbers in a wide variety of data sets and mathematical sequences. Somewhat unexpectedly, the leading digits aren’t randomly or uniformly distributed, but instead their distribution is logarithmic. That is, 1 as a first digit appears about 30% of the time, and the following digits appear with lower and lower frequency, with 9 appearing the least often. Benford’s law has been shown to describe disparate data sets, from physical constants to the length of the world’s rivers.
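The logarithmic distribution mentioned above has a simple closed form: the probability that the leading digit is d equals log10(1 + 1/d). The short sketch below just evaluates that formula for the nine possible digits; the function name is chosen here for illustration.

```python
import math


def benford(d):
    """Benford's law: probability that the leading digit equals d (1..9)."""
    return math.log10(1 + 1 / d)


for d in range(1, 10):
    print(d, f"{benford(d):.3f}")
```

The nine probabilities sum to 1, and the value for d = 1 is log10(2), about 0.301, which is the "about 30% of the time" quoted in the text.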
Since the late ‘70s, researchers have known that prime numbers themselves, when taken in very large data sets, are not distributed according to Benford’s law. Instead, the first digit distribution of primes seems to be approximately uniform. However, as Luque and Lacasa point out, smaller data sets (intervals) of primes exhibit a clear bias in first digit distribution. The researchers noticed another pattern: the larger the data set of primes they analyzed, the more closely the first digit distribution approached uniformity. In light of this, the researchers wondered if there existed any pattern underlying the trend toward uniformity as the prime interval increases to infinity.
The set of all primes - like the set of all integers - is infinite. From a statistical point of view, one difficulty in this kind of analysis is deciding how to choose at “random” in an infinite data set. So a finite interval must be chosen, even if it is not possible to do so completely randomly in a way that satisfies the laws of probability. To get around this difficulty, the researchers decided to choose several intervals of the form [1, 10^d]; for example, 1-100,000 for d = 5. In these sets, all first digits are equally probable a priori. So if a pattern emerges in the first digit of primes in a set, it would reveal something about the first digit distribution of primes, if only within that set.
By looking at multiple sets as d increases, Luque and Lacasa could investigate how the first digit distribution of primes changes as the data set increases. They found that primes follow a size-dependent Generalized Benford’s law (GBL). A GBL describes the first digit distribution of numbers in series that are generated by power law distributions, such as [1, 10^d]. As d increases, the first digit distribution of primes becomes more uniform, following a trend described by GBL. As Lacasa explained, both BL and GBL apply to many processes in nature.
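The experiment described above is easy to repeat on a small scale. The sketch below, which is an illustration and not the researchers' code, lists the primes in [1, 10^d] with a sieve of Eratosthenes and tabulates how often each leading digit occurs. The function names and the choice of d = 2 to 5 are assumptions made for this example.

```python
import math
from collections import Counter


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p with p <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            # Cross out the multiples of p, starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def first_digit_frequencies(numbers):
    """Fraction of the numbers that start with each digit 1..9."""
    counts = Counter(int(str(x)[0]) for x in numbers)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


for d in (2, 3, 4, 5):
    freqs = first_digit_frequencies(primes_up_to(10 ** d))
    print(d, [round(freqs[k], 3) for k in range(1, 10)])
```

Each printed row gives the frequencies of leading digits 1 through 9 for one interval, so successive rows can be compared with the uniform value 1/9 (about 0.111).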
“Imagine that you have $1,000 in your bank account, with an interest rate of 1% per month,” Lacasa said. “The first month, your money will become $1,000*1.01 = $1,010. The next month, $1,010*1.01, and so on. After n months, you will have $1,000*(1.01)^n. Notice that you will need many months to go from $1,000 to $2,000, while to go from $8,000 to $9,000 will be much easier. When you analyze your accounting data, you will realize that the first digit 1 is more represented than 8 or 9, precisely as Benford's law dictates. This is a very basic example of a multiplicative process where 0.01 is the multiplicative constant.
“Physicists have shown that many processes in nature can be modeled as stochastic multiplicative processes, where the previously constant value of 0.01 is now a random variable and the data equivalent to the money of our latter example is another random variable with an underlying distribution 1/x. Stochastic processes with such distributions are shown to follow BL. Now, many other phenomena fit better to a stochastic process with a more general underlying probability x^(-alpha), where alpha is different from one. The first digit distribution related with this general power law distribution is the so-called Generalized Benford law (which converges to BL for alpha = 1).”
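Lacasa's bank account example can be checked numerically. The sketch below assumes the figures from the quote ($1,000 starting balance, 1% monthly interest) and follows the account for 2,314 months, which is roughly the time the balance needs to grow by ten factors of ten. The constant and function names are illustrative.

```python
import math
from collections import Counter

RATE = 1.01      # monthly growth factor for 1% interest
START = 1000.0   # starting balance in dollars


def months_to_grow(a, b, rate=RATE):
    """Whole months needed for a balance of `a` to reach at least `b`."""
    return math.ceil(math.log(b / a) / math.log(rate))


def first_digit(x):
    """First digit of a balance of at least one dollar."""
    return int(str(int(x))[0])


# Record the balance each month and tabulate its leading digit.
MONTHS = 2314
balances = [START * RATE ** n for n in range(MONTHS)]
counts = Counter(first_digit(b) for b in balances)
share = {d: counts[d] / MONTHS for d in range(1, 10)}

print(months_to_grow(1000, 2000), months_to_grow(8000, 9000))  # prints: 70 12
print({d: round(share[d], 3) for d in range(1, 10)})
```

The first line of output shows the asymmetry the quote describes: the balance spends 70 months with a leading 1 on its way from $1,000 to $2,000, but needs only 12 months to go from $8,000 to $9,000.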
Significantly, Luque and Lacasa showed in their study that GBL can be explained by the prime number theorem; specifically, the shape of the mean local density of the sequences is responsible for the pattern. The researchers also developed a mathematical framework that provides conditions for any distribution to conform to a GBL. The conditions build on previous research, which has shown that Benford behavior could occur when a distribution follows BL for particular values of its parameters, as in the case of primes. Luque and Lacasa also investigated the sequence of nontrivial Riemann zeta zeros, which is related to the distribution of primes; determining how these zeros are distributed is considered one of the most important unsolved problems in mathematics. Although the distribution of the zeros does not follow BL, here the researchers found that it does follow a size-dependent GBL, as in the case of the primes.
The researchers suggest that this work could have several applications, such as identifying other sequences that aren’t Benford distributed, but may be GBL. In addition, many applications that have been developed for Benford’s law could eventually be generalized to the wider context of the Generalized Benford’s law. One such application is fraud detection: while naturally generated data obey Benford’s law, randomly guessed (fraudulent) data do not, in general.
“BL is a specific case of GBL,” Lacasa explained. “Many processes in nature can be fitted to a GBL with alpha = 1, i.e. a BL. The hidden structure that Benford's law quantifies is lost when numbers are artificially modified: this is a principle for fraud detection in accounting, where the combinatorial mechanisms associated to accounting sets are such that BL applies. The same principle holds for processes following GBL with a generic alpha, where BL fails. Last, for processes whose underlying density is not x^(-alpha) but 1/log N, a size-dependent GBL would be the correct hallmark.”
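The relationship between BL and GBL can be made concrete. For an underlying density proportional to x^(-alpha), integrating over each leading-digit interval [d, d+1) and dividing by the integral over [1, 10) gives P(d) = ((d+1)^(1-alpha) - d^(1-alpha)) / (10^(1-alpha) - 1). The sketch below evaluates that expression; the function name is illustrative, and whether this matches the paper's notation exactly is an assumption.

```python
import math


def gbl(d, alpha):
    """First-digit probability for a density proportional to x**(-alpha)."""
    if abs(alpha - 1.0) < 1e-12:
        # Limit alpha -> 1: the ordinary Benford's law.
        return math.log10(1 + 1 / d)
    e = 1.0 - alpha
    return ((d + 1) ** e - d ** e) / (10 ** e - 1)


for alpha in (1.0, 0.5, 0.0):
    print(alpha, [round(gbl(d, alpha), 3) for d in range(1, 10)])
```

With alpha = 1 the function returns the Benford values, and with alpha = 0 every digit gets probability 1/9, the uniform distribution that the primes approach as the interval grows.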
More information: Bartolo Luque and Lucas Lacasa. “The first digit frequencies of primes and Riemann zeta zeros.” Proceedings of the Royal Society A. doi: 10.1098/rspa.2009.0126.

Monday, October 8, 2007

Physicists Tackle Knotty Puzzle


Source:

Science Daily — Electrical cables, garden hoses and strands of holiday lights seem to get themselves hopelessly tangled with no help at all. Now research initiated by an undergraduate student at the University of California, San Diego has resulted in the first model of how knots form.
The study investigated the likelihood of knot formation and the types of knots formed in a tumbled string. The researchers say they were interested in the problem because it has many applications, including to the biophysics research questions their group usually studies.
“Knot formation is important in many fields,” said Douglas Smith, an assistant professor of physics who was the senior author on the paper. “For example, knots often form in DNA, which is a long string-like molecule. Cells have enzymes that undo the knots by cutting the DNA strands so that they can pass through each other. Certain anti-cancer drugs stop tumor cells from dividing by blocking the unknotting of DNA.”
Dorian Raymer, a research assistant working with Smith, initiated the study because he was interested in knot theory—the branch of mathematics that uses formulae to distinguish unique knots. Raymer was an undergraduate major in physics when he did the work. Smith said his own interest was piqued when he discovered that no one really knew how knots formed.
“Very little experimental work had been done to apply knot theory to the analysis and classification of real, physical knots,” said Smith. “For mathematicians, the problem is very abstract. They imagine the types of knots that can form and then classify them. In our experiments, we produced thousands of different knots, used mathematical knot theory to analyze them, and then developed a simple physics model to explain our findings.”
The experimental setup consisted of a plastic box that was spun by a computer-controlled motor. A piece of string was dropped into the box and tumbled around like clothes in a dryer. Knots formed very quickly, within 10 seconds. The researchers repeated the experiment more than 3,000 times, varying the length and stiffness of the string, the box size and the speed of rotation, and classified the resulting knots.
“It is virtually impossible to distinguish different knots just by looking at them,” said Raymer. “So I developed a computer program to do it. The computer program counts each crossing of the string. It notes whether the crossing is under or over, and whether the string follows a path to the left or to the right. The result is a bunch of numbers that can be translated into a mathematical fingerprint for a knot.
“We used the Jones polynomial (a famous math formula developed by Vaughan Jones, a mathematics professor at U.C. Berkeley) because it automatically simplifies mirror images and other knots that are identical, but look different.”
Rather than getting just a few types of knots, Smith and Raymer got all the types that mathematicians had enumerated, at least up to a certain complexity level. The longer the string, the greater was the probability of getting complex knots.
Based on these observations, the researchers proposed a simplified model for knot formation. The string forms concentric coils, like a looped garden hose, due to its stiffness and the confinement of the box. The free end of the string weaves through the coils, with a 50 percent probability of going under or over any coil. A computer simulation based on this model produced a similar pattern of simple and complex knots as observed in their experiments.
Smith and Raymer said that the model can also explain why confining a stiff string in a smaller box decreases the probability of knot formation. Increased confinement reduces the tumbling motion that facilitates the weaving of the string end through the coils. The paper cites other researchers who have proposed a similar effect to explain why knotting of the umbilical cord of fetuses is relatively rare, occurring only about one percent of the time. Confinement to the amniotic sac may reduce the probability of knotting.
Smith said that their results do not point to any magic solution to prevent knots from forming, but the project did inspire some advice for young people interested in science.
“Even today, there are still interesting scientific problems that can be studied in your garage with inexpensive, off-the-shelf materials like the ones we used in our experiments,” he said. “The most important thing is to be curious and ask good questions.”
This research was published in the journal Proceedings of the National Academy of Sciences.
Note: This story has been adapted from material provided by University of California, San Diego.

Fausto Intilla

Friday, August 31, 2007

Math Model For Circadian Rhythm Created


Source:

Science Daily — The internal clock in living beings that regulates sleeping and waking patterns -- usually called the circadian clock -- has often befuddled scientists due to its mysterious time delays. Molecular interactions that regulate the circadian clock happen within milliseconds, yet the body clock resets about every 24 hours. What, then, stretches the expression of the clock over such a relatively long period?
Cornell researchers have contributed to the answer, thanks to new mathematical models recently published.
In the August online edition of Public Library of Science (PLOS) Computational Biology, Cornell biomolecular engineer Kelvin Lee, in collaboration with graduate student Robert S. Kuczenski, Kevin C. Hong '05 and Jordi Garcia-Ojalvo of Universitat Politecnica de Catalunya, Spain, hypothesizes that the accepted model of circadian rhythmicity may be missing a key link. The hypothesis is based on a mathematical model of what happens during the sleeping/waking cycle in fruit flies.
"We didn't discover any new proteins or genes," Lee said. "We took all the existing knowledge, and we tried to organize it."
Using mathematical models initially created by Hong, who has since graduated, the team set out to map the molecular interactions of proteins called period and timeless -- widely known to be related to the circadian clock.
The group hypothesized that an extra, unknown protein would need to be inserted into the cycle with period and timeless, a molecule that Kuczenski named the focus-binding mediator, in order for the cycle to stretch to 24 hours.
Lee said many scientists are interested in studying the circadian clock, and not just to understand such concepts as jet lag -- fatigue induced by traveling across time zones. Understanding the body's biological cycle might, for example, lead to better timing of delivering chemotherapy, when the body would be most receptive, Lee said.
Note: This story has been adapted from a news release issued by Cornell University.

Fausto Intilla