Merging two files in unix, with one common column that is redundant
I have two files with one common column whose values are redundant (repeated). File 1 has chromosomal locations and TFs; file 2 has chromosomal locations and Refseq numbers.
File 1:
chr1:66997824-67000456 ZNF333
chr1:66997824-67000456 EGR1
chr1:66997824-67000456 MZF-1
chr22:51221989-51222166 Zic2
chr22:51221989-51222166 ZF5
File 2:
chr1:66997824-67000456 Refseq#1
chr22:51221989-51222166 Refseq#22
I would like to merge these two files and create a new file with three columns:
chr1:66997824-67000456 ZNF333 Refseq#1
chr1:66997824-67000456 EGR1 Refseq#1
chr1:66997824-67000456 MZF-1 Refseq#1
chr22:51221989-51222166 Zic2 Refseq#22
chr22:51221989-51222166 ZF5 Refseq#22
Since the chromosomal locations are repeated, I could not merge the files using join in Unix. Is there a way to merge them using sed or awk?
unix sed awk
asked Oct 31 '15 at 14:41
ABB
Please take a look at editing-help.
– Cyrus
Oct 31 '15 at 16:08
Try join command.
– 2991ambusher
Oct 31 '15 at 16:17
1 Answer
join file1 file2
Output:
chr1:66997824-67000456 ZNF333 Refseq#1
chr1:66997824-67000456 EGR1 Refseq#1
chr1:66997824-67000456 MZF-1 Refseq#1
chr22:51221989-51222166 Zic2 Refseq#22
chr22:51221989-51222166 ZF5 Refseq#22
or
join file1 file2 | awk '{OFS=" ";print $1,$2,$3}'
Output:
chr1:66997824-67000456 ZNF333 Refseq#1
chr1:66997824-67000456 EGR1 Refseq#1
chr1:66997824-67000456 MZF-1 Refseq#1
chr22:51221989-51222166 Zic2 Refseq#22
chr22:51221989-51222166 ZF5 Refseq#22
edited Oct 31 '15 at 16:56
answered Oct 31 '15 at 16:11
Cyrus
It didn't work; it exited after printing the first Refseq#. My first file has 2000+ records with the chromosomal location chr1:66997824-67000456. The join command printed Refseq#1 for those 2000+ records and then exited. My input file has 20000+ lines, whereas the output has only 2000+ lines.
– ABB
Nov 2 '15 at 15:27
@ABB: It should work, but only IF THE FILES ARE SORTED. Use sort if they are not. Otherwise, are you sure there is no mismatch between the first field of the two files? To test, try first with a subset, e.g. using uniq -w 24 on file 1, and try to track down your bug.
– Joce
Nov 3 '15 at 15:49
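The sort requirement raised in this comment can be checked directly. The following is a minimal sketch, not part of the original thread. It assumes the inputs are named file1 and file2 as in the answer and are whitespace-separated; the scratch directory and the file1.sorted, file2.sorted, and merged names are made up for illustration. LC_ALL=C is set so that sort and join agree on the collation order.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)"

# Recreate a few sample lines from the question, deliberately out of order.
printf '%s\n' \
  'chr22:51221989-51222166 Zic2' \
  'chr1:66997824-67000456 ZNF333' \
  'chr22:51221989-51222166 ZF5' \
  'chr1:66997824-67000456 EGR1' > file1
printf '%s\n' \
  'chr22:51221989-51222166 Refseq#22' \
  'chr1:66997824-67000456 Refseq#1' > file2

# sort -c only checks the order: it exits non-zero if the file is not sorted on the key.
LC_ALL=C sort -c -k1,1 file1 2>/dev/null || echo "file1 is not sorted on field 1"

# Sort both files on the join field with the same locale, then join them.
LC_ALL=C sort -k1,1 file1 > file1.sorted
LC_ALL=C sort -k1,1 file2 > file2.sorted
LC_ALL=C join file1.sorted file2.sorted > merged
cat merged
```

The merged lines come out in sort order rather than in the original order of file1. Temporary files are used instead of process substitution, so the sketch does not depend on bash-specific syntax.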
@ABB: Try this:
join <(sort file1) <(sort file2)
– Cyrus
Nov 5 '15 at 6:38
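Since the question asks about awk, here is a sketch of an awk-only alternative that does not need sorted input. It is not from the original answer. It assumes each location appears once in file2, that file2 is non-empty and fits in memory, and that fields are whitespace-separated; the array name ref and the output name merged.txt are chosen for illustration.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)"

# Sample data copied from the question.
printf '%s\n' \
  'chr1:66997824-67000456 ZNF333' \
  'chr1:66997824-67000456 EGR1' \
  'chr1:66997824-67000456 MZF-1' \
  'chr22:51221989-51222166 Zic2' \
  'chr22:51221989-51222166 ZF5' > file1
printf '%s\n' \
  'chr1:66997824-67000456 Refseq#1' \
  'chr22:51221989-51222166 Refseq#22' > file2

# file2 is read first (NR == FNR holds only while reading the first file):
# store location -> Refseq. Then, for each file1 line with a known location,
# print the line followed by the stored Refseq.
awk '
  NR == FNR { ref[$1] = $2; next }
  $1 in ref { print $1, $2, ref[$1] }
' file2 file1 > merged.txt
cat merged.txt
```

The output keeps the line order of file1. Lines in file1 whose location is missing from file2 are dropped silently, which matches the default behaviour of join.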